
Harvest::Controller::FilterManager - maintain the URL filtering system

DESCRIPTION

FilterManager is an abstract class for maintaining a set of filters which can alter the data contained in Harvest::Objects passed through them. A filter may either alter the object passed to it, or remove the object completely.

It should be subclassed by a class that overrides the 'path' method to return the path to the packages which the subclass uses as filters.

Subclasses of this class do no filtering themselves; they only maintain a list of other filter classes through which every URL is passed.

For general information on the methods that a filter class should provide, see Harvest::Controller::GenericFilter.

METHODS

$filters->path

This method should be provided by subclasses and should return the path of the filters to be used by that subclass (for example, Harvest::Controller::URLFilter).
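As an illustration of the subclassing pattern described above, here is a minimal, self-contained sketch. The package names My::FilterManager and My::URLFilterManager are hypothetical stand-ins and not part of Harvest; the real base class is not reproduced, and its constructor takes arguments that are omitted here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the abstract base class (assumption:
# the real FilterManager is not shown in this document).
package My::FilterManager;

sub new {
    my ($class) = @_;
    return bless { filters => [] }, $class;
}

# Abstract method: a subclass must override this.
sub path {
    my ($self) = @_;
    die ref($self) . " must override path()\n";
}

# Hypothetical subclass: it does no filtering itself and only
# names the path under which its filter packages live.
package My::URLFilterManager;
our @ISA = ('My::FilterManager');

sub path { return 'Harvest::Controller::URLFilter' }

package main;

my $filters = My::URLFilterManager->new;
print $filters->path, "\n";
```

Calling path on the stand-in base class dies, which is one way to make it obvious that the class is meant to be subclassed.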

$filters=new Filter $config, $filterconfig

Start a new filter chain. There can be more than one filter chain active in any given gatherer, for instance to deal with cases of multiple rootnodes, each with different filter options. Each active filter chain should have a unique name, which is used by the filter modules to allocate temporary files.

This method should not be called directly on this class, as it is an abstract class; call it on a subclass instead.

$config is a Harvest::Config object which should contain the main configuration information for the reaper. Check individual filters for their requirements, but generally this should contain at least the TemporaryDir and Name tags.

$filterconfig is another Harvest::Config object which should contain the set of filters to be used within the chain.

$filters->add($filter,$args);

Add a new filter at the end of the chain, with the options given in the string $args.

@urllist=$filters->check($object)

Check a given URL by passing $object through the filter chain.

Returns a list of URLs to add to the workload. (As checking a URL may, in some cases, cause more than one URL to be fetched, a list is returned rather than just a single item.)
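The chaining behaviour of add and check can be sketched as follows. This is a simplified illustration under stated assumptions, not the Harvest implementation: the My::* packages are hypothetical, filters are added as ready-made objects rather than by name with an $args string, and plain URL strings stand in for Harvest objects.

```perl
use strict;
use warnings;

# Hypothetical filter: removes any URL that is not http.
package My::Filter::HttpOnly;
sub new { return bless {}, shift }
sub check {
    my ($self, $url) = @_;
    return $url =~ m{^http://} ? ($url) : ();
}

# Hypothetical filter: alters the URL by stripping a fragment.
package My::Filter::StripFragment;
sub new { return bless {}, shift }
sub check {
    my ($self, $url) = @_;
    (my $clean = $url) =~ s/#.*$//;
    return ($clean);
}

# Hypothetical chain manager (assumed interface).
package My::Chain;
sub new { return bless { filters => [] }, shift }

# Append a filter to the end of the chain.
sub add {
    my ($self, $filter) = @_;
    push @{ $self->{filters} }, $filter;
    return $self;
}

# Pass the URL through every filter in order. Each filter may
# return zero, one, or several URLs, so a list is carried along.
sub check {
    my ($self, $url) = @_;
    my @urls = ($url);
    for my $filter (@{ $self->{filters} }) {
        @urls = map { $filter->check($_) } @urls;
    }
    return @urls;
}

package main;

my $chain = My::Chain->new;
$chain->add(My::Filter::HttpOnly->new)->add(My::Filter::StripFragment->new);

my @kept    = $chain->check('http://example.org/page#top');
my @dropped = $chain->check('ftp://example.org/file');
print scalar(@kept), " kept, ", scalar(@dropped), " dropped\n";
```

Returning a list from every filter keeps the "alter", "remove", and "expand into several URLs" cases uniform: an empty list removes the URL, and a longer list adds work.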

@urllist=$filters->result($object);

Pass the result of fetching a filter-requested object (an object with a rootnode of FILTER) into the filter chain.

Returns a list of URLs which should be filtered, then fetched.