
Harvest::Reaper::Summarise - maintain the summarising system

DESCRIPTION

Summarisers are chosen according to the MIME type of the incoming object. Harvest::Reaper::Summarise maintains a list of summarisers for these types, and selects the correct one to use for a given object.

METHODS

$summarisers = Harvest::Reaper::Summarise->new($mimecfg, $filecfg, $enccfg);

Construct a new instance of the summariser class. $mimecfg, $filecfg and $enccfg are all instances of Harvest::Config, representing MIME type information, file type information and content encoding information respectively.

$mimecfg should take the following format:

    mimetype summariser, summariser options

$filecfg should take the following form:

    fileextension mimetype

$enccfg should take the form:

    encoding module

See the add methods below for a further discussion of this.

$summarisers->addenc($enc,$decoder);

Add a decoder for the content encoding technique represented by $enc.

$summarisers->add($type,$summariser,@args);

Add a summariser for the MIME content type $type. The content type can be a full MIME type (e.g. 'text/html'), a major type on its own (e.g. 'text'), or '*'. The special value ALWAYS indicates a summariser that is always run, regardless of the object's type. Summarisers for specific protocols are registered under the name of the protocol in capitals (e.g. HTTP, NNTP). These protocol summarisers will be run before the data summarisers, and are intended primarily for handling header information that should be included in the summarised representation.

Only the ALWAYS type can have more than one summariser associated with it.

$summarisers->addext($ext,$type);

Add a mapping between a file extension (e.g. txt) and a MIME type (e.g. text/plain).

These mappings are used to select summarisers when the MIME type returned by the server is not recognised, or when no MIME type is returned.

$summarisers->summarise($obj,$file);

Summarise a given object.

If the object has a MIME content type, it is used to select the appropriate summariser; otherwise the steps that depend on the content type are skipped.

First, a summariser for the full content type is looked up.

If no full type summariser is found, a summariser for the extension (i.e. the portion of the filename after the final '.') is looked up.

If this cannot be found either, the summariser for the major type is run. If that does not exist, the generic ('*') summariser is run.

Next, a protocol summariser specific to the transport method over which the object was fetched is run.

Finally, any 'ALWAYS' summarisers are run on the object.
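The lookup order described above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the tables %by_type, %ext_to_type and @always, the summariser names, and the select_summarisers function are all hypothetical. It also assumes that an extension is first mapped to a MIME type (as addext suggests) and that the protocol summariser is placed after the data summariser, as in the steps listed for summarise().

```perl
use strict;
use warnings;

# Hypothetical tables standing in for what add() and addext() would build.
my %by_type = (
    'text/html' => 'HTML',
    'text'      => 'Text',
    '*'         => 'Generic',
    'HTTP'      => 'HTTPHeaders',
);
my %ext_to_type = ( htm => 'text/html', txt => 'text/plain' );
my @always      = ('Checksum');

# Return the names of the summarisers that would run, in order.
sub select_summarisers {
    my ($type, $file, $protocol) = @_;
    my $data;

    # 1. Full content type, e.g. 'text/html' (skipped if there is no type).
    $data = $by_type{$type} if defined $type;

    # 2. Extension: the part of the filename after the final '.'.
    if (!defined $data && defined $file && $file =~ /\.([^.\/]+)$/) {
        my $mapped = $ext_to_type{$1};
        $data = $by_type{$mapped} if defined $mapped;
    }

    # 3. Major type, e.g. 'text' from 'text/plain'.
    if (!defined $data && defined $type) {
        my ($major) = split m{/}, $type;
        $data = $by_type{$major} if defined $major;
    }

    # 4. Generic fallback.
    $data = $by_type{'*'} unless defined $data;

    my @run;
    push @run, $data if defined $data;

    # Protocol summariser, keyed by the protocol name in capitals.
    if (defined $protocol && exists $by_type{ uc $protocol }) {
        push @run, $by_type{ uc $protocol };
    }

    # ALWAYS summarisers run last; there may be several.
    push @run, @always;
    return @run;
}

print join(', ', select_summarisers('text/plain', 'notes.txt', 'http')), "\n";
```

Here 'text/plain' has no summariser of its own, and the txt extension maps back to the same unregistered type, so the major type 'text' is used, followed by the HTTP and ALWAYS summarisers.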

Summarisers may return the name of another summariser that will be invoked on their output.
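One way to picture this chaining is the loop below. It is a minimal sketch under stated assumptions: the return convention (output, name of the next summariser or undef), the %summarisers table, the summariser names Wrapper and HTML, and the run_chain function are all hypothetical and are not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical summarisers. Each takes the input text and returns
# (output, name of the next summariser to run, or undef to stop).
my %summarisers = (
    Wrapper => sub {
        my ($in) = @_;
        return ('<p>' . uc($in) . '</p>', 'HTML');   # hand output on to HTML
    },
    HTML => sub {
        my ($in) = @_;
        (my $out = $in) =~ s/<[^>]+>//g;             # strip tags
        return ($out, undef);                        # end of the chain
    },
);

# Run the named summariser, then any summariser it names, and so on.
sub run_chain {
    my ($name, $data) = @_;
    my %seen;
    while (defined $name) {
        die "summariser loop at $name\n" if $seen{$name}++;
        my $code = $summarisers{$name}
            or die "no summariser named $name\n";
        ($data, $name) = $code->($data);
    }
    return $data;
}

print run_chain('Wrapper', 'hello'), "\n";
```

The %seen check guards against two summarisers naming each other indefinitely; whether the real module needs or performs such a check is not stated in this document.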