NAME

Harvest::Reaper::Fetcher - fetch objects

DESCRIPTION

The Fetcher class provides Harvest's interface to the routines that actually perform document fetching.

Currently this class is essentially a wrapper around the LWP routines, using the Harvest::Reaper::HarvestUA class as its UserAgent. However, it could be extended to provide further fetch methods that LWP does not implement.
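One way such an extension might look is a dispatch on the URL scheme, with anything unrecognised handed to the LWP-based path. This is a hypothetical sketch, not Harvest code: %fetch_handler, lwp_fetch, dispatch_fetch and the 'example-local' scheme are illustrative names, and lwp_fetch is a stub standing in for the real LWP call.

```perl
use strict;
use warnings;

# Hypothetical registry of extra fetch handlers, keyed on lower-case URL scheme.
# Any scheme not listed here falls through to the LWP-backed handler.
my %fetch_handler = (
    # Illustrative scheme that LWP has no protocol module for.
    'example-local' => sub { my ($url) = @_; return "custom:$url" },
);

# Stub standing in for the LWP-based fetch; it only tags the URL here.
sub lwp_fetch {
    my ($url) = @_;
    return "lwp:$url";
}

# Pick a handler from the URL's scheme (RFC 3986 scheme syntax) and call it.
sub dispatch_fetch {
    my ($url) = @_;
    my ($scheme) = $url =~ /^([A-Za-z][A-Za-z0-9+.-]*):/;
    my $handler = (defined $scheme && $fetch_handler{ lc $scheme }) || \&lwp_fetch;
    return $handler->($url);
}
```

Keeping LWP as the default means new schemes can be added one at a time without touching the existing fetch path.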

METHODS

$fetcher = Harvest::Reaper::Fetcher->new($config);

Creates a new instance of the Fetcher class. There should be only one instance of this class per Reaper process.

$fetcher->fetch($obj,$file,$lmt)

Fetches the object defined by $obj into the file named by $file.

This routine fetches the URL encapsulated in the Harvest::Object $obj and writes any returned data to the file named by $file. Any headers returned with the data are copied into the $obj->headers structure, and the 'Content-Base' attribute of this structure is set to the base URL of the object.
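The steps above can be sketched in core Perl under several assumptions. This page does not describe $lmt; given the NOTMOD status below, the sketch ASSUMES it is the object's last-modified time in epoch seconds and turns it into an If-Modified-Since header. The response is an HTTP::Tiny-style hashref ({ url, headers, content }), and a plain hash stands in for $obj->headers. conditional_headers and store_response are hypothetical names. For Content-Base the sketch simply uses the final URL after redirects; LWP's own notion of a response's base URL also considers headers such as Content-Location.

```perl
use strict;
use warnings;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# ASSUMPTION: $lmt is a last-modified time in epoch seconds.
# Returns request headers for a conditional GET, or {} if $lmt is unset.
# Day/month names are spelled out so the date does not depend on locale.
sub conditional_headers {
    my ($lmt) = @_;
    return {} unless defined $lmt && $lmt > 0;
    my ($s, $m, $h, $mday, $mon, $year, $wday) = gmtime($lmt);
    my $date = sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $h, $m, $s;
    return { 'If-Modified-Since' => $date };
}

# Write the body to $file, merge the response headers into %$headers
# (a stand-in for $obj->headers) and record the base URL.
sub store_response {
    my ($headers, $response, $file) = @_;
    open my $fh, '>:raw', $file or die "cannot write $file: $!";
    print {$fh} $response->{content} // '';
    close $fh or die "cannot close $file: $!";
    %$headers = (%$headers, %{ $response->{headers} || {} });
    # HTTP::Tiny's 'url' field is the last URL queried after any redirects.
    $headers->{'Content-Base'} = $response->{url};
    return $headers;
}
```

Keeping the header bookkeeping separate from the network call makes it easy to test without a live server.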

The 'Harvest-Status' attribute of the header data is set to indicate whether the fetch succeeded or failed. Currently the status is one of the following:

OK
Indicates that the object was fetched successfully, and that the returned data should be summarised and added to the database.
REDIR
The object has moved, and this URL should be replaced with a new one. The new URL is given in the 'Location' attribute of the header data.
ERROR
Indicates that the server returned an error, pointing to a problem with the document or URL.
NOTMOD
The document has not been modified since it was last fetched, so the metadata from the previous fetch should be used instead.
DOWN (not yet implemented)
The server holding the document is down or not responding. The fetch attempt should be abandoned for now and rescheduled for later.
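A minimal sketch of how an HTTP status code might be mapped onto these values. The exact rules Harvest uses are not stated on this page, so the code groupings here are assumptions, as is the choice to report a redirect with no Location header as ERROR (there is no new URL to substitute). harvest_status and set_harvest_status are hypothetical names; DOWN is never produced, matching "not yet implemented".

```perl
use strict;
use warnings;

# ASSUMED mapping from an HTTP status code to a Harvest-Status value.
sub harvest_status {
    my ($code) = @_;
    return 'NOTMOD' if $code == 304;
    return 'REDIR'  if grep { $code == $_ } 301, 302, 303, 307, 308;
    return 'OK'     if $code >= 200 && $code < 300;
    return 'ERROR';    # 4xx, 5xx and anything not handled above
}

# Store the status in a headers hash (a stand-in for $obj->headers).
sub set_harvest_status {
    my ($headers, $code) = @_;
    my $status = harvest_status($code);
    # Assumption: a redirect without Location gives nothing to substitute.
    $status = 'ERROR' if $status eq 'REDIR' && !defined $headers->{Location};
    $headers->{'Harvest-Status'} = $status;
    return $status;
}
```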

ACKNOWLEDGEMENTS

This class is currently implemented as a wrapper around the libwww-perl client routines, to which it owes most of its functionality.