- TimeOut
-
The time (in seconds) to wait for the remote server to return a response.
- HttpProxy
-
The address of a proxy server through which all HTTP requests are passed.
- FtpProxy
-
The address of a proxy server through which all FTP requests are passed.
- FileTypes
-
Web servers generally return unknown files with a default type. FileTypes
makes it possible to further subdivide files that are sent with this
default type, according to their extension.
FileTypes takes a list of pairs, each consisting of an extension and a MIME
type. For example:
rpm application/x-rpm
html text/html
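The following is a minimal Python sketch of how such a list of pairs could be applied; it is an illustration, not the gatherer's actual implementation. The function names and the `DEFAULT_TYPE` value are assumptions made for this example, since the default type depends on the remote server.

```python
import os

# Assumed default type for illustration; the real value depends on the server.
DEFAULT_TYPE = "application/octet-stream"


def parse_file_types(text):
    """Parse 'extension mime-type' pairs, one pair per line."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            extension, mime_type = parts
            mapping[extension.lower().lstrip(".")] = mime_type
    return mapping


def refine_type(path, served_type, mapping, default_type=DEFAULT_TYPE):
    """Subdivide the default type by extension; leave other types untouched."""
    if served_type != default_type:
        return served_type
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return mapping.get(extension, served_type)


file_types = parse_file_types("rpm application/x-rpm\nhtml text/html")
```

With this mapping, a file named `foo.rpm` that arrives with the default type would be reclassified as `application/x-rpm`, while a file served with any other type keeps the type the server gave it.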
- RunMode
-
Set the process model that the server uses for gathering. This
dictates whether gathering takes place sequentially (i.e. pages are gathered
one at a time) or in parallel (multiple pages are gathered at once).
RunMode can be one of:
- ClientServer
- ClientServer provides a single-machine, multi-process reaper.
- SingleThread
- SingleThread provides a single-machine, single-threaded reaper.
- Nntpserver
-
The default server from which to gather news articles.
- TemporaryDir
-
A directory that can be used to store temporary files created during the
reaping process.
- Delay
-
The length of time, in seconds, to wait between making a request to a
server and making a subsequent request.
If you're gathering from your own servers, what you set this to is your
business (but be careful not to overload them). For gathering from servers
that you don't maintain, the recommended minimum delay is between 1 and 5
minutes (60 to 300 seconds). In general, the further away (administratively)
the server is, the more cautious you should be.
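As a rough illustration of what such a delay means in practice, the Python sketch below enforces a minimum interval between successive requests. It assumes the delay is tracked separately for each host, which the text above does not state; the `HostThrottle` class and its parameters are hypothetical names for this example only.

```python
import time


class HostThrottle:
    """Enforce a minimum delay between successive requests to the same host."""

    def __init__(self, delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.sleep = sleep
        self.last_request = {}      # host -> time of the most recent request

    def wait_for(self, host):
        """Block until a request to host is allowed; return seconds waited."""
        waited = 0.0
        last = self.last_request.get(host)
        if last is not None:
            remaining = self.delay - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
        self.last_request[host] = self.clock()
        return waited
```

A caller would invoke `wait_for(host)` immediately before each request. With a delay of 60, the first request to a host proceeds at once and a second request issued straight afterwards is held back for a minute.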
- NoIms
-
If set to any value, this will disable If-Modified-Since gathering.
Under normal operation, when the reaper has a resource description of an
object in the database, it will only update that description if the server
indicates that the modification time of the object is newer than the copy in
the database.
This mode of operation is desirable in a production gatherer. While
prototyping, however, changes in the code or configuration can be missed
if only modified documents are updated, so this option is available to
force the gatherer to fetch and update every document.
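The behaviour described above corresponds to an HTTP conditional request. The Python sketch below shows the general idea under stated assumptions: the function names and the `no_ims` parameter are hypothetical, and the sketch does not reproduce the reaper's own code.

```python
from email.utils import formatdate


def request_headers(stored_mtime, no_ims=False):
    """Build request headers; stored_mtime is a Unix timestamp or None."""
    headers = {}
    # Send the conditional header only when a stored copy exists and
    # If-Modified-Since gathering has not been disabled.
    if stored_mtime is not None and not no_ims:
        headers["If-Modified-Since"] = formatdate(stored_mtime, usegmt=True)
    return headers


def should_update(status_code):
    """200 means a fresh copy was returned; 304 means the stored copy is current.

    Error and redirect statuses are outside the scope of this sketch.
    """
    return status_code == 200
```

With `no_ims` set, no conditional header is sent, so the server returns the full document every time and every description is refreshed.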
- Debug
-
A list of debugging flags, as listed in the programmer's documentation for
Harvest::Debug. (If you need to turn this on, you probably want to look at
this documentation anyway.)
- DBType
-
The type of DBM file to use for the gatherer's internal data structures.
This can be any *_File DBM structure supported by your Perl installation.
Common options are DB_File, GDBM_File and NDBM_File.