Harvest-NG User documentation

Using the reap spider
The `reap` spider is at the heart of the Harvest-NG system. It crawls the web, gathering web pages and following links according to its configuration, and stores those pages in its database for further use. Reap is highly configurable and can get relatively complicated. However, it has been designed from the ground up with sensible defaults, so you only need to set an option when the default doesn't suit you. Reap can be configured either from the command line or by using the Harvest-NG configuration file format. This document assumes that you have already downloaded, installed and tested Harvest-NG. If you haven't, please see the earlier instructions for details of this process.
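The crawl-and-store cycle described above can be illustrated with a minimal, generic sketch of a link-following crawl loop. This is not reap's implementation, and it uses none of reap's actual options. `crawl`, `FAKE_WEB`, `fetch_links` and `allow` are hypothetical names. An in-memory dictionary stands in for both the web and reap's database, and the `allow` predicate stands in for whatever link-following rules reap's configuration expresses.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical stand-in for the web: each URL maps to the links found on that page.
FAKE_WEB: Dict[str, List[str]] = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": [],
}


def crawl(
    start: str,
    fetch_links: Callable[[str], List[str]],
    allow: Callable[[str], bool],
    max_pages: int = 100,
) -> Dict[str, List[str]]:
    """Breadth-first crawl: fetch a page, store it, then queue its allowed links."""
    database: Dict[str, List[str]] = {}  # hypothetical stand-in for reap's page store
    frontier = deque([start])
    while frontier and len(database) < max_pages:
        url = frontier.popleft()
        if url in database:
            continue  # already gathered via another path
        links = fetch_links(url)
        database[url] = links  # "store" the gathered page
        for link in links:
            # Follow a link only if the (hypothetical) configuration policy allows it.
            if allow(link) and link not in database:
                frontier.append(link)
    return database


if __name__ == "__main__":
    # Example policy: never follow links into c.example.
    pages = crawl(
        "http://a.example/",
        lambda u: FAKE_WEB.get(u, []),
        lambda u: not u.startswith("http://c.example/"),
    )
    print(list(pages))  # → ['http://a.example/', 'http://a.example/1', 'http://b.example/']
```

Swapping in a different `allow` predicate or `max_pages` limit changes what gets gathered without touching the loop itself. That separation is the general idea behind a configurable spider with sensible defaults, though reap's actual mechanisms are covered in its own configuration documentation.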
Contents
	Command line usage: how to use reap simply from the command line.
	How reap organises the gathering process: details of how reap organises the gathering process. Some understanding of this is necessary in order to configure, control and use reap effectively.
	Configuration file usage: how to use configuration files to control the gatherer.
	Configuration directives: a complete, autogenerated listing of all the available configuration directives.
Please mail any comments or questions to sxw@users.sourceforge.net.
