Using the reap spider
The reap spider is at the heart of the Harvest-NG system. It crawls the web, gathering pages and following links according to its configuration, and stores the pages in its database for further use. Reap is highly configurable and can get relatively complicated. However, it has been designed from the ground up with sensible defaults, so you only need to set the options you want to change. Reap can be configured either from the command line or with a file in the Harvest-NG configuration file format. This document assumes that you have already downloaded, installed and tested Harvest-NG. If you haven't, please see the earlier instructions for details of this process.
Contents