Downloading and Installing Harvest-NG
Harvest-NG is not currently available from CPAN, due to the unusual way in which it is configured. We hope to make later releases available from CPAN once the package's file structure has been reorganised and some namespace issues have been resolved.
Harvest-NG requires a fairly up-to-date perl and a number of additional modules, described below. If you would like to grab the code first anyway, feel free to skip ahead to the next section.
Harvest-NG has only been tested with versions of perl later than 5.004. To check which version of perl you have installed, run the perl -v command.
In addition, we require the following extra modules, all of which are available from CPAN. This list is correct at the time of writing, but may change as CPAN modules are reorganised. Please let us know of any omissions.
Beyond that list, some optional parts of the code require other modules:
The latest version of Harvest-NG is available from SourceForge. Older versions are also available in this directory.
Harvest-NG is designed to be run from the directory in which it is installed. You need only read the next paragraph if you wish to change this (to install it system-wide, for instance).
To separate the Harvest tree out, copy the Harvest directory to wherever you keep perl libraries, and copy the reap file, together with any utilities you wish to use, into a suitable binary directory. Then, if the Harvest directory is not in your perl search path, add the following on the second line of reap, and of any utilities you are using:
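The separate installation described above can be sketched as a pair of shell functions. This is a minimal sketch under stated assumptions: the function names, argument layout and example directories are hypothetical, and it assumes the line placed on the second line of reap is a standard perl `use lib` pragma naming the directory the Harvest tree was copied into.

```shell
# Insert TEXT as the new second line of FILE, leaving the first (#!) line
# in place. Writing back with cat keeps FILE's permissions intact.
add_second_line() {
    file=$1 text=$2
    awk -v line="$text" '{ print } NR == 1 { print line }' "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" &&
        rm -f "$file.tmp"
}

# Hypothetical helper: install Harvest-NG outside its unpacked directory.
#   $1  unpacked Harvest-NG directory (containing Harvest/ and reap)
#   $2  directory that will hold the Harvest library tree
#   $3  binary directory that will hold reap
install_harvest() {
    src=$1 libdir=$2 bindir=$3
    mkdir -p "$libdir" "$bindir" || return 1
    cp -R "$src/Harvest" "$libdir/" || return 1
    cp "$src/reap" "$bindir/reap" || return 1
    # ASSUMPTION: the extra line is a "use lib" pragma pointing perl at
    # the directory that now contains Harvest/.
    add_second_line "$bindir/reap" "use lib '$libdir';"
}

# Example (hypothetical paths):
#   install_harvest ./harvest-ng /usr/local/lib/harvest-ng /usr/local/bin
```

Any utilities can be copied into the binary directory and patched the same way with add_second_line.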
Finally, if your perl is not installed in /usr/bin, you'll need to alter the "shebang" (#!) lines of reap and of any of the utilities. In the first line of these files, you'll see
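The shebang edit can also be scripted. A minimal sketch follows; the function name is hypothetical, and it assumes the first line of each script is a #! line naming a perl binary (any other first line is left untouched).

```shell
# Hypothetical helper: point the #! line of each script at another perl,
# keeping any switches (such as -w) that follow the interpreter path.
#   $1     full path of the perl to use, e.g. "$(command -v perl)"
#   $2...  scripts to fix: reap and any utilities you installed
# Paths containing "|" or "&" would need escaping for sed.
set_perl_path() {
    perl_path=$1
    shift
    for script in "$@"; do
        # Only line 1 is touched, and only if its interpreter path mentions perl.
        sed "1s|^#![[:space:]]*[^[:space:]]*perl[^[:space:]]*|#!$perl_path|" "$script" > "$script.tmp" &&
            cat "$script.tmp" > "$script" &&
            rm -f "$script.tmp" || return 1
    done
}

# Example (hypothetical path):
#   set_perl_path /usr/local/bin/perl /usr/local/bin/reap
```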
Doing your first run
To check that everything is working, try the following command, which uses one of the configuration files shipped with the package in the config directory:
reap --config config/firstrun.conf http://localhost/
(If you're not running a web server on the local machine, replace localhost with another locally run web server.)
If all is well, reap will fetch the first 10 URLs it encounters on this server, then store and summarise as many of them as it can. It creates its storage files in your current directory.
For what to do next, read the more comprehensive user documentation.