Downloading and Installing Harvest-NG
Harvest-NG is not currently available from CPAN, due to the odd way in which it is configured. We hope to make later releases available from CPAN once the package's file structure has been reorganised and some namespace issues have been resolved.

Preinstallation Issues
Harvest-NG requires a fairly up-to-date perl and a number of additional modules. If you want to grab the code anyway, skip ahead to the next section.

Harvest-NG has only been tested with versions of perl later than 5.004. To check which version of perl you have installed, run the perl -v command.

In addition, we require the following extra modules, all of which are available from CPAN. This list is correct at the time of writing, but may change as CPAN modules are reorganised; please let us know of any omissions. Note that HTML-Parser version 3 is considerably faster than earlier versions.

On top of that list, a number of optional portions of the code require other modules:
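
The version and module checks described above can be run from the shell. The following is a minimal POSIX sh sketch, assuming perl is on your PATH. The helper names perl_newer_than_5004 and module_version are hypothetical, and only HTML::Parser is checked because it is the one module named here; each further module you need can be checked with another module_version call.

```shell
#!/bin/sh
# Minimal prerequisite check for Harvest-NG (a sketch; helper names are hypothetical).

# Succeeds if the perl version given as $1, in the numeric form perl
# reports as $] (e.g. 5.008008), is later than 5.004.
perl_newer_than_5004() {
    awk -v v="$1" 'BEGIN { exit !(v + 0 > 5.004) }'
}

# Prints the installed version of the module named in $1,
# or fails if perl cannot load that module.
module_version() {
    perl -M"$1" -e 'print(($ARGV[0]->VERSION || "unknown"), "\n")' "$1" 2>/dev/null
}

if command -v perl >/dev/null 2>&1; then
    ver=$(perl -e 'print $]')
    if perl_newer_than_5004 "$ver"; then
        echo "perl $ver: OK"
    else
        echo "perl $ver: older than Harvest-NG has been tested with"
    fi
    # HTML-Parser is the CPAN distribution; HTML::Parser is the module inside it.
    if v=$(module_version HTML::Parser); then
        echo "HTML::Parser $v found"
    else
        echo "HTML::Parser not found; install it from CPAN"
    fi
else
    echo "perl not found on PATH"
fi
```

perl -e 'print $]' prints the version as a single number (5.008008 for perl 5.8.8, for instance), which is easier to compare in a script than the banner printed by perl -v.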

Downloading Harvest-NG
The latest version of Harvest-NG is available from SourceForge. Older versions are also available in this directory.

Installing Harvest-NG
Harvest-NG is designed to be run from the directory in which it is installed. You need only read the next paragraph if you wish to change this (to install it system-wide, for instance).
To separate the Harvest tree out, copy the Harvest directory to
wherever you keep perl libraries, and copy the reap file, together with
any utilities you wish to use, into a suitable binary directory. Then, if
the Harvest directory is not in your perl search path, add the
following as the second line of reap, and of any utilities you are
using:
Finally, if your perl is not installed in /usr/bin, you'll need to
alter the "shebang" (#!) lines of reap and of any of the utilities. The
first line of each of these files points at perl in /usr/bin; change
that path to the location of your own perl.
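
If you have several scripts to change, the edit can be scripted. The sketch below is a POSIX sh function (the name fix_shebang and the example paths are hypothetical) that rewrites only the interpreter path on the first line of each file, keeping any switches that follow it. It copies the result back over the original rather than moving a new file into place, so each script keeps its execute permission.

```shell
#!/bin/sh
# Sketch: point the "#!" line of each named script at a different perl.
# Usage (hypothetical paths): fix_shebang /usr/local/bin/perl reap
fix_shebang() {
    new_perl=$1
    shift
    for script in "$@"; do
        # On line 1 only, replace "#!" plus the interpreter path (up to the
        # first space) with the new path; switches such as -w are kept.
        sed '1s|^#! *[^ ]*|#!'"$new_perl"'|' "$script" > "$script.tmp" || return 1
        # Write back over the original so its permissions are unchanged.
        cat "$script.tmp" > "$script" && rm -f "$script.tmp" || return 1
    done
}
```

Files without a #! first line pass through unchanged, so the function is safe to run over a whole directory of utilities.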

Doing your first run
Just to check that everything's working, try the following, using one of the configuration files shipped with the package in the config directory:

    reap --config config/firstrun.conf http://localhost/

If you're not running a web server on the local machine, replace localhost with another, locally run, web server. If all is OK, reap will fetch the first 10 URLs that it encounters on this server, and store and summarise as many of them as it can. It will create its storage files in your current directory. For what to do next, read the more comprehensive user documentation.