Harvest-NG Utilities

In addition to reap, Harvest-NG contains a number of useful utilities, mainly for manipulating the database that reap creates. Many of these utilities are designed purely as a "proof of concept" and may require tailoring to your site; others run 'as shipped'.

gatherd

This utility provides a Harvest-compatible gatherd that supplies information to a broker. It can be run either in standalone mode or from inetd. It takes the following command line options:

--db
The name of the database to serve (this should be the same as the database filename used by reap). If you don't specify this, it defaults to WORKING in the current directory.
--port
The port to use. Harvest generally uses 8500 or thereabouts. If this is specified, the program runs in standalone mode; if not, it expects to be run from inetd.
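
As a rough illustration of standalone mode, the sketch below wraps the two options described above in a small shell function. The install path util/gatherd and the function name start_gatherd are assumptions made for this example and are not part of Harvest-NG; adjust them to your site.

```shell
#!/bin/sh
# GATHERD is an assumed location for the gatherd script; change it to
# wherever the utility lives on your system.
GATHERD="${GATHERD:-util/gatherd}"

# start_gatherd (hypothetical helper): run gatherd in standalone mode.
# Usage: start_gatherd DATABASE [PORT]
# PORT falls back to 8500, the port Harvest generally uses.
start_gatherd() {
    db="$1"
    port="${2:-8500}"
    "$GATHERD" --db "$db" --port "$port"
}
```

For example, start_gatherd WORKING 8500 would serve the WORKING database on port 8500. Leaving out --port altogether is only appropriate when gatherd is started from inetd.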

dump

Dump dumps the contents of the specified database to stdout. It takes one argument, the name of the database file.

graph

This tool generates site linkage graphs for use with the daVinci program - a graph plotting package which is free for non-commercial use. It could easily be expanded to generate other formats.

Having performed a reap of the site or sites you wish to graph, run the graph utility over the resulting database:

utils/graph --dbfile database root-node > graph-file

This generates a daVinci graph of the database, starting at root-node, and writes it to graph-file.

Loading graph-file into daVinci lets you manipulate the display of the graph. If you want to edit it, try the graph editor included in the daVinci package.

If you find that the graph is far too complex to understand, or even for daVinci to process, you can simplify it using the following options.

--hidesub
Specifying this option will hide all subgraphs, making it easy to selectively display areas of the tree, without making daVinci display the entire tree initially.
--noback
This option disables the display of backwards links, again making the graph easier to handle.
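
The sketch below shows one way the two simplification options might be combined in a single run. It assumes that both options can be given together and that they go before the root-node argument; neither point is confirmed by this document. The function name simple_graph is hypothetical.

```shell
#!/bin/sh
# GRAPH defaults to the path used in the example above.
GRAPH="${GRAPH:-utils/graph}"

# simple_graph (hypothetical helper): emit a simplified daVinci graph on
# stdout, with subgraphs hidden and backwards links suppressed.
# Usage: simple_graph DATABASE ROOT-NODE
simple_graph() {
    "$GRAPH" --dbfile "$1" --hidesub --noback "$2"
}
```

A call such as simple_graph database root-node > graph-file would then produce a file that can be loaded into daVinci in the usual way.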

linkcheck

Linkcheck parses the reap error output and produces an HTML list of broken links. To use it, save the error output from reap to a file, then pipe this file into linkcheck, which produces an HTML page fragment containing a list of these errors.

For example:

  ./reap --config mine.conf 2>/tmp/error.log
  cat /tmp/error.log | util/linkcheck > errors.html
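
If you would rather not keep an intermediate log file, the error output can be piped straight into linkcheck. The helper below is a minimal sketch of that idea; stderr_only is a hypothetical name, and the approach assumes that reap writes its errors to standard error, as the 2> redirection above suggests. Unlike the two-step example, it discards whatever the command prints on standard output.

```shell
#!/bin/sh
# stderr_only (hypothetical helper): run a command, discard its standard
# output, and pass its standard error on as standard output so that it
# can be piped into another program.
# Redirections apply left to right: stderr is first pointed at the
# current stdout (the pipe), then stdout is sent to /dev/null.
stderr_only() {
    "$@" 2>&1 >/dev/null
}
```

With this in place, the two commands above could be written as stderr_only ./reap --config mine.conf | util/linkcheck > errors.html.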

whatsnew

Whatsnew produces a list of all of the pages, with titles, that have been modified within a given period of time. The list is produced as an HTML fragment (that is, a page without the surrounding HEAD and BODY tags). Whatsnew takes two arguments: the name of the database and the number of days to look back. For example, the following lists all of the files modified in the last 7 days in the WORKING database:

  util/whatsnew WORKING 7
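
Because the output is only a fragment, it has to be placed inside a complete page before it can be served on its own. The following is a minimal sketch of one way to do that in the shell; wrap_fragment is a hypothetical helper, not part of Harvest-NG, and it does not escape HTML special characters in the title.

```shell
#!/bin/sh
# wrap_fragment (hypothetical helper): read an HTML fragment on stdin and
# write a complete page on stdout, with the given title.
# Usage: some-command | wrap_fragment "Page title"
wrap_fragment() {
    title="$1"
    printf '<html>\n<head><title>%s</title></head>\n<body>\n' "$title"
    cat    # copy the fragment through unchanged
    printf '</body>\n</html>\n'
}
```

For example, util/whatsnew WORKING 7 | wrap_fragment "Recent changes" > whatsnew.html would write a standalone page. The same wrapper should work for the fragment produced by linkcheck.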