Database overview

The Database is where Harvest-NG stores all of its information about the work it has done. It holds the content summaries for use by other programs, as well as the management information used to drive the gathering process.

The Database section of the configuration also differs slightly from the other sections in that it is written to a file within the database itself. This allows other programs to fetch information from the database without needing to know all of the configuration details.

The database is configured in the Database section of the configuration file. Consult the autogenerated information for that section for details of the available configuration options.

Disk-based databases

At present, only disk-based databases are supported. There are two types. The DBM database is included mainly for backwards compatibility with old Harvest installs; when used for that purpose, its Type directive must be set to GDBM_File.

The other type of disk database, DirStructure, stores all of its objects in readable form in a directory structure. These objects can then be read by programs such as glimpse, zebra, or any other indexing program that can read from disk, to generate a simple searchable index of the data.
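To illustrate why a readable on-disk layout is convenient for indexing, the following is a minimal Python sketch of a toy indexer that walks a directory tree and builds a word-to-file index. It is not part of Harvest-NG and does not reflect how glimpse or zebra work; the function names build_index and search are hypothetical, and the sketch assumes only that each stored object is a plain text file somewhere under a single root directory.

```python
import os
import re
import tempfile
from collections import defaultdict


def build_index(root):
    """Map each lowercased word to the set of files (relative to root) containing it.

    Assumption: every file under root is a readable text object. The real
    DirStructure layout and object format are not described here.
    """
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced rather than aborting the walk.
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                text = handle.read()
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add(os.path.relpath(path, root))
    return index


def search(index, word):
    """Return the sorted list of files containing the given word (case-insensitive)."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    # Build a throwaway directory tree so the sketch runs without a real database.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "00", "01"))
        with open(os.path.join(root, "00", "01", "object1"), "w", encoding="utf-8") as f:
            f.write("Title: An example content summary\n")
        index = build_index(root)
        print(search(index, "summary"))
```

Because the objects are ordinary files, any tool that can traverse a directory can consume them; a real deployment would point an existing indexer at the database directory rather than use a toy index like this one.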

The DirStructure database scales far better than the DBM one, and is the recommended default.