
Harvest::Controller::URLFilter::URLregex - limit based on regular expressions

DESCRIPTION

This filter selects which URLs are indexed, using a file that lists regular expressions to allow or deny. Matching uses Perl regular expressions directly and is applied to the entire location portion of the URL, excluding any query parameters.

The action of the first rule that matches the URL is taken, so list the most specific rules first.

Configuration is passed as arguments to the constructor, as described below.

ARGUMENTS

A list of <field> <regex> pairs providing by-URL configuration. The fields take the following forms:

Default (Allow|Deny)
    Set the default action, either Allow or Deny, for URLs that match no Allow or Deny rule.

Allow <regex>
    Any URL matching this regular expression will be allowed for parsing.

Deny <regex>
    Any URL matching this regular expression will be denied.
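
As an illustration, a rule list might look like the following. The one-pair-per-line layout and the example.org URLs are assumptions made for this sketch; only the field names are taken from the descriptions above.

```
Default Deny
Deny    ^http://www\.example\.org/docs/private/
Allow   ^http://www\.example\.org/docs/
```

Because the first matching rule decides the outcome, the Deny line for the private subtree comes before the broader Allow line. With the two lines swapped, the private pages would match the Allow rule first and be indexed.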

Implementation note: For speed, the arguments are translated into a Perl anonymous subroutine, so errors in the arguments file may produce obscure Perl errors.
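
The idea of turning a rule list into a single anonymous subroutine can be sketched as follows. This is not the module's code: compile_rules is a hypothetical helper, the fallback to Allow when no Default is given is an assumption, and stripping everything from the first "?" is only one plausible reading of "excluding any query parameters". The sketch builds a closure over precompiled patterns; the module's actual translation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Harvest: turn a flat list of
# (field, value) pairs into one closure returning 'Allow' or 'Deny'.
sub compile_rules {
    my @args    = @_;
    my $default = 'Allow';    # assumed fallback when no Default is given
    my @rules;

    while (@args) {
        my ($field, $value) = splice(@args, 0, 2);
        if ($field eq 'Default') {
            $default = $value;
        }
        elsif ($field eq 'Allow' || $field eq 'Deny') {
            # qr// compiles the pattern once; a malformed pattern dies here
            push @rules, [ $field, qr/$value/ ];
        }
        else {
            die "Unknown field '$field'\n";
        }
    }

    return sub {
        my ($url) = @_;
        # Assumed interpretation: drop the query string before matching
        (my $location = $url) =~ s/\?.*//s;
        for my $rule (@rules) {
            my ($action, $re) = @$rule;
            return $action if $location =~ $re;    # first match wins
        }
        return $default;
    };
}

my $filter = compile_rules(
    'Default' => 'Deny',
    'Deny'    => '^http://www\.example\.org/docs/private/',
    'Allow'   => '^http://www\.example\.org/docs/',
);

print $filter->('http://www.example.org/docs/intro.html?page=2'), "\n";
print $filter->('http://www.example.org/docs/private/notes.html'), "\n";
print $filter->('http://www.example.org/other/'), "\n";
```

With these rules the three calls print Allow, Deny and Deny. In this sketch a malformed pattern fails inside qr//, so the error message refers to the regular expression rather than to the line of the rule list that contained it, which gives a sense of why mistakes in the arguments can surface as unfamiliar Perl errors.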