Expanding links in a simple tag system.
Many sites, in an effort to not allow raw HTML to be used, use their own tag system. One such site, http://www.perlmonks.org/, allows some HTML syntax, but uses square brackets ("[" and "]") to simplify the creation of links.
Some links have a specific protocol assigned to them ("[id://100]"), while
others do not ("[japhy]"). In addition there can be text attached to a link
("[id://100|this article]", "[japhy|me]"). These are supposed to be rendered
as HTML <A> tags.
[japhy|I]'ve got a [id://500|good article] about the [unlink] function.
<a href="/index.pl?node=japhy">I</a>'ve got a <a href="/index.pl?node_id=500">good article</a> about the <a href="/index.pl?node=unlink">unlink</a> function.
Our goal is to match a tag, which is from an opening bracket to the first closing bracket. We don't need to worry about nesting, because that concept simply doesn't exist here.
The definition of a "protocol" for the tag is word characters followed by a colon and two slashes, much like the "http://" and "ftp://" you are probably familiar with. If there is no protocol for a tag, we use a default link (the "/index.pl?node=" link); there is a protocol, we'll use a hash to determine what link to use (the "id" protocol uses "/index.pl?node_id=").
First, we build a hash of protocol-to-link:
my %HOW = (
id => "/index.pl?node_id=",
http => "http://",
https => "https://",
google => "http://www.google.com/search?q=",
# ...
DEF => "/index.pl?node_id=",
);
For readability's sake, let's construct a few regexes:
my $proto = qr/\w+/; # word chars my $content = qr/[^]|]+/; # non-], non-| my $text = qr/[^]]+/; # non-]
Now, we can do one substitution or two; if we do two, we need to look for tags with protocols first:
$chunk =~ s{
\[
($proto) :// ($content)
(?: \| ($text) )?
\]
}{
defined($3) ?
qq{<a href="$HOW{$1}$2">$3</a>} :
qq{<a href="$HOW{$1}$2">$2</a>};
}egx;
The second regex takes care of plain tags:
$chunk =~ s{
\[
($content)
(?: \| ($text) )?
\]
}{
defined($2) ?
qq{<a href="$HOW{DEFAULT}$1">$2</a>} :
qq{<a href="$HOW{DEFAULT}$1">$1</a>};
}egx;
The single regex approach is not terrible-looking, though.
$chunk =~ s{
\[
(?: ($proto) :// )? ($content)
(?: \| ($text) )?
\]
}{
my $p = defined($1) ? $1 : "DEFAULT";
defined($3) ?
qq{<a href="$HOW{$p}$2">$3</a>} :
qq{<a href="$HOW{$p}$2">$2</a>};
}egx;
The benefit of using variables containing patterns is clearly seen here. We have used aptly named variables in place of confusing patterns.
Other approaches.
jeffa at http://www.perlmonks.org/