Coming soon, a regex-related page.
For now, the dissection of the regex at the end of my
Summer '04 TPJ article.
my $str = q{this=that here='there' where="what now"};  # keys as named below; the values are assumed sample data
my %data = $str =~ m{
    # sets $^R to [ "", "", "" ]
    # $^R->[0] is what we match before we capture
    # $^R->[1] is what we want to capture
    # $^R->[2] is what we match after we capture
    (?{ [ "", "", "" ] })

    # captures to $1 a key of some sort
    # for our sample string, it matches
    # 'this', 'here', and 'where'
    ([^\s=]+)

    # then the =
    \s* = \s*

    # if we can look ahead for a quote...
    (?(?=["'])
        # then execute this code:
        (?{
            # get the next character (a " or a ')
            my $c = substr($_, $+[0], 1);
            # set $^R to [ quote, "[^quote]*", quote ]
            # where 'quote' is either " or '
            [ $c, "[^\Q$c\E]*", $c ]
        })
        # else...
      |
        # set $^R to [ "", '\S+', "" ]
        (?{ [ "", '\S+', "" ] })
    )

    # match the regex in $^R->[0],
    # capture the regex in $^R->[1] to $2,
    # and match the regex in $^R->[2]
    (??{ $^R->[0] })
    ((??{ $^R->[1] }))
    (??{ $^R->[2] })
}xg;
print "$_ = <$data{$_}>\n" for keys %data;
This regex matches 'key=value' pairs, where the value might be
quoted with "..." or '...'. If the value is quoted, we do NOT want
to capture the quotes. It gets around the problem of needing several
capture groups in your regex when you really want one piece of data:
my $quoted = qr{
    ([^\s=]+) \s* = \s*
    (?:
        ' ([^']*) '
      | " ([^"]*) "
      | (\S+)
    )
}x;
The problem with that regex is that it creates $1 through $4, and the
value lands in $2, $3, or $4 depending on which alternative matched, so
you need to use $+ to get the one that matched. And you can only do
that if you match one pair at a time:
while ($str =~ /$quoted/g) {
    $data{$1} = $+;
}
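To see why the loop is needed, here is a small self-contained sketch of the multi-capture version. The sample string is made up for illustration (only its keys come from the text above), and @flat is a name introduced just for this demo. In list context, /g hands back all four groups for every match, undefs included, so the result can't be assigned straight to a hash.

```perl
use strict;
use warnings;

# Sample input: the values are assumed for illustration.
my $str = q{this=that here='there' where="over yonder"};

my $quoted = qr{
    ([^\s=]+) \s* = \s*
    (?:
        ' ([^']*) '
      | " ([^"]*) "
      | (\S+)
    )
}x;

# List context with /g returns every group for every match,
# including undef for the alternatives that did not participate.
my @flat = $str =~ /$quoted/g;
print scalar(@flat), " elements for 3 pairs\n";

# Matching one pair at a time lets $+ pick the value group that matched.
my %data;
while ($str =~ /$quoted/g) {
    $data{$1} = $+;
}
print "$_ = <$data{$_}>\n" for sort keys %data;
```

Three pairs times four groups gives twelve list elements, which is exactly the clutter the single-capture regex is meant to avoid.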
My absurd regex gets around this by matching the quotes, when there are
any, outside the capture group, so the value always ends up in $2.
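The two constructs the big regex leans on can be seen in isolation in a much smaller sketch: (?{ }) stores the value of its code in $^R, and (??{ }) matches whatever pattern its code returns. The sample string and the $letters variable are made up for this demo and are not from the article.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for this sketch.
my $str = 'abc123';

my ($letters) = $str =~ m{
    (?{ '[a-z]+' })     # the value of this code block is stored in $^R
    ( (??{ $^R }) )     # the string in $^R is matched as a pattern, captured to $1
}x;

print "captured <$letters>\n";
```

The full regex does the same thing with a three-element array reference in $^R, so it can pick a different value pattern depending on whether a quote is coming up.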