NO, THAT'S WRONG! General Data Syntax Scalars vs. Slices Stringification Octal vs. Decimal Confusing Perl, C, and sh Command-line arguments Program Name Control Structures Storing Command Output Idiomatic Perl Operating on Arrays Handling Nested Quotes Comparison Operators
Common Perl Pitfalls
Let's say you want to access a value in an array or a hash. After skimming
(a poor choice) Chapter 2 of Programming Perl (O'Reilly, 1997; a good
choice), you see that arrays start with a @ and hashes
start with a %. You have a snippet of code that looks something
like this:
@names = ("Jeff", "Jon", "Andrea", "Chuck");
$dad = @names[3];
Stop right there! We have a problem that is caught when using Perl's
-w switch: "Scalar value @names[3] better
written as $names[3]..." Although the problem may not be as
apparent in this case, the following code should clear it up considerably.
If you are storing output from a command in a scalar variable, like so:
$command_output[0] = `who`;
the output is stored as a string with newlines. If, however, you store it like this:
@command_output[0] = `who`;
you only get the first line of the output. The reason is this:
@array[...] denotes an array slice, whereas
$array[$index] denotes a scalar value. The incorrect code can
be shown more clearly like so:
@command_output[0] = `who`;
($command_output[0]) = `who`;
The two examples are one and the same; @array[$index] is a list
of one scalar, which is rarely what is intended. Likewise, to refer to many
hash elements at once, use @hash{$key1,$key2}, and not
%hash{$key1,$key2}. To refer to a single hash element, use a
$ to indicate a scalar: $hash{$key1}. Again, this
is all explained, in much more detail, in Chapter 2 of Programming
Perl, and it should be on your system in perldoc perldata.
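Here is a minimal sketch of the scalar-versus-slice distinction, using only core Perl. The %age hash and the two fake "lines of output" are made up for illustration; they stand in for whatever data you actually have.

```perl
use strict;
use warnings;

my @names = ("Jeff", "Jon", "Andrea", "Chuck");

my $dad  = $names[3];        # one element, a scalar: "Chuck"
my @kids = @names[1, 2];     # an array slice, a list: ("Jon", "Andrea")

# Hypothetical data, made up for this sketch.
my %age = (Jeff => 17, Jon => 15, Andrea => 12);

my $jon_age   = $age{Jon};                # one value, a scalar: 15
my @some_ages = @age{"Jeff", "Andrea"};   # a hash slice, a list: (17, 12)

# A list assignment keeps only as many items as there are slots on the left,
# which is why a one-element slice captures just the first line of output.
my @output;
($output[0]) = ("first line\n", "second line\n");

print "dad: $dad\n";
print "kids: @kids\n";
print "ages: @some_ages\n";
print "kept: $output[0]";
```

The sigil follows what you get back, not what the container is: $ for one value, @ for a list of values.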
A common mistake by inexperienced programmers occurs because they feel compelled to stringify variables in all cases.
print "$var";
$foo = "$bar";
function("$value");
All of these force the variable to be a string, even if it is a number or
a reference. This causes big problems if the variable is a reference,
because a string that holds the memory address of data cannot be used to
retrieve the data; that is, you cannot de-reference a variable
$foo containing the text SCALAR(0xb5a3c) by using
$$foo. Therefore, if function() expects a reference
passed to it, and $value is a reference, that last line of code
above will break the subroutine.
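The following sketch shows the breakage in isolation. It assumes "use strict" is in effect, which turns the bad dereference into a fatal error that eval can trap; the variable names are mine, chosen only for the demonstration.

```perl
use strict;
use warnings;

my $value = 42;
my $ref   = \$value;      # a real reference to $value
my $text  = "$ref";       # stringified: now just text such as "SCALAR(0x...)"

my $kind_of_ref  = ref($ref);     # "SCALAR"
my $kind_of_text = ref($text);    # "" (empty): no longer a reference

my $through_ref = $$ref;          # dereferencing the real reference works

# Dereferencing the string dies under "use strict"; eval traps the error.
my $through_text = eval { $$text };
my $error        = $@;

print "ref gives: $through_ref\n";
print "string gives an error: $error";
```

The string still looks like a reference when printed, which is exactly what makes this bug hard to spot.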
The magical auto-increment operator has a different effect on strings and numbers. Consider the following example; it shows what happens when a string with letters and numbers gets magically incremented, and what happens when a number (even an octal or hexadecimal number) gets magically incremented.
$val = "0x123456"; # oops, meant it to be a hex number
print ++$val; # the string counts as 0; everything from the 'x' on is ignored
1
$val = 0x123456; # now THAT'S hexadecimal
print ++$val; # prints the decimal representation
1193047
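For contrast, here is a small sketch of when the increment really is "magical": only strings made of letters followed by digits get the string treatment. The variable names are mine, and the numeric warning is switched off just to keep the demonstration quiet.

```perl
use strict;
use warnings;

my $label = "aa9";
$label++;                 # letters then digits: becomes "ab0"

my $wrap = "zz";
$wrap++;                  # carries into a new position: becomes "aaa"

my $hex_string = "0x123456";
{
    no warnings 'numeric';    # silence the "isn't numeric" warning for the demo
    $hex_string++;            # not letters-then-digits, so it is treated as 0 + 1
}

my $hex_number = 0x123456;
$hex_number++;            # a real number: 1193046 + 1

print "$label $wrap $hex_string $hex_number\n";   # ab0 aaa 1 1193047
```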
Many of Perl's functions make system calls, and expect a file
permission status – chmod, mkdir,
umask – and these all expect the file permissions in octal,
which means they need a leading "0". The problem is that many
people do not enter the leading "0" when running chmod on their
own system, so they do not expect to need one in Perl; even worse, some
people are not aware of what the bits in "755" – or more properly,
"0755" – signify. I suggest you consult your system's man page
on chmod(1) if you are one of these unlucky people.
Again, this is one of those situations where the -w switch will save you much frustration. The following code will produce a warning with -w on:
chmod(664, $file);
chmod: mode argument is missing initial 0
Look at the permissions of the file here before and after that code is run on it:
-rw-rw-r-- 1 190 Jan 30 15:44 1.txt
--w--wx--- 1 190 Jan 30 15:45 1.txt*
Obviously, we meant 0664, not 664. You can turn
664 into 0664 very easily. Simply use the
oct() function. These two lines do the same thing:
chmod(0664, $file);
chmod(oct(664), $file);
On a side note, the function oct() assumes its argument is octal
(or hexadecimal, if it starts with a 0x)
and returns the corresponding value. The function hex() assumes
its argument is hexadecimal, and returns the corresponding value. Caveat:
oct(0664) is not equal to "0664". For more
information, refer to perlfunc.
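A quick sketch of those conversions, including the caveat, may help. It uses only the core oct() and hex() functions; the variable names are mine.

```perl
use strict;
use warnings;

my $literal     = 0664;           # octal literal: 436 in decimal
my $from_string = oct("664");     # string of octal digits: also 436
my $from_hex    = oct("0x1F");    # oct() accepts a 0x prefix: 31
my $hex_value   = hex("1F");      # hex() always reads hexadecimal: 31

# The caveat: 0664 is already the number 436, so oct() reads the digits
# "436" as octal and returns 286.
my $surprise = oct(0664);

printf "%d %d %d %d %d\n",
    $literal, $from_string, $from_hex, $hex_value, $surprise;
printf "%04o\n", $literal;        # back to octal notation: 0664
```

In other words, hand oct() the digits as you would type them at the shell, not a number that Perl has already converted.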
There is also a module, available on the Comprehensive Perl Archive Network
(CPAN), called File::chmod. It allows for symbolic file permissions
as well as ls style permissions. Instead of extracting the
permissions of a file and modifying them bitwise, or making a system call,
you can append to its permissions or remove from them. The regular
chmod function in Perl requires an absolute file permission;
you cannot simply tell it to add the executable bit to a file. The
File::chmod module overrides the regular chmod with
its own, which can handle octal, symbolic, or ls style permissions.
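If you would rather stay with core Perl, the "extract and modify bitwise" route looks roughly like this. This is a sketch of the built-in approach, not of File::chmod itself. It assumes a Unix-like filesystem and uses a temporary file from the core File::Temp module as a stand-in for your real file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway file to experiment on; it is removed when the program exits.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

chmod 0644, $file or die "can't chmod $file: $!";

# stat() returns the mode in element 2; the mask drops the file-type bits.
my $before = (stat($file))[2] & 07777;

# Add the owner-execute bit to whatever permissions are already there.
chmod $before | 0100, $file or die "can't chmod $file: $!";

my $after = (stat($file))[2] & 07777;

printf "%04o became %04o\n", $before, $after;   # 0644 became 0744
```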
Programmers who come from C and sh often fail to partition their brains correctly: some of their C or sh knowledge slips into the Perl part of their gray matter, and they confuse the languages. That, or they simply assume Perl acts in a similar fashion. Perl has borrowed from many languages, yet it differs from each of them.
In my experience, people confuse C and Perl syntax for arguments more
often than they confuse sh and Perl, but I shall
address them both. Perl stores its command-line arguments – minus those
parsed as command-line options using either the -s switch with
Perl or a module such as Getopt::Std – in the array @ARGV.
This array is accessible by all packages; that is, in any package,
@ARGV is the same as @main::ARGV. The first
argument is index 0, the last is index $#ARGV:
@ARGV[0..$#ARGV] is an array slice containing all the elements
in @ARGV. In C, the arguments are stored in argv[],
but the first element in the list is the name of the program. C uses
argc to hold the number of elements in argv[] (the program name included), so
argv[1] through argv[argc-1] hold the arguments to the program.
In sh, arguments are stored in $1, $2,
$3, ... which can cause problems when you get up to more than 9
arguments, but I'm not here to bash shell :). Perl uses those
variables for storing matching strings in a regular expression.
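A short sketch of walking @ARGV the Perl way follows. The three stand-in arguments are assigned by hand so the example runs on its own; in a real script you would leave @ARGV alone.

```perl
use strict;
use warnings;

# Stand-in arguments so the sketch runs without a command line.
@ARGV = qw(alpha beta gamma);

my $count = @ARGV;            # an array in scalar context gives its size: 3
my $first = $ARGV[0];         # first argument, NOT the program name
my $last  = $ARGV[$#ARGV];    # $#ARGV is the last index, here 2

for my $i (0 .. $#ARGV) {
    print "argument $i: $ARGV[$i]\n";
}
print "$count arguments, from $first to $last\n";
```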
In perlvar (which should be installed on your system along with Perl,
unless your negligent system administrator has been remiss in his duties),
the variable $0 is listed as holding the name of the currently
running Perl program; for those of you accustomed to using the
English module, it's called $PROGRAM_NAME. The mnemonic
for the variable is, oddly enough, "same as in sh or ksh." C, however, as
mentioned above, uses the first element in the argv array,
argv[0], to store the program name. Many times, in Perl programs,
I've seen people using $ARGV[0] or $ARGV when they
should have been using $0. $ARGV is a totally
different variable dealing with the name of the current file when reading
from <>; see perldoc perlvar for more information.
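As a sketch of the usual use for $0, here is a usage message built from the program name. It relies on the core File::Basename module to trim the directory; the wording of the message is just an example.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $program = basename($0);                 # script name without its directory
my $usage   = "usage: $program file ...\n";

# $ARGV[0] is the first argument, not the program name, so there is
# nothing in it when the script is run without arguments.
my $first_arg = @ARGV ? $ARGV[0] : "(none)";

print $usage;
print "first argument: $first_arg\n";
```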
You can always tell if a person is stuck on C, because they'll ask a Perl
programmer how to do a switch statement. There is information in perldoc
perlsyn, and Tom Christiansen has a response to the question at
http://mox.perl.com/misc/fmswitch. There are multiple ways of
creating a switch-like control structure: a for-loop, if-elsif-else
statements, and more besides. I often end up using a for-loop.
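For the curious, the for-loop flavor of a switch looks roughly like this. It is a sketch of one common idiom, not the only way to do it, and classify() is a hypothetical helper invented for the example.

```perl
use strict;
use warnings;

# classify() is a hypothetical helper, named here only for illustration.
sub classify {
    my ($input) = @_;
    my $kind;
    SWITCH: for ($input) {            # aliases $_ to $input for one pass
        if (/^\d+$/)     { $kind = "number"; last SWITCH; }
        if (/^[a-z]+$/i) { $kind = "word";   last SWITCH; }
        $kind = "other";              # the default case
    }
    return $kind;
}

print classify($_), "\n" for ("42", "perl", "x-y");   # number, word, other
```

The loop runs exactly once; its only jobs are to set $_ and to give "last" something to leave.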
Speaking of if-elsif-else statements, the syntax differs among the three languages. (In C, one can also leave the braces off a one-line if statement, which the author finds ghastly.) In Perl, the keyword is "elsif"; in C, it is "else if" (two separate words); and in sh, it is "elif" (which is "file" spelled backwards).
Perl has a couple of ways of calling system commands, and these are often sources of confusion for inexperienced programmers. There are several different ways to capture command output, and each of them acts differently or returns data differently.
The system() function takes a list or a string and executes it,
printing to STDOUT whatever is sent to the specified command's
STDOUT. It does not return what it prints; it only prints it.
It returns the return value of the system call, zero for success, non-zero
for failure. This example code shows you how not to get the date from your
system.
$date = system("/usr/bin/date")
or die "can't run /usr/bin/date: $!";
What that just did was assign 0 (hopefully) to $date – either 0
or whatever the return value of /usr/bin/date was – and then die
because system returned 0. The more correct (or less wrong) way of getting
the date from the system (if you really want to make a system call) is:
chomp($date = `/usr/bin/date`) or die "can't run /usr/bin/date: $!";
The backticks cause the program to return the standard output (with
newlines included) to a variable. The qx() operator is
identical to backticks. Using backticks in scalar form is slightly different
from using it in list form. In scalar form, multiline output is stored as a
single string of text, with newlines at the end of each line. In list form,
it returns a list of lines, sensitive to the $INPUT_LINE_SEPARATOR,
or $/, variable. List form is similar to:
open DATE, "/usr/bin/date |" or die "can't run /usr/bin/date: $!";
@data = <DATE>;
close DATE;
Of course, you might just want to use localtime() in scalar
context, except that it doesn't report the time zone you're in.
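To put the variations side by side, here is a sketch that assumes a Unix-like system with "echo" and "true" on the PATH. The two-line echo command is a stand-in for any command whose output you want.

```perl
use strict;
use warnings;

# Assumes a Unix-like system with "echo" and "true" on the PATH.
my $command = 'echo one; echo two';

my $as_string = `$command`;       # scalar context: "one\ntwo\n"
my @as_lines  = `$command`;       # list context: ("one\n", "two\n")
chomp @as_lines;

# system() returns the exit status, so compare against 0 instead of using "or".
system("true") == 0
    or die "can't run true: $?";

my $now = localtime;              # scalar context: a readable date string

print "string has ", length($as_string), " characters\n";
print "list has ", scalar(@as_lines), " lines: @as_lines\n";
print "local time: $now\n";
```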
Finally, there's exec(). I see this used much too often, causing
problems that inexperienced programmers don't expect. Server programs most
frequently use this; it replaces the current program with what is passed to
it. It will end your program, so the following code is rather silly. The only
way the print statement would be called is if the exec failed,
which is a bad thing.
$date = exec "/usr/bin/date";
print "Today's date is: $date";
Another way to tell if someone's been programming in C and hasn't read up much on Perl is to look at how they do things to array elements. In C, you'll often see code that looks something like:
for (int i = 0; i < sizeof(array); i++){
char c = array[i];
// et cetera
}
And then they bring that over to Perl, and you get something like:
$size = @array;
for ($i = 0; $i < $size; $i++){
$element = $array[$i]; # or even worse, @array[$i]
# et cetera
}
Perl has a very nice way of iterating over lists. You can use
for or foreach, which happen to be the same
thing. It allows you to shrink your code (and the number of variables you
use) amazingly:
for (@array){ ... } # or
foreach (@array){ ... } # or
for $element (@array){ ... } # or
foreach $element (@array){ ... }
You see? It's that easy.
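Filled in, the idiom looks something like the sketch below. The greeting text and the @lengths numbers are made up; the second loop shows that the loop variable is an alias for each element, so changing it changes the array.

```perl
use strict;
use warnings;

my @names = ("Jeff", "Jon", "Andrea", "Chuck");

my @greetings;
for my $name (@names) {
    push @greetings, "Hello, $name";
}

# The loop variable is an alias for each element, so assigning to it
# changes the array itself.
my @lengths = (3, 5, 8);      # made-up numbers for the sketch
for my $n (@lengths) {
    $n *= 2;
}

print "$_\n" for @greetings;
print "@lengths\n";           # 6 10 16
```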
"He screamed, 'It's him! He told me, "I'll kill Sara.""'
What a troublesome thing to have to store in a variable, eh? Here's a shoddy attempt at storing that phrase in a variable:
$line = "\"He screamed, 'It's him! He told me, \"I'll kill Sara.\"'\"";
Now, if that isn't hideous, I'm not sure what is. Let's use Perl's
qq() operator to make things nice.
$line = qq("He screamed, 'It's him! He told me, "I'll kill $name."'");
The qq() operator works like double quotes, only you can use any
non-alphanumeric delimiter you want. Like double quotes, it interpolates
variables and escape sequences. The q() operator acts like
single quotes. The qx() operator, described previously, is the
same as using backticks. The qw() allows for speedy creation
of lists. The following two lines of code are equivalent:
@list = qw(jonathan jeffrey jennifer andrea);
@list = split ' ', q(jonathan jeffrey jennifer andrea);
Notice that it splits on ' ', which is a magical string in
split() that splits on as much whitespace as possible, and
removes leading and trailing whitespace. Also, qw() does not
interpolate variables and escape sequences.
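Here is a small sketch of the quoting operators next to each other. The curly-brace delimiters are just one choice among many, and the sentence is shortened from the example above.

```perl
use strict;
use warnings;

my $name = "Sara";

my $interpolated = qq{He said, "I'll kill $name."};   # like double quotes
my $literal      = q{He said, "I'll kill $name."};    # like single quotes

my @list = qw(jonathan jeffrey jennifer andrea);
my @same = split ' ', q(  jonathan jeffrey   jennifer andrea );

print "$interpolated\n";      # $name is replaced by Sara
print "$literal\n";           # $name is printed as-is
print scalar(@same), " names either way\n";
```

Picking a delimiter that does not appear in the text means you never need a backslash.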
Please note a "quirk" about the qw() operator, shown to me not
too long ago on #perl. It does not imply parentheses around
itself, causing an unexpected error message in the following:
$word = qw( this that the other thing)[$i];
Can't use subscript on split at - line 1, near "2]"
$word = (qw( this that the other thing))[$i]; # properly done
It's a shame when novice programmers ruin flat files or databases by doing the following erroneous "comparison":
open FILE, "file" or die "can't open file: $!";
open OUT, ">file.out" or die "can't create file.out: $!";
while (<FILE>){
print OUT unless $_ == $dont_print_this_line;
# or even unless $_ = $dont_print_this_line;
}
close FILE;
close OUT;
rename "file.out" => "file" or die "can't mv file.out to file: $!";
Oh dear. Both of those tests will most likely screw up that file of yours.
The problem is this: the == operator is for numeric values only,
whereas the eq operator is for variables to be treated as strings.
Many errors in conditional statements arise from programmers using = when
they mean ==. The difference: if ($a = getword()){ ... } means
"if the return value of getword(), stored in $a, is true (non-zero), then...";
if ($a == getword()){ ... } means "if $a is the same value as
getword(), then...".
But chances are, that's not what you really meant. The function getword()
probably returns a word, not a numeric value. A string in numeric context
usually evaluates to 0. Thus, that == comparison will probably only be true if
$a is 0. If, instead, $a is a word, and you want to test if $a is the same
as the return value of getword(), use the eq operator: if ($a eq
getword()){ ... }. The equivalent string operators for numeric operators
are:
Numeric   String
==        eq
!=        ne
>         gt
<         lt
>=        ge
<=        le
<=>       cmp
Comparisons are done ASCIIbetically, meaning "This" comes before "this", and "hello" comes after "goodbye".
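The sketch below shows both the numeric trap and the ASCIIbetical ordering. The words are arbitrary, and the numeric warning is turned off only so the deliberately wrong comparison runs quietly.

```perl
use strict;
use warnings;

my ($x, $y) = ("apple", "orange");

my $numeric_equal;
{
    no warnings 'numeric';                # this comparison is wrong on purpose
    $numeric_equal = ($x == $y) ? 1 : 0;  # both strings count as 0, so "equal"
}
my $string_equal = ($x eq $y) ? 1 : 0;    # compared as text: not equal

my $as_numbers = 10 <=> 9;                # 1: ten is greater than nine
my $as_strings = "10" cmp "9";            # -1: "1" sorts before "9"

# The default sort compares as strings, so capitals come first.
my @sorted = sort qw(this hello This goodbye);

print "== says $numeric_equal, eq says $string_equal\n";
print "<=> gives $as_numbers, cmp gives $as_strings\n";
print "@sorted\n";                        # This goodbye hello this
```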