Coding with Style
Jeff Pinyan
"List" Is a Four-Letter Word
Email comments to japhy@pobox.com
Much confusion can arise from the concepts of lists and arrays in Perl. Add
the concepts of scalar context and list context, and one can become
thoroughly befuddled. Now you can learn how to properly wield these words,
and use lists effectively.
A Word on Context
In Perl, the word "context" refers to the setting in which an expression is
evaluated. There are three contexts (void, scalar, and list), of which only
the last two are relevant to this article.
Rick Delaney suggested comparing Perl's notion of "context" to context in a
natural language, such as English. This analogy will come in handy in the text
that follows.
"Rules" -- alone, without context, that word is meaningless. However, "To
understand Perl, you must understand its rules," and "Perl rules!" show the
difference in context; in one, "rules" is a noun, and in the other, it is a
verb.
Scalar
Scalar context is invoked when Perl expects a single value. The following
situations invoke scalar context on the right-hand side of each assignment,
the condition of each if or while statement, and both operands of the +
operator.
$scalar = $otherscalar;
$scalar = @array;
$scalar = (EXPR1, EXPR2, EXPR3);
$scalar = subroutine();
$scalar = <FH>;
$scalar = scalar(@array);
my $scalar = $otherscalar;
my $scalar = @array;
my $scalar = (EXPR1, EXPR2, EXPR3);
my $scalar = subroutine();
my $scalar = <FH>;
my $scalar = scalar(@array);
if ($scalar) { ... } # or elsif or unless
if (@array) { ... }
if (EXPR1, EXPR2, EXPR3) { ... }
if (subroutine()) { ... }
while ($scalar) { ... } # or until
while (@array) { ... }
while (EXPR1, EXPR2, EXPR3) { ... }
while (subroutine()) { ... }
$scalar = $string =~ /regex/;
my $scalar = $string =~ /regex/;
$array[0] = @otherarray;
$array[0] = <FH>;
$hash{key} = $scalar;
$scalar + @array;
Note: EXPRn refers to any Perl expression.
Whenever a value is tested for falsehood or truth, it is evaluated in scalar
context.
Arrays evaluated in scalar context return their length, the number of elements
they hold. Comma-separated series of expressions, such as
('a', 'b', 'c')
# or
$x++, abs $foo, @a;
behave specially in scalar context; they are discussed below, in the section
on the comma operator. In older versions of Perl, hashes in scalar context
return a string that represents the number of "buckets" being used out of the
number allocated; since Perl 5.26, they return the number of keys instead.
Functions called in scalar context apply scalar context to their return values.
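As a minimal sketch of the array rules above (assuming Perl 5 with strict and warnings enabled; the variable names are made up for the example), the following shows an array yielding its length when assigned to a scalar, when passed to scalar(), when tested for truth, and when used as an operand of +:

```perl
use strict;
use warnings;

my @array = ('x', 'y', 'z');

# Assigning an array to a scalar evaluates it in scalar context: its length.
my $length = @array;                  # 3

# scalar() forces scalar context explicitly.
my $also_length = scalar(@array);     # 3

# A truth test also uses scalar context; an empty array has length 0, so it is false.
my @empty;
my $is_empty = @empty ? 'no' : 'yes'; # 'yes'

# Both operands of + are evaluated in scalar context.
my $sum = 10 + @array;                # 13
```

Hash behavior is left out of the sketch because, as noted above, it depends on the Perl version.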
List
List context is invoked when Perl expects any number of values, whether 0, 1,
or more. The following situations invoke list context on the right-hand side
of each assignment, the list in each for loop, and the arguments to print and
push.
($scalar) = $otherscalar;
($scalar) = @array;
($scalar) = (EXPR1, EXPR2, EXPR3);
($scalar) = subroutine();
($scalar) = <FH>;
my ($scalar) = $otherscalar;
my ($scalar) = @array;
my ($scalar) = (EXPR1, EXPR2, EXPR3);
my ($scalar) = subroutine();
my ($scalar) = <FH>;
for ($scalar) { ... }
for (@array) { ... }
for (EXPR1, EXPR2, EXPR3) { ... }
for (subroutine()) { ... }
print $scalar;
print @array;
print EXPR1, EXPR2, EXPR3;
print subroutine();
($scalar) = $string =~ /regex/;
my ($scalar) = $string =~ /regex/;
@array = @otherarray;
@array = $scalar;
@array[0] = @otherarray;
@array = <FH>;
@array[0] = <FH>;
@hash{key1,key2} = ($scalar1,$scalar2);
%hash = qw( key value key2 value2 );
push @array, function();
If you are confused by the difference between $array[0] and
@array[0], don't worry. You will soon learn what a slice is, and that
@array[0] is equivalent to ($array[0]). As you can see, parentheses
around a scalar or group of scalars on the left-hand side of the assignment
operator invoke list context.
Arrays in list context return their elements in order. Comma-separated series
of expressions in list context return their values in order as well. Hashes
in list context return their key-value pairs in no guaranteed order.
Functions called in list context apply list context to their return values.
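A small sketch of these list-context rules, assuming Perl 5 with strict and warnings (the names are illustrative). Because hash order is not guaranteed, the pairs are sorted before being inspected:

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c');

# Parentheses on the left make this a list assignment, so @array is
# evaluated in list context and only its first element is kept.
my ($first) = @array;               # 'a'

# Several variables take values in order; extra values on the right are dropped.
my ($one, $two) = @array;           # 'a', 'b'

# A hash in list context flattens into key/value pairs in no guaranteed order.
my %hash   = (key => 'value', key2 => 'value2');
my @pairs  = %hash;                 # four elements
my @sorted = sort @pairs;           # ('key', 'key2', 'value', 'value2')
```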
The Comma Operator
Note: there is no such thing as a list in scalar context. When there
appears to be one, the comma operator is at work.
When Perl sees a comma-separated series of expressions (usually inside
parentheses) in scalar context, it employs the comma operator. This interesting
operator evaluates its left-hand operand, discards the result, and then
evaluates and returns its right-hand operand. Thus, only the value of the final
expression is returned for use.
Notice how parentheses play a very important role when dealing with lists and
comma-separated expressions:
$scalar = ('a', 'b', 'c'); # scalar; $scalar = 'c'
($scalar) = ('a', 'b', 'c'); # list; $scalar = 'a'
$scalar = 'a', 'b', 'c'; # scalar; $scalar = 'a'
@array = ('a', 'b', 'c'); # list; @array = ('a', 'b', 'c')
@array = 'a', 'b', 'c'; # list, but = binds first; @array = ('a')
In the first case, we can tell that scalar context is being applied. Because
of how the comma operator works, the value 'c' is returned, and $scalar is
set to it. See the discussion of case five, below, for a warning message you
would get if you use the -w switch to perl.
In the second case, there is list context, and each value on the right is
assigned to the variable in the same position on the left-hand side. For
example,
($a,$b,$c) = ('alpha','beta','gamma');
sets $a to 'alpha', $b to 'beta', and $c to 'gamma'. In case two,
though, there is only one variable on the left, so only the first value of the
list is saved and the others are thrown out. Keep in mind that values which
are "thrown out" are still evaluated. The same is true of the operands the
comma operator discards in scalar context: in the following example, each of
the three variables gets incremented, although only the last result is saved
to $incremented_z:
$incremented_z = (++$x, ++$y, ++$z);
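A minimal sketch to confirm that discarded operands are still evaluated, in both scalar and list context. The starting values (0, 10, 20) and the name $incremented_x are chosen only for the example:

```perl
use strict;
use warnings;

my ($x, $y, $z) = (0, 10, 20);

# Scalar context: the comma operator evaluates every operand left to right,
# discards all results but the last, and returns the last one.
my $incremented_z = (++$x, ++$y, ++$z);   # 21 (now $x is 1, $y is 11, $z is 21)

# List context with one variable on the left: only the first value is kept,
# but all three increments still happen.
my ($incremented_x) = (++$x, ++$y, ++$z); # 2 (now $x is 2, $y is 12, $z is 22)
```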
In the third case, the assignment operator (=) binds more tightly
than the comma operator, so the statement is really a comma-separated series
of three expressions:
($scalar = 'a'), 'b', 'c';
Only the first one assigns anything, so $scalar gets 'a'. See the
discussion of case five, below, for a warning message you'll get if you use
the -w switch to perl.
In the fourth case, the array on the left-hand side calls for list context,
so the array is cleared and given the values of the list as its elements. It
is important to realize that:
@array = ('a', 'b', 'c');
and
($array[0], $array[1], $array[2]) = ('a', 'b', 'c');
do not do the same thing. The first clears the array, and gives it
three elements with the given values; the second changes the values of the
first three elements of the array, and leaves the rest alone.
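The difference is easy to see with arrays that start out with five elements. This is a sketch; the array names and initial contents are invented for the example:

```perl
use strict;
use warnings;

my @whole = (1, 2, 3, 4, 5);
my @parts = (1, 2, 3, 4, 5);

# Assigning to the whole array replaces its contents: three elements remain.
@whole = ('a', 'b', 'c');

# Assigning to a list of elements overwrites only those elements;
# $parts[3] and $parts[4] keep their old values.
($parts[0], $parts[1], $parts[2]) = ('a', 'b', 'c');
```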
In the fifth case, we would get a warning from Perl if we were using the
-w switch to perl. This would also happen in the first and third
examples.
@array = "MIDN", "4/C", "PINYAN";
would make Perl alert you with the messages:
Useless use of a constant in void context at program.pl line 3.
Useless use of a constant in void context at program.pl line 3.
This is because the assignment operator doesn't care that you're assigning to
an array, and it has higher precedence than the comma operator. Here,
@array would be cleared and given the single element "MIDN", and the other
two values, left in void context, would trigger the warning messages shown
above.
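A sketch of the pitfall next to its fix. The 'void' warnings category is switched off inside the block only so that the intentionally wrong statement runs quietly; it suppresses exactly the "Useless use of a constant" warnings described above:

```perl
use strict;
use warnings;

my (@wrong, @right);

{
    no warnings 'void';   # silence the void-context warnings for this demo only

    # '=' binds more tightly than ',', so this parses as
    # (@wrong = "MIDN"), "4/C", "PINYAN";
    @wrong = "MIDN", "4/C", "PINYAN";
}

# With parentheses, the whole list is assigned to the array.
@right = ("MIDN", "4/C", "PINYAN");
```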
Lists vs. Arrays
Now that you've been thoroughly bombarded with lists and arrays and contexts,
it's time we nailed down the difference between an array and a list. It's all
in the name -- literally. An array is a list with a name. Because arrays and
lists differ in this way, Perl treats them differently (a good thing). The major
differences between the two are:
- functions like push(), pop(), and splice()
can only be used on arrays, because they change the length of the array
- arrays return the number of elements in scalar context
- lists do not exist in scalar context
- the number of values in a list can only be determined by iterating over
its values, by converting it to an array, or by assigning it to an empty
list in scalar context ($count = () = LIST;)
- technical: lists only exist on Perl's internal stack, whereas
arrays -- both named and anonymous -- are stored in memory allocated in the
heap
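A sketch of counting the values a list-returning function produces, assuming a hypothetical function three_values(). The second technique relies on the documented rule that a list assignment in scalar context yields the number of elements on its right-hand side:

```perl
use strict;
use warnings;

# Hypothetical function that returns a three-element list.
sub three_values { return ('red', 'green', 'blue') }

# Converting the list to an array, then using the array in scalar context.
my @copy  = three_values();
my $count = @copy;                    # 3

# A list assignment in scalar context yields the number of right-hand values,
# so assigning to an empty list counts them without keeping them.
my $count_too = () = three_values();  # 3

# Calling the function in scalar context instead applies the comma operator
# to its return expression, giving the last value rather than a count.
my $last = three_values();            # 'blue'
```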
As for their similarities:
- functions like reverse() and sort() work on both
- slices can be done on both
Slices
Earlier, it was explained that @array[0] was a slice, and was not
exactly the same as $array[0]. An array slice is indicated by a
leading @ on the array name, followed by one or more expressions in
brackets:
@array[0,1,2] = ($a,$b,$c);
An array slice is a shorthand format for referring to a list made up of
individual elements of the array:
($array[0], $array[1], $array[2]) = ($a,$b,$c);
Thus, the following lines do two very different things:
$line[0] = <FH>;
@line[0] = <FH>;
The <FH> operator returns a single line in scalar context, and
a list of all the lines from its current position to the end of the file in
list context. The second line can be rewritten as:
($line[0]) = <FH>;
This calls the <FH> operator in list context, which means all of the
remaining lines of FH are read; the first value returned is stored in
$line[0], and all the others are discarded.
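To see both behaviors without a real file, the sketch below uses an in-memory filehandle opened on a string (an assumption made so the example is self-contained; a lexical $fh stands in for FH). It writes the slice form as ($slice_line[0]) = <$fh>, which the text above shows is equivalent to @slice_line[0] = <$fh> but avoids a "better written as" warning:

```perl
use strict;
use warnings;

my $data = "first\nsecond\nthird\n";

# Scalar context: <$fh> returns one line.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @line;
$line[0] = <$fh>;                  # "first\n"
close $fh;

# List context: <$fh> returns every remaining line; only the first is kept.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @slice_line;
($slice_line[0]) = <$fh>;          # "first\n", the other lines are discarded
my $leftover = <$fh>;              # undef: the file was already read to the end
close $fh;
```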
Slices work on lists as well as arrays. Instead of writing something as
hideous as:
sub year {
my ($s,$m,$h,$D,$M,$Y,$wday,$yday,$isdst) = localtime;
return $Y + 1900;
}
one could simply use:
sub year {
my $Y = (localtime)[5];
return $Y + 1900;
}
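Because localtime depends on the current date and time zone, this sketch slices gmtime(0) instead, which always describes 1970-01-01 00:00:00 UTC; the substitution is only to make the result predictable:

```perl
use strict;
use warnings;

my @fields = gmtime(0);                  # list context: nine values
my $year   = (gmtime(0))[5];             # list slice: years since 1900, i.e. 70

my ($mday, $mon) = (gmtime(0))[3, 4];    # day of month 1, month 0 (January)

print $year + 1900, "\n";                # prints 1970
```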
The localtime() function returns a string when called in scalar
context, but a list of values in list context. Only the element with index 5
is of any interest to us. To take a slice from the values returned by a
function, the function call must be placed in parentheses, and then the
subscript follows outside the parentheses, as shown above. A list slice of one
element looks no different from a list slice of several elements, because
there is no leading symbol for a list:
$scalar = (localtime)[5]; # gets year (minus 1900)
$scalar = (localtime)[0,3,5]; # still gets year
($day,$year) = (localtime)[3,5]; # $day gets day of month, $year gets year
The second line demonstrates what happens when you have a slice in scalar
context: you get the last element of the slice. It can be explained with the
following expansion:
$foo = ('a', 'b', 'c')[0,2];
$foo = ('a', 'c');
$foo = 'c';
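A quick sketch to check the expansion above. It relies on the documented rule that a list slice evaluated in scalar context yields the last element of the slice:

```perl
use strict;
use warnings;

# Scalar context: the last element of the slice.
my $foo = ('a', 'b', 'c')[0, 2];     # 'c'

# List context: every selected element, in order.
my @both = ('a', 'b', 'c')[0, 2];    # ('a', 'c')
```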
With arrays, the leading symbol changes depending on whether you are taking a
slice or fetching an individual element:
$array[0] = "foo"; # first element is set to "foo"
$array[0,4] = "foo"; # comma operator yields index 4: fifth element is set to "foo"
@array[0,4] = ("a","b"); # first is "a", fifth is "b"
@array[0,4] = "a", "b"; # first is "a", fifth is undef!
Be careful to remember your parentheses, as shown by the last line of the
example!
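A sketch of the last two lines of that example, run side by side on two five-element arrays (names and contents invented for the demo). As with the earlier precedence example, the 'void' warnings are silenced only around the deliberately unparenthesized statement:

```perl
use strict;
use warnings;

my @with_parens = (1 .. 5);
my @without     = (1 .. 5);

# Parenthesized right-hand side: both slice elements are assigned.
@with_parens[0, 4] = ("a", "b");     # first is "a", fifth is "b"

{
    no warnings 'void';              # "b" would otherwise warn in void context
    # Parses as (@without[0, 4] = "a"), "b"; the fifth element becomes undef.
    @without[0, 4] = "a", "b";
}
```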
Slices can be done on any list, not just one returned from a function:
$day = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")[$num];
@primes = (1, 3, 5, 7, 9)[1..3];
The .. operator here, used in list context, is the range operator, and saves
us from typing out every number in the range.
A useful way to remember why the leading symbol changes is that "the
symbol tells you what you're getting back, not what you're working with". A
leading @ indicates a list being returned, while a leading $
indicates a scalar value. That is why hash slices look like this:
@hash{key1, key2} = ("value1", "value2");
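A short sketch of a hash slice used for both assignment and lookup (the hash and key names are illustrative). A slice returns its values in the order the keys are listed, regardless of the hash's internal order:

```perl
use strict;
use warnings;

my %hash;

# The leading @ says a list comes back: this slice names two values.
@hash{'key1', 'key2'} = ("value1", "value2");

# Equivalent element-by-element form.
my %same;
($same{key1}, $same{key2}) = ("value1", "value2");

# Reading a slice returns values in the order the keys are given.
my @values = @hash{'key2', 'key1'};   # ("value2", "value1")
```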
While we are discussing hashes, it is important to know that Perl supports
emulated multi-dimensional hashes using the syntax:
$hash{'level1', 'level2', 'level3'} = "value";
For backward compatibility, a comma-separated series of expressions inside the
subscript of a single hash element does NOT employ the comma operator.
Instead, the expressions are joined into one string via join($;, LIST), where
$; is the subscript separator variable, and LIST is the series of
expressions that make up the key. Please note that $hash{@array} evaluates
@array in scalar context; it does not expand it to its values. This is
consistent with the way arrays behave in any other scalar context.
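A sketch of what Perl actually stores in each case, using the default value of $; (the key names are illustrative). The last statement assumes, as the text states, that an array in a single-element hash subscript is evaluated in scalar context:

```perl
use strict;
use warnings;

my %hash;

# The comma-separated subscript is joined with $; into a single key.
$hash{'level1', 'level2', 'level3'} = "value";

# Building the same key by hand finds the same entry.
my $key   = join($;, 'level1', 'level2', 'level3');
my $found = $hash{$key};                 # "value"

# An array subscript is evaluated in scalar context: the key is "3",
# the array's length, not its elements.
my @array = ('level1', 'level2', 'level3');
$hash{@array} = "by length";
```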
Return Values
The context in which a subroutine is called is applied to the values it
returns. Take these simple subroutines:
sub A {
return ('a', 'b', 'c');
}
sub B {
my @array = ('a', 'b', 'c');
return @array;
}
sub C {
my @array = ('a', 'b', 'c');
return @array[0..$#array];
}
With these three functions, you cannot be sure just by looking at them what
they are going to return; that depends on the context of the call. A() might
return three scalar values, or, if it is called in scalar context, the comma
operator will act on the series of comma-separated values, ('a', 'b', 'c'),
and only 'c' will be returned.
The $#array syntax gives the last index of @array, as documented in
perldata. Let us now examine the return values of these functions:
$a = A();
($b) = A();
$c = B();
($d) = B();
$e = C();
($f) = C();
The A() function, in scalar context, has the comma operator act on
the comma-separated series of values, so $a is set to 'c', which
makes sense. $b, however, gets the value 'a', because its assignment
invokes list context on the function, which then returns a list.
The B() function returns an array. Thus, $c invokes scalar
context on an array, and so gets the value 3, the length of the
array. We know, then, that $d gets the value 'a' because it
invokes list context.
The final pair is not as difficult as it may seem: the function returns an
array slice, and a slice in scalar context yields its last element. Thus,
C() behaves exactly like A(): $e gets 'c', and $f gets 'a'.
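The following sketch runs the three subroutines from the text unchanged. It uses new variable names instead of $a through $f, since $a and $b are reserved for sort in real code:

```perl
use strict;
use warnings;

# The three subroutines from the text.
sub A { return ('a', 'b', 'c'); }
sub B { my @array = ('a', 'b', 'c'); return @array; }
sub C { my @array = ('a', 'b', 'c'); return @array[0 .. $#array]; }

my $scalar_a = A();    # comma operator: 'c'
my ($list_a) = A();    # list context: 'a'
my $scalar_b = B();    # array in scalar context: its length, 3
my ($list_b) = B();    # 'a'
my $scalar_c = C();    # slice in scalar context: last element, 'c'
my ($list_c) = C();    # 'a'
```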
Anonymous Array
This next section assumes you have some knowledge of references in Perl,
specifically array references.
When creating a reference to an anonymous array, the question arises whether
an array without a name is a list. Remember, an array is a named list, so
shouldn't an unnamed array be a list?
"N-n-not exactly."
Just because an array is anonymous does not mean it is stripped of its
properties:
$aref = [ qw( an array here ) ];
push @$aref, "not", "a", "list";
Clearly, @$aref is the array referenced by $aref, which gives it all it
needs in terms of a "name". Anonymous arrays are as much arrays as named
ones. Think of an anonymous array as having a name that only Perl can
pronounce. :)
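A small sketch showing that the array-only operations from the "Lists vs. Arrays" section (push, pop, scalar-context length, slices) all work through an anonymous array reference:

```perl
use strict;
use warnings;

my $aref = [ qw( an array here ) ];

# push changes the array's length, which only works on a real array.
push @$aref, "not", "a", "list";

my $count = @$aref;           # scalar context: 6
my $last  = pop @$aref;       # "list", leaving five elements
my @slice = @$aref[0, 1];     # ("an", "array")
```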
Resources
Many functions in Perl act differently when called in list context than when
called in scalar context. Read the perlfunc documentation for the specific
function you are interested in. The same goes for operators, such as pattern
matching; read perlop. Some examples:
($day,$month,$year) = (localtime)[3..5];
$date_string = localtime;
$found = $string =~ /Jeff/;
@integers = $string =~ /(\d+)/g;
If you really want to know about old multi-dimensional hashes, read the
perlvar documentation for the $; variable, and its use.
The perldata documentation discusses arrays, lists, and context, and gives a
formal introduction to arrays.