by Jeff "japhy" Pinyan
Files and Filehandles
Note: this article deals with files and filehandles; getting the output of programs and using directories and dirhandles will be covered in a later article.
Email comments to japhy@pobox.com.
The Basics
While many people know what files and filehandles are in Perl, they often forget exactly how they should be used. Too often people open files incorrectly, causing data to be lost, or they expect a file to exist when it really doesn't, and their program lacks the error reporting to alert them. Let's fix these problems first.
The open() Function
The most common way of opening files in Perl is to use the open()
function. Here are the most common ways files are opened:
open FH, "filename"; # for reading
open FH, ">filename"; # create, for writing
open FH, ">>filename"; # for appending
Let's be sure we have some terms straight. "Reading" means that you can
get the contents of the file. "Writing" means that you are placing content
in the file. "Appending" means that you are writing, starting at the
end of the file; if the file does not exist, Perl attempts to create it
for you. "Create" means the file is brought into existence if it does not
exist, and clobbered (the contents are erased) if it does.
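Here is a small sketch of the three modes in action, written in the article's own style (bareword filehandles, two-argument open()). It assumes the File::Temp module that comes with Perl, used only to get a throwaway directory; the filename log.txt is made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A scratch directory that File::Temp deletes when the program exits.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/log.txt";    # hypothetical filename

# ">" creates the file, or clobbers it if it already exists.
open OUT, ">$file" or die "can't create $file: $!";
print OUT "first\n";
close OUT or die "can't close $file: $!";

# ">" again: the old contents ("first") are erased.
open OUT, ">$file" or die "can't create $file: $!";
print OUT "second\n";
close OUT or die "can't close $file: $!";

# ">>" writes starting at the end of the existing contents.
open OUT, ">>$file" or die "can't append to $file: $!";
print OUT "third\n";
close OUT or die "can't close $file: $!";

# A plain filename opens the file for reading.
open IN, $file or die "can't open $file: $!";
my @lines = <IN>;
close IN;

print @lines;    # prints "second" and "third", one per line
```

The second ">" open threw away "first", while ">>" added "third" after what was already there.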
If you have a filename stored in a variable, and you're opening the file for
reading, it isn't necessary to put the variable in quotes:
open FOO, $file or die "can't open $file: $!";
And, while it isn't common practice, you can include the mode symbols (such as > or >>) at the beginning of the variable's value:
$file = ">>/tmp/foo";
open FOO, $file or die "can't append to $file: $!";
This isn't recommended because the symbols help you know exactly what you're doing: if there are 100 lines of code between the line where you set the variable and the line where you use it in open(), you might not remember how the file is being opened. In addition, a variable holding a filename should hold JUST a filename; otherwise you'd have to strip the symbols off every time you need the filename itself. Therefore, it is good practice to keep the symbols and the filename separate:
open FOO, $foo; # you can tell instantly
open FOO, ">$foo"; # how the filename in $foo
open FOO, ">>$foo"; # is being used here
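Perl 5.6 and later take this advice a step further: open() also accepts three arguments, with the mode as its own argument, and a lexical variable such as my $fh can serve as the filehandle. The sketch below assumes Perl 5.6 or newer; data.txt is a hypothetical name inside a File::Temp scratch directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";    # hypothetical filename: JUST a filename

# Three-argument open: the mode is a separate argument, so nothing
# inside $file can ever be mistaken for a mode symbol.
open(my $out, '>', $file) or die "can't create $file: $!";
print $out "hello\n";
close($out) or die "can't close $file: $!";

open(my $in, '<', $file) or die "can't open $file: $!";
my $line = <$in>;
close($in);

chomp $line;
print "$line\n";    # prints "hello"
```

Because the mode never travels inside $file, a filename that happens to start with > or with whitespace is opened exactly as written.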
"Open this file or die!"
Error handling is very important when using files. You can never be too sure: did you open the file successfully? If not, why? Perl's die() function and the $! variable can answer these questions:
$file = "/tmp/resutls.txt"; # NOTE the misspelling
open RESULTS, $file;
$result = <RESULTS>;
# you aren't sure if the file opened correctly
# unless you're using the -w switch to Perl:
# it will warn that you're reading from a closed filehandle
open RESULTS, $file or
  die "can't open $file: $!";
$result = <RESULTS>;
# now you know, because Perl will complain with something like
# "can't open /tmp/resutls.txt: No such file or directory at ..."
This message tells you that something went wrong with your request to open a file. The general rule is that you should always check the return value of system calls, and open() is a system call. The $! variable holds the most recent system error, which makes it handy when die()ing.
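To see what $! actually contains, the sketch below tries to open a file that is guaranteed not to exist (a name inside an empty File::Temp directory, an assumption made so the example fails the same way everywhere) and traps the die() with eval so the message can be inspected. The exact wording of $! comes from your operating system, so the comment shows only a typical Unix message.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $file  = "$dir/resutls.txt";    # never created, so open() must fail
my $error = '';

# eval {} traps the die() so we can look at the message instead of exiting.
eval {
    open RESULTS, $file or die "can't open $file: $!\n";
    1;
} or $error = $@;

print $error;    # e.g. "can't open .../resutls.txt: No such file or directory"
```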
Closing a File
To close a file, you simply use the close() function on the filehandle
that you used to open the file.
close FH;
close FH or die "can't close file: $!";
close() is a system call as well, so it can't hurt to check that the file was closed properly; for a file opened for writing, a failed close() can mean buffered output never made it to the file.
Reading
The <FH> Operator
It is usually unwise to slurp the contents of a file into an array; for large files, this can use a large amount of memory. It is much sounder to iterate over the file one line at a time, using a while loop:
open FH, "file" or die $!;
@contents = <FH>;
close FH;
foreach $element (@contents) {
# line is held in $element
chomp $element; # remove ending newline
}
# ...much better if written as...
open FH, "file" or die $!;
while (<FH>) { # until end of file...
# line is held in $_
chomp; # if you don't want the ending newline
}
close FH;
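As a runnable version of the while loop above, this sketch writes a hypothetical three-line file, then reads it back one line at a time, counting lines and characters after chomp() has removed each newline. Only the current line is ever held in memory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/names.txt";    # hypothetical sample file

open OUT, ">$file" or die "can't create $file: $!";
print OUT "alice\n", "bob\n", "carol\n";
close OUT or die "can't close $file: $!";

my ($count, $chars) = (0, 0);
open FH, $file or die "can't open $file: $!";
while (<FH>) {            # one line at a time, held in $_
    chomp;                # drop the trailing newline
    $count++;
    $chars += length;     # length() with no argument measures $_
}
close FH;

print "$count lines, $chars characters\n";    # prints "3 lines, 13 characters"
```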
Perl does not automatically remove the ending newline from a line when you read it; more precisely, it does not remove the $/ at the end of a line (see below to learn about this variable and its usefulness). Use the chomp() function to safely get rid of this ending sequence: in Perl 4 you had to use chop(), which removes the last character whatever it is, but Perl 5 added the safer chomp(), which removes only a trailing $/. A common mistake when using a while loop is skipping lines in the file, like so:
while (<FH>) { # this stores the line in $_
  $line = <FH>; # this puts the NEXT line in $line
}
If you only use the $line variable there, you'll end up missing every
other line. What was meant here was one of the following:
while (defined($line = <FH>)) { ... }
while (<FH>) { $line = $_; }
while (!eof(FH)) { $line = <FH>; }
The first example there shows how while (<FH>) actually works: it is the same as while (defined($_ = <FH>)). Perl only adds this implicit assignment to $_ when <FH> is the ONLY thing in the while loop's condition. The defined() is required to make sure that a last line consisting of a 0 with no trailing newline (a rare case, but hey...) is still treated as a line, even though "0" is false. The third example uses the eof() function; eof() can be called in three different ways, but we will only discuss the eof(FH) usage here. The rest will be explained in a future column, and you can read about them on your own in the perlfunc documentation (see the Resources section at the end of the article).
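The sketch below runs both the buggy loop and the defined() version over a hypothetical four-line file whose last line is a lone 0 with no newline after it. The buggy loop only ever keeps the even-numbered lines in $line, while the defined() loop sees all four, including the false-looking final 0.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/nums.txt";    # hypothetical sample file

# Four lines; the last one is "0" with NO trailing newline.
open OUT, ">$file" or die "can't create $file: $!";
print OUT "1\n2\n3\n0";
close OUT or die "can't close $file: $!";

# Buggy: while() reads a line into $_, then the body reads the NEXT
# line into $line, so lines 1 and 3 are never seen in $line.
my @buggy;
open FH, $file or die "can't open $file: $!";
while (<FH>) {
    my $line = <FH>;
    chomp $line;
    push @buggy, $line;
}
close FH;

# Fixed: defined() keeps the final "0" (a false value) from ending the loop.
my @fixed;
open FH, $file or die "can't open $file: $!";
while (defined(my $line = <FH>)) {
    chomp $line;
    push @fixed, $line;
}
close FH;

print "buggy: @buggy\n";    # prints "buggy: 2 0"
print "fixed: @fixed\n";    # prints "fixed: 1 2 3 0"
```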
The <FH> notation returns either a single line of the file, if
used in scalar context, or the remaining lines in the file, if used in list
context:
$first = <FH>;
$second = <FH>;
@rest = <FH>;
($first,$second,@rest) = <FH>;
That final line does the same as the first three; because there's a list on the left-hand side, <FH> is called in list context. <FH> returns false (specifically, undef in scalar context and an empty list in list context) upon reaching the end of the file, and further reads keep returning false; it does not start over on its own. To read the file again from the beginning, you have to seek() back to the start (or close and reopen the file). Because @rest = <FH> is in list context, @rest does not have a final element of undef.
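Here's a sketch of scalar versus list context on a hypothetical four-line file, along with what happens after the end of the file: the next read returns undef, and seek(FH, 0, 0) is used to rewind to the beginning.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/abc.txt";    # hypothetical sample file

open OUT, ">$file" or die "can't create $file: $!";
print OUT "a\nb\nc\nd\n";
close OUT or die "can't close $file: $!";

open FH, $file or die "can't open $file: $!";
my $first = <FH>;    # scalar context: one line ("a\n")
my @rest  = <FH>;    # list context: every remaining line
my $after = <FH>;    # at end of file: undef, NOT "a\n" again

seek(FH, 0, 0) or die "can't rewind $file: $!";    # back to the start
my $again = <FH>;    # "a\n" once more
close FH;

print "rest has ", scalar(@rest), " lines; after EOF: ",
      (defined $after ? "a line" : "undef"), "\n";
```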
The "End of Line" Variable, $/
When you read from a file using <FH>, you get the content from
your current position to the end of the "line"... but what denotes the end of
a line? The $/ variable, which defaults to \n, is what Perl
uses to determine if it's reached the end of a line. If you change the value,
Perl changes its definition of a line. Here's an example:
{
local $/ = "\n%%\n"; # why use local?
chomp($line = <FH>);
}
The "end of line" string \n%%\n is a common one used for signature file quotes, as well as for the fortunes used by the popular fortune program found on many Unix boxes. Why do we use local() here, instead of my()? The short answer is that we have to: $/ is a special global Perl variable, and my() can't be used on it. local() gives $/ a temporary value that lasts until the end of the enclosing block, so wrapping the code in a pair of braces as shown ensures $/ gets its original value back. Also note that chomp() removes the current value of $/ from the end of a string, so here it strips the trailing \n%%\n.
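A sketch of reading fortune-style records: the hypothetical file quotes.txt holds three quotes separated by %% lines (including one after the last quote, an assumption that lets every record be chomped the same way). After the block closes, $/ is back to "\n".

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/quotes.txt";    # hypothetical quote file

open OUT, ">$file" or die "can't create $file: $!";
print OUT "First quote.\n%%\nSecond quote,\ntwo lines long.\n%%\nThird.\n%%\n";
close OUT or die "can't close $file: $!";

my @quotes;
open FH, $file or die "can't open $file: $!";
{
    local $/ = "\n%%\n";    # a "line" now ends at a %% separator
    while (<FH>) {
        chomp;              # removes "\n%%\n", the current $/
        push @quotes, $_;
    }
}                           # $/ is restored to "\n" here
close FH;

print scalar(@quotes), " quotes\n";    # prints "3 quotes"
```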
There are two special values $/ can be set to: undef and "". They are not the same value, mind you. Setting it to undef means there is no "end of line" marker, so the <FH> operator will return the entire file as one long string. This is not as inefficient as you may think; it is a fast and effective way to get the entire contents of a file into a string. The other value, "", turns on "paragraph mode", meaning a "line" will be any series of characters ended by two or more newline characters (a paragraph followed by one or more blank lines). In this special case, chomp() removes all newlines at the end of the string. Please note, however, that $/ is a string, not a regular expression. Setting it to "\n+" will make a line a string of characters ending in a newline followed by a plus sign.
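The two special values side by side, on a hypothetical file containing two paragraphs separated by two blank lines: undef slurps everything into one string, while "" hands back one paragraph per read, with chomp() stripping all of its trailing newlines.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/paras.txt";    # hypothetical sample file

open OUT, ">$file" or die "can't create $file: $!";
print OUT "para one\nstill one\n\n\npara two\n";
close OUT or die "can't close $file: $!";

# undef: no end-of-line marker, so one read returns the whole file.
my $all;
{
    local $/ = undef;
    open FH, $file or die "can't open $file: $!";
    $all = <FH>;
    close FH;
}

# "": paragraph mode, a record ends at one or more blank lines.
my @paras;
{
    local $/ = "";
    open FH, $file or die "can't open $file: $!";
    while (<FH>) {
        chomp;    # in paragraph mode, removes ALL trailing newlines
        push @paras, $_;
    }
    close FH;
}

print scalar(@paras), " paragraphs\n";    # prints "2 paragraphs"
```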
Writing
print() and select()
The print() function is a rather simple one to use; the syntax is (according to the perlfunc manpage):
print LIST
print FILEHANDLE LIST
print
print FILEHANDLE
FILEHANDLE can be a bareword filehandle (FOO), a scalar variable containing a reference to a filehandle, or a string containing the name of a filehandle (these will be discussed in the second article on files and filehandles). LIST is a regular list. If LIST is omitted, print() prints $_. If FILEHANDLE is omitted, print() prints to the currently select()ed filehandle, which is STDOUT unless you have changed it.
The one-argument form, select(FH), makes the given filehandle the default one for output; Perl programs start out as though you had said select(STDOUT). The function returns the filehandle that was selected before the call, so you can restore it later:
print "This goes to STDOUT\n";
$oldfh = select(NEWFH);
print "This goes to NEWFH\n";
select($oldfh);
print "This goes to STDOUT\n";
Note: this example shows the use of a scalar in place of a filehandle. This
"magic" is explained in the next article on this topic, which will describe
more advanced file and filehandle operations.
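The select() example above assumes NEWFH is already open. To make it runnable, this sketch opens NEWFH on a scalar instead of a real file (a feature that assumes Perl 5.8 or later), so you can check afterwards exactly what landed there.

```perl
use strict;
use warnings;

# Assumes Perl 5.8+, where open() can write into a scalar (\$buf)
# instead of a real file; it stands in for NEWFH here.
my $buf = '';
open(NEWFH, '>', \$buf) or die "can't open in-memory handle: $!";

print "This goes to STDOUT\n";
my $oldfh = select(NEWFH);    # NEWFH becomes the default; returns the old one
print "This goes to NEWFH\n";
select($oldfh);               # restore the previous default (STDOUT)
print "This goes to STDOUT\n";

close(NEWFH) or die "can't close in-memory handle: $!";
print "NEWFH got: $buf";      # prints "NEWFH got: This goes to NEWFH"
```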
Here-docs
As a programmer who's looked over other people's code, I must say one of the ugliest things I've seen is the overuse of print statements. I see gunk like:
print "<a href=\"links.html\">Click Here!</a>\n";
print "<hr>\n";
print "<h2>Other Links</h2>\n";
# etc...
There are a couple of things I find unfavorable: the need to backslash " everywhere, the multiple statements when ONE will do, and sometimes the programmer doesn't put any \n's in at all, so the output is very messy to the eye. Since we know that print() can take a list, we could say:
print "Come, listen to a story\n",
"About a man named Jed.\n",
"etc.\n";
But if we want to include quotes in there, as well as variables, single quotes around the lines won't help: the \n's won't be interpolated, and neither will the variables. We could use the qq() operator, which lets you delimit double-quoted text with a character other than ":
print qq!You can't use a \! in here without
putting a backslash in front of it\n!;
But just like regular quotes, you need to backslash the quote character. To
get around this, we could use paired delimiters, like {}:
print qq{You can nest { these things } safely\n};
And as the example shows, pairs can be nested; the number of left and right units of the pair must match. To include an unbalanced } or { you'd need to backslash it. The final workaround, and the one I highly suggest, is the here-doc. Borrowed from sh, here-docs have a rather simple syntax:
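print <<"END TEXT";
Everything up to the closing label is printed, and
"quotes" need no backslashes.
END TEXT
Note the semicolon after the label on the print() statement!
Putting double quotes around the label (or using a bare label) makes the here-doc act like a double-quoted string; single quotes around the label make it act like a single-quoted string. You can also use backticks around the label, which runs the text as a shell command, but that is seldom done. A very important rule is that if you do not use quotes around the label, it must immediately follow the <<, with no space in between. Another is that the closing label must be reproduced exactly as written in the print statement (including any leading spaces), on a line by itself, and that there must then be a newline after the closing label: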
print FH << "  two leading spaces";
la dee da
two leading spaces
that line above was NOT a valid close to
this here-doc
  two leading spaces
If you get an error like "Can't find string terminator "END TEXT" anywhere before EOF at filename line nnn." then be sure you typed the label the same way at the beginning and at the end. If they are the same, and your ending label is on the last line of your file, be sure there is actually a newline after that last line.
You can have multiple here-docs in one statement:
print HTML << "end header", << "end body";
"I can use quotes!"
end header
This is now in the 'end body' section.
end body
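A runnable sketch of two here-docs in one statement, assigned to a list instead of printed. The bodies are read in the order their labels appear on the statement line; the first label is double-quoted, so $title is interpolated, and the second is single-quoted, so it is not. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $title = "My Page";    # sample variable for interpolation

# Both here-doc bodies follow the statement, in the same order as the
# labels: first the "end header" text, then the 'end body' text.
my ($header, $body) = (<<"end header", <<'end body');
<title>$title</title>
end header
Text with a literal $title in it.
end body

print $header, $body;
```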
As an aside, here-docs can be used anywhere a string can, such as in a function's argument list or on the right-hand side of an assignment:
makeHTML(<< "end of body", $title);
blah blah blah
end of body
$text = << 'EOF';
this is a multi-line string
placed into $text. and since
pressing enter makes a real newline,
I can make newlines while using
single quotes!
EOF
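Finally, a sketch of a here-doc passed as a function argument. make_html() here is a hypothetical stand-in for the makeHTML() call above (the article doesn't show its definition), and the single-quoted EOF label shows that even backslash sequences like \n stay literal.

```perl
use strict;
use warnings;

# Hypothetical helper standing in for makeHTML(): wraps a body in a page.
sub make_html {
    my ($body, $title) = @_;
    return "<html><head><title>$title</title></head>\n"
         . "<body>\n$body</body></html>\n";
}

# The here-doc is the first argument; "Demo" is the second.
my $page = make_html(<<"end of body", "Demo");
Hello from a here-doc.
end of body

# Single-quoted label: no interpolation at all, so $page and \n stay as typed.
my $text = <<'EOF';
$page is not interpolated, and \n is just two characters.
EOF

print $page, $text;
```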
Resources
To read more on opening files, read perlopentut, available at http://language.perl.com/newdocs/pod/perlopentut.html.
The documentation on the functions mentioned here is all available in the perlfunc section of the docs, or by typing perldoc -f NAME at your command prompt. $/ is documented in perlvar. Here-docs are discussed in perldata (newer Perl releases document them in perlop, among the quote-like operators). All this documentation is also found online at http://language.perl.com/.