Item 53. Consider different ways of reading from a stream.

You can use the line input operator <> to read either a single line from a stream in a scalar context or the entire contents of a stream in a list context. Which method you should use depends on your need for efficiency, access to the lines read, and other factors, like syntactic convenience.

In general, the line-at-a-time method is the most efficient in terms of time and memory. The implicit while (<>) form is equivalent in speed to the corresponding explicit code:

open my ($fh), '<', $file or die "Could not open $file: $!";

while (<$fh>) {  # implicit version
  # do something with $_
}

# the same loop written explicitly (an alternative, not a continuation)
while ( defined( my $line = <$fh> ) ) {
  # do something with $line
}

Note the use of the defined operator in the second loop. It prevents the loop from missing a line if the very last line of a file is the single character 0 with no terminating newline. That is not a likely occurrence, but it doesn't hurt to be careful. Perl applies the same defined test automatically in the implicit while (<$fh>) form.
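To see why the defined test matters, here is a minimal sketch that reads from an in-memory filehandle whose last line is a bare 0. The sample data and variable names are made up for illustration. The final line comes back as a false but defined value, which is exactly the case the explicit test guards against.

```perl
use strict;
use warnings;

# Hypothetical sample input: the last line is "0" with no newline
my $data = "first\n0";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @lines;
while ( defined( my $line = <$fh> ) ) {
    push @lines, $line;
}
close $fh;

my $last = $lines[-1];
printf "lines read: %d\n", scalar @lines;
printf "last line is %s but %s\n",
    ( $last         ? 'true'    : 'false' ),
    ( defined $last ? 'defined' : 'undefined' );
```

This prints "lines read: 2" followed by "last line is false but defined".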

You can use a similar syntax with a foreach loop to read the entire file into memory in a single operation:

foreach (<$fh>) {
  # do something with $_
}

The all-at-once method is slower and uses more memory than the line-at-a-time method. If all you want to do is step through the lines in a file, you should use the line-at-a-time method, although the difference in performance will not be noticeable if you are reading a short file.
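One way to observe the difference is to watch the input line number variable $. inside each loop. The following is a small sketch using made-up sample data and an in-memory filehandle: the while loop sees $. advance one line at a time, while the foreach loop has already consumed the whole stream before its first iteration.

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";    # hypothetical sample input

# Line at a time: $. advances with each iteration
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my @while_seen;
while ( defined( my $line = <$in> ) ) {
    push @while_seen, $.;
}
close $in;

# All at once: every line is read before the loop body first runs,
# so $. already holds the final line number each time
open $in, '<', \$data or die "Cannot open in-memory file: $!";
my @foreach_seen;
foreach (<$in>) {
    push @foreach_seen, $.;
}
close $in;

print "while:   @while_seen\n";
print "foreach: @foreach_seen\n";
```

The while loop records 1 2 3, and the foreach loop records 3 3 3.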

All-at-once has its advantages, though, when combined with operations like sorting:

print sort <$fh>;  # print lines sorted

If you need access to more than one line at a time, all-at-once may be appropriate. If you want to look at previous or succeeding lines based on the current line, you want to already have those lines. This example prints three adjacent lines when it finds a line with "Shazam":

my @f = <$fh>;
foreach ( 0 .. $#f ) {
  if ( $f[$_] =~ /\bShazam\b/ ) {
    my $lo = ( $_ > 0 )   ? $_ - 1 : $_;
    my $hi = ( $_ < $#f ) ? $_ + 1 : $_;
    print map { "$_: $f[$_]" } $lo .. $hi;
  }
}

You can still handle many of these situations with line-at-a-time input, although your code will definitely be more complex:

my @f = ( undef, undef, scalar(<$fh>) );  # previous, current, next
my $i = -1;                               # zero-based index of $f[1]
for ( ; ; ) {
  # queue using a slice assignment
  @f[ 0 .. 2 ] = ( @f[ 1, 2 ], scalar(<$fh>) );
  last if not defined $f[1];
  $i++;
  if ( $f[1] =~ /\bShazam\b/ ) {  # ... looking for Shazam
    print map { ( $i + $_ - 1 ) . ": $f[$_]" }
      grep { defined $f[$_] } 0 .. 2;  # skip missing neighbors
  }
}

Maintaining a queue of lines of text with slice assignments makes this slower than the equivalent all-at-once code, but this technique works for arbitrarily large input because only three lines are held in memory at a time. The queue could also be implemented with an index variable rather than a slice assignment, which would result in more complex but faster-running code.
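As a rough sketch of the index-variable idea, the lines can live in a three-slot circular buffer, with line k stored in slot k % 3. The helper name shazam_context is hypothetical, and it returns the context lines instead of printing them so that it is easy to check. This is one possible arrangement, not necessarily the implementation the text has in mind.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "index: line" strings for each line
# matching Shazam, plus its immediate neighbors.
sub shazam_context {
    my ($fh) = @_;
    my ( @ring, @out );

    $ring[0] = <$fh>;    # prime the buffer with line 0

    for ( my $cur = 0 ; defined $ring[ $cur % 3 ] ; $cur++ ) {
        # Read one line of lookahead; this overwrites the slot that
        # held line $cur - 2, which is no longer needed
        $ring[ ( $cur + 1 ) % 3 ] = <$fh>;

        next unless $ring[ $cur % 3 ] =~ /\bShazam\b/;

        for my $k ( $cur - 1 .. $cur + 1 ) {
            next if $k < 0;    # no line before the first
            my $text = $ring[ $k % 3 ];
            push @out, "$k: $text" if defined $text;    # none after the last
        }
    }
    return @out;
}

my $text = "a\nShazam here\nb\nc\n";    # hypothetical sample input
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print shazam_context($fh);
close $fh;
```

No elements are copied between slots here. Only the index moves, which is where the speed advantage over the slice assignment would come from.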

Slurp a file

If your goal is simply to read a file into memory as quickly as possible, you might consider undefining the input record separator, $/, and reading the entire file as a single string. This will read the contents of a file or stream much faster than either of the earlier alternatives:

my $contents = do {
  local $/;  # undefined: read to the end of the file
  open my ($fh), '<', $file or die "Could not open $file: $!";
  <$fh>;
};
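The local in that block matters: it limits the change to $/ to the do block, so later reads go back to working line by line. Here is a minimal sketch with made-up sample data and an in-memory filehandle standing in for a real file.

```perl
use strict;
use warnings;

my $data = "alpha\nbeta\ngamma\n";    # hypothetical sample input

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $contents = do {
    local $/;    # undefined inside this block only
    <$fh>;       # scalar context: one read returns everything
};
close $fh;

# Outside the block, $/ is a newline again, so this reads one line
open my $again, '<', \$data or die "Cannot open in-memory file: $!";
my $first = <$again>;
close $again;

printf "slurped %d bytes; first line is %s", length($contents), $first;
```

This prints "slurped 17 bytes; first line is alpha".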

You can also just use the File::Slurp module to do it for you, which lets you read the entire file into a scalar to have it in one big chunk or read it into an array to have it line-by-line:

use File::Slurp;

my $text  = read_file('filename');
my @lines = read_file('filename');

Use read or sysread for maximum speed

Finally, the read and sysread operators are useful for quickly scanning a file if line boundaries are of no importance:

open my ($fh1), '<', $file1 or die "Could not open $file1: $!";
open my ($fh2), '<', $file2 or die "Could not open $file2: $!";

my $chunk = 4096;  # block size to read
my ( $bytes, $buf1, $buf2, $diff );

CHUNK: while ( $bytes = sysread $fh1, $buf1, $chunk ) {
  sysread $fh2, $buf2, $chunk;
  $diff++, last CHUNK if $buf1 ne $buf2;
}

# $file2 may continue past the end of $file1
$diff++ if !$diff and sysread( $fh2, $buf2, 1 );

print "$file1 and $file2 differ\n" if $diff;
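The same comparison can be sketched with the buffered read function instead of sysread. The helper name files_differ and its optional chunk-size argument are made up for this illustration. It opens both files in raw mode and treats a difference in length as a difference in content.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the files differ, 0 if identical
sub files_differ {
    my ( $file1, $file2, $chunk ) = @_;
    $chunk ||= 4096;    # block size to read

    open my $fh1, '<:raw', $file1 or die "Could not open $file1: $!";
    open my $fh2, '<:raw', $file2 or die "Could not open $file2: $!";

    while (1) {
        my $n1 = read( $fh1, my $buf1, $chunk );
        my $n2 = read( $fh2, my $buf2, $chunk );
        die "Read error: $!" unless defined $n1 and defined $n2;

        return 1 if $buf1 ne $buf2;    # also catches one file ending early
        return 0 if $n1 == 0;          # both files ended together
    }
}
```

A call such as files_differ( $file1, $file2 ) then returns a true value only when the contents differ.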

Things to remember

  • Avoid reading entire files into memory if you don't need to.
  • Read entire files quickly with File::Slurp.
  • Use read or sysread to quickly read through a file.