Mac OS X Unleashed

By John Ray and William C. Ray

Perl

Perl (Practical Extraction and Report Language) has grown from a cult following in the early 1990s to a massive hit today. Originally designed to make working with text data simple, Perl has been expanded by developers to handle tasks such as image manipulation and client/server activities. Because of its ease of use and its capability to work with ambiguous user input, Perl is an extremely popular Web development language. For example, assume that you want to extract a phone number from an input string. A user might enter 555-5654, 5552231, 421-5552313, and so on. It is up to the application to find the area code, local exchange, and identifier numbers. In Perl, this is simple:

#!/usr/bin/perl
print "Please enter a phone number:";
$phone=<STDIN>;
$phone=~s/[^\d]//g;
$phone=~s/^1//;
if (length($phone)==7) {
    $phone=~/(\d{3,3})(\d{4,4})/;
    $area="???"; $prefix=$1; $number=$2;
} elsif (length($phone)==10) {
    $phone=~/(\d{3,3})(\d{3,3})(\d{4,4})/;
    $area=$1; $prefix=$2; $number=$3;
} else {  print "Invalid number!"; exit; }
print "($area) $prefix-$number\n";

This program accepts a phone number as input, strips any unusual characters from it, removes a leading 1, if included, and then formats the result in an attractive manner.

Applying this capability to mine data from user input to Web development creates opportunities for programmers to make extremely user-friendly software.

Perl programs are similar to shell scripts in that they are interpreted by an additional piece of software. Each script starts with a line that includes the path to the Perl interpreter. In Mac OS X, this is typically #!/usr/bin/perl. After you enter a script, you must make it executable by typing chmod +x <script name>. Finally, it can be run by entering its complete path at the command line, or by typing ./<script name> from the same directory as the script. For more information on this process, please refer to Chapter 18, "Advanced Unix Shell Use: Configuration and Programming (Shell Scripting)."

Although this chapter provides enough information to write a program like the one shown here, it is not a complete reference to Perl. Perl supports object-oriented programming, and its built-in functions and add-on modules number in the thousands. Sams Teach Yourself Perl in 21 Days is an excellent read and a great way to bone up on the topic.

Variables and Data Types

Perl has a number of different variable types, but the most common are shown in Table 22.1. Perl variable names are composed of alphanumeric characters and underscores and are case sensitive, unlike much of Mac OS X. This means that a variable named $mymacosx is entirely different from $myMacOSX. Unlike some languages, such as C, Perl performs automatic type conversion when possible. A programmer can use a variable as a number in one statement, and as a string in the next.

Table 22.1. Common Perl Variable Types

Type Description
$variable A simple variable that can hold anything is prefixed with a $. You can use these variables as strings or numbers. These are the most common variables.
FILEHANDLE Filehandles hold a reference to a file that you are writing or reading. Typically, these are expressed in uppercase and do not have the $ prefix.
@array The @ references an array of variables. The array does not need to be predimensioned and can grow to whatever size memory allows. You reference individual elements of an array as $array[0], $array[1], $array[2], and so on. The array as a whole is referenced as @array.
%array This is another type of array: an associative array. Associative arrays are another one of Perl's power features. Rather than using numbers to reference the values stored in this array, you use any string you'd like. For example, if you have 3 apples, 2 oranges, and 17 grapefruit, you could store these values in the associative array as $array{apple}=3, $array{orange}=2, $array{grapefruit}=17. The only difference between the use of a normal array and an associative array (besides the method of referencing a value) is the type of brackets used. Associative arrays use curly brackets {} to access individual elements, whereas standard arrays use square brackets [].
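The variable types in Table 22.1 can be seen side by side in a short sketch. The variable names and values here are made up for illustration, and the use strict and use warnings lines are an optional safety net that the chapter's own listings do not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A scalar holds a single value and converts between string and number as needed.
my $count = "3";              # stored as a string
my $total = $count + 4;       # used as a number: 7
my $label = "Total: $total";  # used as a string again

# An array grows on demand; elements are read with $name[index].
my @fruit = ("apple", "orange");
push @fruit, "grapefruit";    # the array now has three elements

# An associative array is indexed by strings inside curly brackets.
my %stock = (apple => 3, orange => 2, grapefruit => 17);
$stock{apple}++;              # apples: 4

print "$label\n";
print "Third fruit: $fruit[2]\n";
print "Apples in stock: $stock{apple}\n";
```

Note that the prefix changes with what is being read: a single element of @fruit or %stock is itself a simple value, so it is written with a $.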

Input/Output Functions

Because Perl is so useful for manipulating data, one of the first things you'll want to do is get data into a script. There are a number of ways to do this, including reading from a file or the Terminal window. To Perl, however, command-line input and file input are very much the same thing. To use either, you must read from an input stream.

Input Streams

To input data into a variable from a file, use $variable=<FILEHANDLE>. This will input data up to a newline character into the named variable. To read from the command line, the filehandle is replaced with a special handle that points to the standard input stream, <STDIN>.

When data is read from an input stream, it contains the end of line character (newline) as part of the data. This is usually an unwanted piece of information that can be stripped off using the chomp command. Failure to use chomp often results in debugging headaches as you attempt to figure out why your string comparison routines are failing. For example, the following reads a line from standard (command line) input and removes the trailing newline character:

$myname=<STDIN>;
chomp($myname);

To read data in from an actual stored file, it must first be opened with open <FILEHANDLE>, <filename>. For example, to read the first line of a file named MacOSX.txt:

open FILEHANDLE, "MacOSX.txt";
$line1=<FILEHANDLE>;
close FILEHANDLE;

When finished reading a file, use close followed by the filehandle to close it.
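Reading a whole file is a matter of repeating the single-line read until the input stream runs out. The following is a minimal sketch, not the only way to do it: read_lines is a hypothetical helper name, and it uses the three-argument form of open with a filehandle stored in a variable, a variation on the uppercase filehandle style shown above. It also checks whether open succeeded, which the short listing above leaves out.

```perl
use strict;
use warnings;

# read_lines is a hypothetical helper: it returns every line of the named
# file as a list, with the trailing newline removed from each line.
sub read_lines {
    my ($filename) = @_;
    open(my $fh, "<", $filename) or die "Cannot open $filename: $!";
    my @lines;
    while (my $line = <$fh>) {   # <$fh> returns undef at end of file
        chomp($line);            # strip the trailing newline
        push @lines, $line;
    }
    close($fh);
    return @lines;
}
```

Calling read_lines("MacOSX.txt") would return the contents of that file one element per line, or stop the script with an error message if the file cannot be opened.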

Outputting Data

Outputting data is the job of the print command. print can display text strings or the contents of variables. In addition, you can embed special characters in a print statement that are otherwise unprintable. For example:

print "I love Mac OS X!\n----------------\n";

In this sample line, the \n is a newline character—this moves the cursor down a line so that subsequent output occurs on a new line, rather than the same line as the current print statement. Table 22.2 contains other common special characters.

Table 22.2. Common Special Characters

Escape Sequence Description
\n Newline, the Unix equivalent of return/enter
\r A standard return character
\t Tab
\" Double quotes
\\ The \ character

Many characters (such as ") have a special meaning in Perl; if you want to refer to them literally, you must prefix them with \—this is called escaping the character. In most cases, nonalphanumeric characters should be escaped just to be on the safe side.

File Output

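To output data to a file rather than standard output, you must first open a file to receive the information. This is nearly identical to the open used to read data, except for one difference. When writing to a file, you must prefix the name of the file with one of two different character strings:

> Creates the file if it does not exist and overwrites its contents if it does.
>> Creates the file if it does not exist and appends output to the end of the file if it does.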

With a file open, the print command is again used for output. This time, however, it includes the filehandle of the output file. For example, this code saves Mac OS X to a file named MyOS.txt:

open MYFILE,"> MyOS.txt";
print MYFILE "Mac OS X\n";
close MYFILE;

Again, the close command is used to close the file when all output is complete.
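Perl's open accepts a > prefix to overwrite a file and a >> prefix to add to the end of one. The sketch below contrasts the two; save_line and append_line are hypothetical helper names, and the or die checks are an addition that reports a failed open rather than silently discarding output.

```perl
use strict;
use warnings;

# save_line (hypothetical): replace the contents of a file with one line.
sub save_line {
    my ($filename, $text) = @_;
    open(OUT, "> $filename") or die "Cannot write $filename: $!";
    print OUT "$text\n";
    close(OUT);
}

# append_line (hypothetical): add one line to the end of a file,
# keeping whatever the file already contains.
sub append_line {
    my ($filename, $text) = @_;
    open(OUT, ">> $filename") or die "Cannot append to $filename: $!";
    print OUT "$text\n";
    close(OUT);
}
```

Calling save_line("MyOS.txt", "Mac OS X") and then append_line("MyOS.txt", "Darwin") would leave a two-line file; calling save_line a second time would reduce it to a single line again.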

External Results (``)

One of the more novel (and powerful) ways to get information into Perl is through an external program. For example, to quickly and easily grab a listing of running processes, you could use the output of the Unix ps axg command:

$processlist=`ps axg`;

The backtick (``) characters should be placed around the command whose output you want to capture. Perl will pause and wait for the external command to finish executing before it continues processing.

This is both a dangerous and powerful tool. You can easily read an entire file into a variable by using the cat command with backticks. Unfortunately, if the external program fails to execute correctly, the Perl script might hang indefinitely.
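One way to make backticks a little safer is to check whether the external command reported success. The sketch below relies on Perl's special variable $?, which holds the status of the most recent external command; echo is used only as a stand-in for a real command. It does not help with a command that never returns, since Perl still waits for it.

```perl
use strict;
use warnings;

# Capture the output of an external command (echo is a stand-in here).
my $output = `echo hello`;

# $? is zero when the command ran and exited successfully.
if ($? != 0) {
    die "Command failed with status " . ($? >> 8) . "\n";
}

chomp($output);               # backtick output keeps its trailing newline
print "Captured: $output\n";
```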

Expressions

Although Perl variables can hold numbers or strings, you still need to perform the appropriate type of comparison based on the values being compared. For example, numbers can be compared for equality using ==, but strings must be compared with eq. If you attempt to use == to compare two strings that do not begin with a number, the expression will evaluate to true because the numeric value of both strings is zero, regardless of the text they contain. Table 22.3 displays common Perl expressions.

Table 22.3. Use the Appropriate Comparison Operators for the Type of Data Being Compared

Expression Syntax Description
$var1==$var2 Compares two numbers for equality.
$var1!=$var2 Compares two numbers for inequality.
$var1<$var2 Checks $var1 to see whether it is less than $var2.
$var1>$var2 Tests $var1 to see whether it is a larger number than $var2.
$var1>=$var2 Tests $var1 to see whether it is greater than or equal to $var2.
$var1<=$var2 Compares $var1 to see whether it is less than or equal to $var2.
$var1 eq $var2 Checks two strings for equality.
$var1 ne $var2 Checks two strings for inequality.
$var1 lt $var2 Checks to see whether the string in $var1 is less than (by ASCII value) $var2.
$var1 gt $var2 Tests the string in $var1 to see whether it is greater than $var2.
() Parentheses can be used to group the elements of an expression together to force an evaluation order or provide clarity to the code.
&&/and Used to connect two expressions so that both must evaluate to true in order for the complete expression to be true.
||/or Used to connect two expressions so that if either evaluates to true, the entire expression will evaluate to true.
! Used to negate an expression. If the expression previously evaluated to true, you can place a ! in front of the expression to force it to evaluate false—or vice versa.
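The difference between the numeric and string operators in Table 22.3 is easiest to see when both are applied to the same pair of strings. In this sketch the variable names are illustrative, and the no warnings 'numeric' line only silences the warning Perl would otherwise print about using text as a number.

```perl
use strict;
use warnings;

my $first  = "apple";
my $second = "orange";

# eq compares the text itself, so these two strings are different.
my $same_text = ($first eq $second) ? "equal" : "different";

# == converts both strings to numbers first; each one becomes 0 here.
my $same_number;
{
    no warnings 'numeric';    # silence the "isn't numeric" warning for this demonstration
    $same_number = ($first == $second) ? "equal" : "different";
}

print "eq says: $same_text\n";     # eq says: different
print "== says: $same_number\n";   # == says: equal
```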

Regular Expressions

Regular expressions (REs) are a bit more interesting than the expressions in the preceding section. Like the previous expressions, REs evaluate to a true or false state. In addition, they are used to locate and extract data from strings.

For example, assume that the variable $mycomputer contains the information My computer is a Mac.

To create a regular expression that would test the string for the presence of the word mac, you could write

$mycomputer=~/mac/i

Although this line might look like an assignment statement, it is in fact looking inside of the variable $mycomputer for the pattern mac. The pattern that a regular expression matches is contained within two / characters, unless changed by the programmer. The i after the expression tells Perl that it should perform a case-insensitive search, allowing it to match strings such as MAC and mAC.

To understand the power of regular expressions, you must first understand the pattern matching language that comprises them.

Patterns

Regular expressions are made up of groups of pattern matching symbols. These special characters symbolically represent the contents of a string and can be used to build complex pattern matching rules with relative ease. Table 22.4 contains the most common components of regular expressions and their purpose.

Table 22.4. Use These Pattern Matching Components to Build a Regular Expression

Pattern Purpose
$ Matches the end of a string.
^ Matches the beginning of a string.
. Matches any single character except a newline.
[] Matches any of the characters within the square brackets.
\s Matches any type of white space (space, tab, and so on).
\n Matches the newline character.
\t Matches the tab character.
\w Matches a word character (a letter, a digit, or an underscore).
\d Matches a digit.

The bracket characters enable you to clearly define the characters that you want to match if a predefined sequence doesn't already exist. For example, if you'd like to match only the uppercase letters A through Z and the numbers 1, 2, and 3, you could write

[A-Z123]

As seen in this example, you can represent a contiguous sequence of letters or numbers as a range by specifying the start and end characters of the range, separated by a hyphen (-).
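Bracketed character sets can be combined with the anchors from Table 22.4. The following sketch assumes a made-up rule for illustration: a valid string starts with one uppercase letter followed by one of the digits 1, 2, or 3. The name is_part_code is hypothetical.

```perl
use strict;
use warnings;

# is_part_code (hypothetical): true when the string starts with one
# uppercase letter followed by one of the digits 1, 2, or 3.
sub is_part_code {
    my ($text) = @_;
    if ($text =~ /^[A-Z][123]/) {
        return 1;
    }
    return 0;
}

for my $candidate ("M2 chassis", "m2 chassis", "M7 chassis") {
    my $verdict = is_part_code($candidate) ? "match" : "no match";
    print "$candidate: $verdict\n";
}
```

Only the first string matches: the second starts with a lowercase letter, and the third has a digit outside the set.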

Pattern Repetition

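With the capability to write patterns, you can match arbitrary strings within a character sequence. What's missing is the capability to match strings of varying lengths. These repetition characters modify the pattern they follow and enable it to be matched once, twice, or as many times as you'd like:

* Matches zero or more occurrences of the pattern.
+ Matches one or more occurrences of the pattern.
? Matches zero or one occurrence of the pattern.
{x} Matches exactly x occurrences of the pattern.
{x,} Matches at least x occurrences of the pattern.
{x,y} Matches at least x and at most y occurrences of the pattern.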

When a repetition sequence is followed by a ?, the pattern matches as few characters as possible while still allowing the match to succeed. As an example of a counted repetition, the following expression matches between 5 and 10 consecutive occurrences of the digits 1, 2, or 3:

$testnumbers=~/[1-3]{5,10}/;

The capability to match an arbitrary number of characters enables programmers to deal with information they might not be expecting.
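The effect of a trailing ? on a repetition is easier to see by looking at what was actually matched. This sketch uses Perl's special variable $&, which is not covered in this chapter and holds the text matched by the most recent successful regular expression; the sample string is made up.

```perl
use strict;
use warnings;

my $digits = "1231231";
my ($greedy, $lazy) = ("", "");

# Without ?, the repetition takes as many characters as it is allowed (up to 5).
if ($digits =~ /[1-3]{2,5}/) {
    $greedy = $&;
}

# With ?, the repetition stops at the minimum that still matches (2).
if ($digits =~ /[1-3]{2,5}?/) {
    $lazy = $&;
}

print "Greedy match: $greedy\n";   # Greedy match: 12312
print "Lazy match: $lazy\n";       # Lazy match: 12
```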

Extracting Information from a Regular Expression

Although it's useful to be able to find strings that contain a certain pattern, it's even better if the matching data can be extracted and used. To extract pieces of information from a match, enclose the pattern with parentheses (). To see this in action, let's go back to the original telephone number program that introduced Perl in this chapter. One of the regular expressions extracted the parts of a 10-digit phone number from a string of 10 digits:

$phone=~/(\d{3,3})(\d{3,3})(\d{4,4})/;

There are three parts to the regular expression, each enclosed within parentheses. The first two (\d{3,3}) capture strings of three consecutive digits, and the third (\d{4,4}) captures the remaining four.

For each set of parentheses used in a pattern, a numbered variable ($1, $2, and so on) is created that corresponds to the order in which the parentheses are found. Because the area code is the first set of parentheses in the example, it is $1, the local prefix is $2, and the final four digits are held in $3.
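The numbered variables are only meaningful after a successful match, so it is worth testing the match before reading them. A minimal sketch of that habit follows; split_phone is a hypothetical helper name, and the ^ and $ anchors are an addition that rejects strings containing anything other than exactly ten digits.

```perl
use strict;
use warnings;

# split_phone (hypothetical): return the area code, prefix, and number
# from a string of exactly ten digits, or an empty list otherwise.
sub split_phone {
    my ($phone) = @_;
    if ($phone =~ /^(\d{3,3})(\d{3,3})(\d{4,4})$/) {
        return ($1, $2, $3);   # safe to read only because the match succeeded
    }
    return ();
}

my ($area, $prefix, $number) = split_phone("4215552313");
print "($area) $prefix-$number\n";   # (421) 555-2313
```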

Search and Replace

Because you can easily find a pattern in a string, wouldn't it be nice if you could replace it with something else? Perl enables you to do just that by writing your regular expression line a little bit differently:

$a=~s/<search pattern>/<replace pattern>/

This simple change enables you to modify data in a variable so that it is exactly what you're expecting, removing extraneous data. For example, matching a phone number in the variable $phone and changing it to a standard format can be accomplished in a single step:

$phone=~s/(\d{3,3})(\d{3,3})(\d{4,4})/($1) $2-$3/;

A new string in the format (xxx) xxx-xxxx replaces the phone number found in the original string. This enables a programmer to modify data on the fly, transforming user input into a more useable form.
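Two details of search and replace are worth seeing in action: the g modifier (used in the opening phone number program) repeats the replacement for every match, and the expression as a whole returns the number of replacements made. The sample value below is made up.

```perl
use strict;
use warnings;

my $phone = "421-555-2313";

# Remove every non-digit; with g, the result is the number of replacements.
my $removed = ($phone =~ s/[^\d]//g);

# Reformat the ten remaining digits in a single step.
$phone =~ s/^(\d{3,3})(\d{3,3})(\d{4,4})$/($1) $2-$3/;

print "Removed $removed characters\n";   # Removed 2 characters
print "$phone\n";                        # (421) 555-2313
```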

Regular expressions are not easy for many people to learn, and a single misplaced character can trip them up. Don't feel bad if you're confused at first; just keep at it. An understanding of regular expressions is important in many languages and, if properly used, can be a very powerful development tool.

Flow Control

Flow control statements give Perl the capability to alter its execution and adapt to different conditions on the fly. Perl uses very standard C-like syntax for its looping and conditional constructs. If you've used C or Java before, these should all look very familiar.

if-then-else

Perl's if-then-else logic is very simple to understand. If a condition is met, a block of code is executed. If not, a different piece of programming is run. The syntax for this type of conditional statement is

if (<expression>) {
       <statements...>
} else {
       <statements...>
}

For example, to test whether the variable $mycomputer contains the string Mac OS X and print Good Choice! if it does, you could write:

if ($mycomputer=~/mac os x/i) {
        print "Good Choice!\n";
} else {
        print "Buy a Mac!\n";
}

The curly brackets {} are used to set off code blocks within Perl. These denote the portion of code that a conditional, looping, or subroutine construct applies to.
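When there are more than two outcomes, an elsif branch (as used in the opening phone number program) adds further conditions between if and else. In this sketch, describe_os is a hypothetical helper and the middle message is invented for illustration.

```perl
use strict;
use warnings;

# describe_os (hypothetical): pick a message based on the text supplied.
sub describe_os {
    my ($computer) = @_;
    if ($computer =~ /mac os x/i) {
        return "Good Choice!";
    } elsif ($computer =~ /mac/i) {
        # reached only when the first test failed
        return "Time to upgrade!";
    } else {
        return "Buy a Mac!";
    }
}

print describe_os("My computer runs Mac OS X"), "\n";   # Good Choice!
```

The conditions are tested from top to bottom, so the more specific pattern has to come first.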

unless-then-else

The unless statement is syntactically identical to the if-then statement, except that it operates on the inverse of the expression (and uses the word unless rather than if). To change the previous example so that it uses unless, write

unless ($mycomputer=~/mac os x/i) {
        print "Buy a Mac!\n";
} else {
        print "Good Choice!\n";
}

The unless condition is rarely used in Perl applications and is provided mainly as a way to write code in a more readable manner.

while

The while loop enables you to execute a block of code repeatedly while a condition remains true. At the start of each loop, an expression is evaluated; if it returns true, the loop executes. If not, it exits. The syntax for a Perl while loop is

while (<expression>) {
        <statements>
}

For example, to monitor a process listing every 30 seconds to see if the application Terminal is running, the following code fragment could be employed:

$processlist=`ps axg`;
while (!($processlist=~/terminal/i)) {
        print "Terminal has not been detected.\n";
        sleep 30;
        $processlist=`ps axg`;
}
print "The Terminal process is running.";

Here the output of the ps axg command is stored in $processlist. This is then searched using a regular expression in the while loop. If the pattern terminal is located, the loop will exit and the message The Terminal process is running. is displayed. If not, the script sleeps for 30 seconds, and then tries again.
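A loop like this one runs forever if the process never appears. One possible variation, sketched here under assumptions, gives up after a fixed number of attempts. The name wait_for_output and its parameters are hypothetical, and echo stands in for ps axg so that the result does not depend on what is running.

```perl
use strict;
use warnings;

# wait_for_output (hypothetical): run $command up to $max_tries times,
# pausing $delay seconds between attempts. Returns 1 as soon as the output
# matches $pattern (ignoring case), or 0 if every attempt fails.
sub wait_for_output {
    my ($command, $pattern, $max_tries, $delay) = @_;
    my $tries = 0;
    while ($tries < $max_tries) {
        my $output = `$command`;
        if ($output =~ /$pattern/i) {
            return 1;
        }
        $tries++;
        if ($tries < $max_tries) {
            sleep $delay;     # no need to pause after the final attempt
        }
    }
    return 0;
}

# echo stands in for ps axg in this sketch.
if (wait_for_output("echo Terminal", "terminal", 3, 1)) {
    print "The Terminal process is running.\n";
} else {
    print "Gave up waiting for Terminal.\n";
}
```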

for-next

The for-next loop is the bread and butter of all looping constructs. This loop iterates through a series of values until a condition (usually a numeric limit) is met. The syntax for a for-next loop is

for (<initialization>;<execution condition>;<increment>) {
        <code block>
}

The initialization sets up the loop and initializes the counter variable to its default state. The execution condition is checked with each iteration of the loop; if it evaluates to false, the loop ends. Finally, the increment is a piece of code that defines an operation performed on the counter variable each time the loop is run. For example, the following loop counts from 0 to 9:

for ($count=0;$count<10;$count++) {
        print "Count = $count\n";
}

The counter, $count, is set to 0 when the loop starts. With each repetition, it is incremented by 1 ($count++). The loop exits when the counter reaches 10 ($count<10).
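A common use of the counter is as an index into an array. In the sketch below, the array contents are made up; the condition $i < @systems works because an array used where a number is expected yields its element count.

```perl
use strict;
use warnings;

my @systems = ("Mac OS 9", "Mac OS X", "Darwin");
my $listing = "";

# Visit each element by index, from 0 up to the last one.
for (my $i = 0; $i < @systems; $i++) {
    $listing .= "$i: $systems[$i]\n";   # .= appends to the existing string
}

print $listing;
```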

Subroutines

Subroutines help modularize code by dividing it into smaller functional units. Rather than creating a gigantic block of Perl that does everything under the sun, you can create subroutines that are easier to read and debug.

A subroutine is started with the sub keyword and the name the subroutine should be called. The body of the subroutine is enclosed in curly brackets {}. For example, here is a simple subroutine that prints Mac OS X.

sub printos {
        print "Mac OS X\n";
}

You can include subroutines anywhere in your source code and call them at any time by prefixing their name with & (&printos). Subroutines can also be set up to receive values from the main program and return results. For example, this routine accepts two strings and concatenates them together (useful, huh?):

sub concatenatestring {
        my ($x,$y)=@_;
        return ("$x$y");
}

To retrieve the concatenation of the strings Mac and OS X, the subroutine would be addressed as

$result=&concatenatestring("Mac","OS X");

Data is received by the subroutine through the use of the special variable @_. The two values it contains are then stored in local variables (denoted by the my keyword) named $x and $y. Finally, the return statement returns a concatenated version of the two strings.
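Because @_ is an ordinary array, a subroutine does not have to know in advance how many values it will receive. The following sketch, with the hypothetical name total, adds up however many numbers it is given.

```perl
use strict;
use warnings;

# total (hypothetical): return the sum of all the numbers passed in.
sub total {
    my @numbers = @_;          # copy every argument into a local array
    my $sum = 0;
    for (my $i = 0; $i < @numbers; $i++) {
        $sum += $numbers[$i];
    }
    return $sum;
}

my $fruit_count = &total(3, 2, 17);
print "Pieces of fruit: $fruit_count\n";   # Pieces of fruit: 22
```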

Additional Information

The information in this chapter should be enough to get you started authoring and editing Perl scripts. Later in the chapter, you'll learn how to extend Perl to another free software package—MySQL. In Chapter 28, "Web Programming," you'll see how Perl can be used to author online applications.

As with many topics in the book, the space just isn't available for a completely comprehensive text. If you like what you see, you can learn more about Perl through these resources:
