CHAPTER 7 BUILT-IN FUNCTIONS

The double quotes around the argument are processed first, forming a string from the space-separated list elements; then, the list context provided by the function is applied to that result. But a quoted string is a scalar, and list context doesn't affect scalars, so the existing string is left unmodified as print's argument.

The join function listed in table 7.1 provides the same service as the combination of '$"' and double quotes and is provided as a convenience for those who prefer to pass arguments to a function rather than to set a variable and double quote a string. We'll discuss this function later in this chapter.

Now you understand the basic principles of evaluation context and the tools used for converting data types. With this background in mind, we'll next examine some important Perl functions that deal with scalar data, such as split. Then, in section 7.3 we'll discuss functions that deal with list data, such as join.

7.2 PROGRAMMING WITH FUNCTIONS THAT GENERATE OR PROCESS SCALARS

Table 7.2 describes some especially useful built-in functions that generate or process scalar values, which weren't already discussed in part 1.

Table 7.2 Useful Perl functions for scalars, and their nearest relatives in Unix

  Perl built-in function: split
  Unix relative(s): The cut command; AWK's split function; the Shell's IFS variable
  Purpose: Converting scalars to lists
  Effects: Takes a string and optionally a set of delimiters, and extracts and returns the delimited substrings. The default delimiter is any sequence of whitespace characters.

  Perl built-in function: localtime
  Unix relative(s): The date command
  Purpose: Accessing current date and time
  Effects: Returns a string that resembles the output of the Unix date command.

  Perl built-in function: stat, lstat
  Unix relative(s): The ls -lL command (stat); the ls -l command (lstat)
  Purpose: Accessing file information
  Effects: Provides information about the file referred to by stat's argument, or the symbolic link presented as lstat's argument.

  Perl built-in function: chomp
  Unix relative(s): N/A
  Purpose: Removing newlines in data
  Effects: Removes trailing input record separators from strings, using newline as the default. (With Unix utilities and Shell built-in commands, newlines are always removed automatically.)

  Perl built-in function: rand
  Unix relative(s): The Shell's RANDOM variable; AWK's rand function
  Purpose: Generating random numbers
  Effects: Generates random numbers that can be used for decision-making in simulations, games, etc.

The counterparts to those functions found in Unix or the Shell are also indicated in the table. These provide related services, but in ways that are generally not as convenient or useful as their Perl alternatives.[6] For example, although split looks at A<TAB><TAB>B as you do, seeing the fields A and B, the Unix cut command sees three fields there by default, including an imaginary empty one between the tabs! As you might guess, this discrepancy has caused many people to have difficulty using cut properly.

As another example, the default behavior of Perl's split is to return a list of whitespace-separated words, but obtaining that result by manipulating the Shell's IFS variable requires advanced skills, and courage.[7]

[6] Perl has the advantage of being a modern descendant of the ancient Unix tradition, so Larry was able to address and correct many of its deficiencies while creating Perl.
[7] Why courage? Because if the programmer neglects to reinstate the IFS variable's original contents after modifying it, a mild-mannered Shell script can easily mutate into its evil twin from another dimension and wreak all kinds of havoc.

We'll now turn to detailed consideration of each of the functions listed in table 7.2 and demonstrate how they can be effectively used in typical applications.

7.2.1 Using split

split is typically used to extract a list of fields from a string, using the coding techniques shown in table 7.3.

Table 7.3 The split function

  Typical invocation formats [a]:
    @fields=split;
    @fields=split /RE/;
    @fields=split /RE/, string;

  Example: @fields=split;
  Explanation: Splits $_ into whitespace-delimited "words," and assigns the resulting list to @fields (as do the examples that follow).

  Example: @fields=split /,/;
  Explanation: Splits $_ using individual commas as delimiters.

  Example: @fields=split /\s+/, $line;
  Explanation: Splits $line using whitespace sequences as delimiters.

  Example: @fields=split /[^\040\t_]+/, $line;
  Explanation: Splits $line using sequences of one or more non-"space, tab, or underscore characters" as delimiters.

  [a] Matching modifiers (e.g., i for case insensitivity) can be appended after the closing delimiter of the matching operator, and a custom regex delimiter can be specified after m (e.g., split m:/:;).

split's optional first argument is a matching operator whose regex specifies the delimiter(s) to be used in extracting fields from the string. The optional second argument overrides the default of $_ by specifying a different string to be split.

In the simplest case, shown in the table's first invocation format, split can be invoked without any arguments to split $_ using whitespace delimiters. However, when input records need to be split into fields, it's more convenient to use the n and a invocation options to automatically load fields into @F, as discussed in part 1. For this reason, split is primarily used in Minimal Perl for secondary splitting.

For instance, input lines could first be split into fields using whitespace delimiters via the -wnla standard option cluster, and then one of those fields could be split further using another delimiter to extract its subfields. Here's a demonstration of a script that uses this technique to show the time in a custom format:

    $ mytime    # reformats date-style output
    The time is 7:32 PM.

    $ cat mytime
    #! /bin/sh
    # Sample output from date: Thu Apr  6 16:12:05 PST 2006
    # Index numbers for @F:    0   1    2    3      4   5
    date |
      perl -wnla -e '$hms=$F[3];      # copy time field into named variable
        ($hour, $minute)=split /:/, $hms;   # no $seconds
        $am_pm="AM";
        $hour >= 12 and $am_pm="PM";        # noon onward is PM
        $hour > 12 and $hour=$hour-12;      # 13-23 become 1-11
        print "The time is $hour:$minute $am_pm.";
      '

mytime is implemented as a Shell script, to simplify the delivery of date's output as input to the Perl command.[8] Perl's automatic field splitting option is used (via -wnla) to load date's output into the elements of @F, and then the array element[9] containing the hour:minutes:seconds field ($F[3]) is copied into the $hms variable (for readability). $hms is then split on the ":" delimiter, and its hour and minute fields are assigned to variables.

What about the seconds? The programmer didn't consider them to be of interest, so despite the fact that split returns a three-element list here, the third subfield's value isn't used in the program. Next, the script adds an AM/PM field, and prints the reworked date output in the custom format.

[8] An alternative technique based on command interpolation (like the Shell's command substitution) is shown in section 8.5.
[9] The expression $F[3] uses array indexing (introduced in table 5.9) to access the fourth field. The named-variable approach could be used instead, with some additional typing: (undef, undef, undef, $hms)=@F;

In addition to splitting-out subfields from time fields, you can use split in many other applications. For example, you could carve up IP addresses into their individual numeric components using "." as the delimiter, but remember that you need to backslash that character to make it literal:

    @IPa_parts=split /\./, $IPa;   # 216.239.57.99 -> 216, 239, 57, 99

You can also use split to extract schemes (such as http) and domains from URLs, using "://" as the delimiter:

    $URL='http://a.b.org';
    ($scheme, $domain)=split m|://|, $URL;   # 'http', 'a.b.org'

Notice the use of the m syntax of the matching operator to specify a non-slash delimiter, to avoid conflicts with the slashes in the regex field.

Tips on using split

One common mistake with split is forgetting the proper order of the arguments:

    @words=split $data, /:/;   # string, RE: WRONG!
    @words=split /:/, $data;   # RE, string: Right!

Another typical mistake is the incorrect specification of split's field delimiters, usually by accidentally describing a particular sequence of delimiters rather than any sequence of them. For example, this invocation of split says that each occurrence of the indicated character sequence is a single delimiter:

    $data='Hoboken::NJ,:Exit 14c';
    @fields=split /,:/, $data;      # Extracts two fields

The result is that "Hoboken::NJ" and "Exit 14c" are assigned to the array. This alternative says that any sequence of one or more of the specified characters counts as a single delimiter, which results in "NJ" being extracted as a separate field:

    $data='Hoboken::NJ,:Exit 14c';
    @fields=split /[,:]+/, $data;   # Extracts three fields

This second type of delimiter specification is more commonly used than the first kind, but of course what's correct in a specific case depends on the format of the data being examined.

Although split is a valuable tool, it's not indispensable.
That's because its functionality can generally be duplicated through use of a matching operator in list context, which can also extract substrings from a string. But there's an important difference: with split, you define the data delimiters in the regex, whereas with a matching operator, you define the delimited data there.

How do you decide whether to use split or the matching operator when parsing fields? It's simple: split is preferred for cases where it's easier to describe the delimiters than to describe the delimited data, whereas a matching operator using capturing parentheses (see table 3.8) is preferred for the cases where it's easier to describe the data than the delimiters.

Remember the mytime script? Did its design as a Shell script rather than a Perl script, and its use of date to deliver the current time to a Perl command, surprise you? If so, you'll be happy to hear that Perl doesn't really need the date command to tell it what time it is; Perl's own localtime function, which we'll cover next, provides that service.

7.2.2 Using localtime

You can use Perl's localtime function to obtain time and date information in an OS-independent manner, using invocation formats shown in table 7.4.

As indicated, localtime provides different types of output according to its context. Here is a command that's adapted from the first example of the table. It produces a date-like time report by forcing a scalar context for localtime, which would otherwise be in the list context provided by print:

    $ perl -wl -e 'print scalar localtime;'
    Tue Feb 14 19:32:03 2006

Another way to use localtime is shown in the example in the table's third row, which involves capturing and interpreting a set of time-related numbers.
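The list-context usage just mentioned can be sketched as follows. This is a minimal illustration, not code from the chapter: the variable names and the output layout are assumptions chosen for the example. It captures all nine values and adjusts the 0-based month and the 1900-based year before printing them.

```perl
#!/usr/bin/perl -wl
use strict;

# Sketch only: the variable names below are illustrative assumptions.
# In list context, localtime returns nine numbers; with no argument
# it describes the current time.
my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = localtime;

# $mon is 0-based (0 = January) and $year counts from 1900,
# so both need adjusting before being shown to people.
my $date = sprintf "%04d-%02d-%02d", $year + 1900, $mon + 1, $mday;
my $time = sprintf "%02d:%02d:%02d", $hour, $min, $sec;

print "Date: $date";
print "Time: $time";
```

The sprintf calls zero-pad each component, which plain interpolation of the raw numbers would not do.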
But in simple cases, you can parenthesize the call to localtime and index into it as if it were an array, as in the "day of year" example of the table's last row.

Table 7.4 The localtime function

  Typical invocation formats:
    $time_string=localtime;
    $time_string=localtime timestamp;
    @time_component_numbers=localtime;
    $time_component_number=(localtime)[index];

  Example:
    $time=localtime;
    print $time;
   Or
    print scalar localtime;
  Explanation: In scalar context, localtime returns the current date and time in a format similar to that of the date command (but without the timezone field).

  Example:
    print scalar localtime ((stat filename)[9]);
  Explanation: localtime can be used to convert a numeric timestamp, as returned by stat, into a string formatted like date's output. The example shows the time when filename was last modified.

  Example:
    ($sec, $min, $hour, $dayofmonth, $month, $year,
     $dayofweek, $dayofyear, $isdst)=localtime;
  Explanation: In list context, localtime returns nine values representing the current time. Most of the date-related values are 0-based, so $dayofweek, for example, ranges from 0-6. But $year counts from 1900, representing the year 2000 as 100.

  Example:
    $dayofyear=(localtime)[7] + 1;
    print "Day of year: $dayofyear";
  Explanation: As with any list-returning function, the call to localtime can be parenthesized and then subscripted as if it were an array. Because the dayofyear field is 0-based, it needs to be incremented by 1 for human consumption.

Here's a rewrite of the mytime script shown earlier, which converts it to use localtime instead of date:

    $ cat mytime2
    #! /usr/bin/perl -wl
    (undef, $minutes, $hour)=localtime;   # we don't care about seconds
    $am_pm='AM';
    $hour >= 12 and $am_pm='PM';          # noon onward is PM
    $hour > 12 and $hour=$hour-12;        # 13-23 become 1-11
    $minutes=sprintf "%02d", $minutes;    # zero-pad, so 5 prints as 05
    print "The time is $hour:$minutes $am_pm.";

    $ mytime2
    The time is 7:42 PM.

This new version is both more efficient and more OS-portable than the original, which makes it twice as good!

Tips on using localtime

Here's an especially productivity-enhancing tip.
When you need to load localtime's output into that set of nine variables shown in table 7.4's third row, don't try to type them in. Instead, run perldoc -f localtime in one window, and cut and paste the following paragraph from that screen into your program's window:

    # 0    1    2     3     4    5     6     7     8
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
        localtime(time);

Then, edit that assignment as needed by replacing some variables with undef, removing localtime's argument, etc.

You'll see examples featuring stat next, like the one shown in the second row of table 7.4.

7.2.3 Using stat

One of the most frequently used Unix commands is the humble but absolutely indispensable ls -l. It provides access to the wealth of data stored in a file's inode, which holds everything Unix knows about a file.[10]

[10] Well, almost everything; the file's name resides in its directory.

Perl provides access to that per-file data repository using the function called stat (for "file status"), which takes its name from a related UNIX resource. Table 7.5 summarizes the syntax of stat and shows some typical uses.

stat is most commonly used for simple tasks like those shown in the table's examples, such as determining the UID or inode number of a file. You'll see a more interesting example next.

Emulating the Shell's -nt operator

Let's see how you can use Perl to duplicate the functionality of the Korn and Bash shells' -nt (newer-than) operator, which is heavily used, and greatly appreciated, by Unix file-wranglers.
Here's a Shell command that tests whether the file on the left of -nt is newer than the file on its right:

    [[ $file1 -nt $file2 ]] &&
        echo "$file1 was more recently modified than $file2"

The Perl equivalent is easily written using stat:

    (stat $file1)[9] > (stat $file2)[9] and
        print "$file1 was more recently modified than $file2";

The numeric comparison (>) is appropriate because the values in the atime (for access), mtime (for modification), and ctime (for change) fields are just big integer numbers, ticking off elapsed seconds from a reference point in the distant past. Accordingly, the difference between two mtime values reveals the difference in their files' modification times, to the second.

Table 7.5 The stat function

  Typical invocation formats:
    ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
     $atime, $mtime, $ctime, $blksize, $blocks)=stat filename;
    $extracted_element=(stat filename)[index];

  Example:
    (undef, undef, undef, undef, $uid)=stat '/etc/passwd';
    print "passwd is owned by UID: $uid\n";
  Explanation: The file's numeric user ID is returned as the fifth element of stat's list, so after initializing the named variables as shown, it's available in $uid.

  Example:
    print "File $f's inode is: ", (stat $f)[1];
  Explanation: The call to stat can be parenthesized and indexed as if it were an array. The example accesses the second element (labeled $ino in the format shown above), which is the file's inode number.

Unlike the functions seen thus far, there are many ways stat can fail. For example, the existing file /a/b could be mistyped as the non-existent /a/d, or the program's user could be denied the permissions needed on /a to run stat on its files. For this reason, it's a good idea to call stat in a separate statement for each file, so you can print file-specific OS error messages (from "$!"; see appendix A) if there's a problem.
Following this advice, we can upgrade the code that emulates the Shell's -nt operator to this more robust form:

    $mtime1=(stat $file1)[9] or die "$0: stat of $file1 failed; $!";
    $mtime2=(stat $file2)[9] or die "$0: stat of $file2 failed; $!";
    $mtime1 > $mtime2 and
        print "$file1 was more recently modified than $file2";

The benefit of this new version is that it can issue separate, detailed messages for a failed stat on either file, like this one issued by the nt_tester script:[11]

    nt_tester: stat of /a/d failed; No such file or directory

[11] In contrast, the original version would report that $file1 was more recently modified than $file2 even if the latter didn't exist, because the "undefined" value (see section 8.1.1) that stat would return is treated as a 0 in numeric context.

stat can also help in the emulation of certain Unix commands, as you'll see next.

Emulating ls with the listfile script

We'll now consider a script called listfile, which shows how stat can be used to generate simple reports on files like those produced by ls -l. First, let's compare their results:

    $ ls -l rygel
    -rwxr-xr-x 1 yumpy users 415 2006-05-14 19:32 rygel
    $ listfile rygel
    -rwxr-xr-x 1 yumpy users 415 Sun May 14 19:32:05 2006 rygel

The format of listfile's time string doesn't match that of ls. However, it's an arguably more user-friendly format, and it's much easier to generate this way, so the programmer deemed the difference an enhancement rather than a bug.

Listing 7.1 shows the script, with the most significant elements highlighted. Line 6 loads the CPAN module that provides the format_mode function used on Line 17.

Listing 7.1 The listfile script

     1  #! /usr/bin/perl -wl
     2
     3  # load CPAN module whose "format_mode" function converts
     4  # octal-mode -> "-rw-r--r--" format
     5
     6  use Stat::lsMode;
     7
     8  @ARGV == 1 or die "Usage: $0 filename\n";
     9  $filename=shift;
    10
    11  (undef, undef, $mode, $nlink, $uid, $gid,
    12      undef, $size, undef, $mtime)=stat $filename;
    13
    14  $time=localtime $mtime;     # convert seconds to time string
    15  $uid_name=getpwuid $uid;    # convert UID-number to string
    16  $gid_name=getgrgid $gid;    # convert GID-number to string
    17  $rwx=format_mode $mode;     # convert octal mode to rwx format
    18
    19  printf "%s %4d %3s %9s %12d %s %s\n",
    20      $rwx, $nlink, $uid_name, $gid_name, $size, $time, $filename;

Line 12 assigns stat's output to a list consisting of variables and undef placeholders that ends with $mtime, the rightmost element of interest from the complete set of 13 elements. This sets up the six variables needed in Lines 14-20.

On Line 14, the $mtime argument to localtime gets converted into a date-like time string (a related example is shown in row two of table 7.4).

Lines 15 and 16, respectively, convert the UID and GID numbers provided by stat into their corresponding user and group names, using special Perl built-in functions (see man perlfunc). The functions are called getpwuid and getgrgid because they get the user or group name by looking up the record having the supplied numeric UID or GID in the Unix password file ("pw") or group file ("gr").[12]

Line 17 converts the octal $mode value to an ls-style permissions string, using the imported format_mode function.

The printf function is used to format all the output, because it allows a data type and field width (such as "%9s", which means display a string in nine columns) to be specified for each of its arguments.

As mentioned earlier, the way localtime formats the time-string is different from the format produced by the Linux ls command, so some Unix users might prefer to use the real ls. On the other hand, listfile provides a good starting point for those using other OSs who wish to develop an ls-like command.[13]
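For readers who would rather match the date layout of the sample ls output shown earlier, one possible approach is the strftime function from Perl's core POSIX module. The following is a sketch under stated assumptions rather than part of listfile: the format string merely imitates that sample ls line, and the current directory ('.') is used as a stand-in for listfile's $filename so the example can run anywhere.

```perl
#!/usr/bin/perl -wl
use strict;
use POSIX qw(strftime);    # core module; provides strftime

# Stand-in for listfile's $filename (an assumption for this sketch):
# the current directory, which can always be examined with stat.
my $filename = '.';

my $mtime = (stat $filename)[9];
defined $mtime or die "$0: stat of $filename failed; $!";

# "%Y-%m-%d %H:%M" imitates the "2006-05-14 19:32" layout of the
# sample ls output; other versions of ls may format dates differently.
my $ls_time = strftime("%Y-%m-%d %H:%M", localtime $mtime);

print "$ls_time $filename";
```

Here localtime is called in list context, so strftime receives the nine time components rather than the ready-made time string.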
[12] As usual, it's no coincidence that these Perl functions have the same names as their Unix counterparts, which are C-language library functions.
[13] The first enhancement might be to use the looping techniques demonstrated in chapter 10 to upgrade listfile to listfiles.

Tips on using stat

For over three decades, untold legions of Shell programmers have (according to local custom) groused, whinged, and/or kvetched about the need to repeatedly respecify the filename in statements like these:

    [ -f "$file" -a -r "$file" -a -s "$file" ] || exit 42;
    [[ -f $file && -r $file && -s $file ]] || exit 42;

To give those who've migrated to Perlistan some much-deserved comfort and succor, Perl supports the use of the underscore character as a shorthand reference to the last filename used with stat or a file-test operator (within a particular code block). Accordingly, the Perl counterpart to the previous Shell command, which tests that a file is regular, readable, and has a size greater than 0 bytes, can be written like so:

    -f $file and -r _ and -s _ or exit 42;

Here's an example of economizing on typing by using the underscore with the stat function:

    (stat $filename)[5] == (stat _)[7] and
        warn "File's GID equals its size; could this mean something?";

To get the size of a file, it's easier to use -s $file (see table 6.2) than the equivalent stat invocation, which is (stat $file)[7].

As a final tip, when you need to load stat's output into those 13 variables, don't try to type them in; run perldoc -f stat in one window, cut and paste the following paragraph from that screen into your program's window, and edit as needed:

    ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
        $atime,$mtime,$ctime,$blksize,$blocks)
            = stat($filename);

Next, we'll look at the chomp function, which is used to strip trailing newlines from input that's read manually, rather than through the auspices of the implicit input-reading loop.
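The stat tips above can be combined in one short sketch. It is only an illustration under assumptions: the scratch file is created with the core File::Temp module purely so the example is self-contained, and the variable names and messages are invented for the example rather than taken from the chapter.

```perl
#!/usr/bin/perl -wl
use strict;
use File::Temp qw(tempfile);   # core module, used only to create sample data

# Create a small scratch file so the sketch is self-contained.
my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "sample data";
close $fh or die "$0: cannot close $file; $!";

# The first file test looks the file up; "_" reuses those results,
# so the filename need not be repeated.
-f $file and -r _ and -s _
    or die "$0: $file is not a regular, readable, non-empty file";

# "_" works with stat too: fetch size and mtime without another lookup.
my ($size, $mtime) = (stat _)[7, 9];
print "$file: $size bytes, last modified ", scalar localtime $mtime;
```

Because "_" refers to the most recent stat or file test, it is safest to use it immediately after the test whose results it should reuse.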
7.2.4 Using chomp

In Minimal Perl, routine use of the l option, along with n or p, frees you from worrying about trailing newlines fouling-up string comparisons involving input lines. That's because the l option provides automatic chomping (removal of trailing newlines) on the records read by the implicit loop.[14]

For this reason, if you want your program to terminate on encountering a line consisting of "DONE", you can conveniently code the equality test like this:

    $_ eq 'DONE' and exit;     # using option n or p, along with l

That's easier to type and less error-prone than what you'd have to write if you weren't using the l option:

    $_ eq "DONE\n" and exit;   # using option n or p, without l

[14] See table 7.6 for a more precise definition of what chomp does.

[...]

... functions for list processing, which provide reordering, joining, filtering, and transforming services, respectively, for lists. The table also shows each function's nearest relative in Unix or the Shell.

Table 7.8 Useful Perl functions for lists, and their nearest relatives in Unix

Built-in Perl function Unix relative(s) sort The Unix sort command List sorting Takes a list, and returns a sorted list reverse Linux s...

[...]

... 7.7 The rand function

  Typical invocation formats:
    $random_tiny_number=rand;
    $random_larger_number=rand N;
    $random_element=$some_array[ rand @some_array ];

  Example: $num=rand;
  Explanation: Assigns a floating-point number N, in the range 0 ...

[...]

... functions and demonstrate, among other things, how rand can be used with grep to do random filtering.

[...]

... similarities and differences in how data flows between commands and functions.
7.3.1 Comparing Unix pipelines and Perl functions

Although there are distinct similarities between Unix command pipelines