Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống
1
/ 59 trang
THÔNG TIN TÀI LIỆU
Thông tin cơ bản
Định dạng
Số trang
59
Dung lượng
1,03 MB
Nội dung
406 CHAPTER 12 MODULES AND THE CPAN $ CL_VERBOSE=2 check_links ukeworld.com Got 101 links for ukeworld.com 85 links are unique Checking 'http://ukeworld.com/?D=A': OKAY Checking 'http://ukeworld.com/?M=A': OKAY BAD: 'https://www.ukeworld.com/ppbutton2.gif', in 'ukeworld.com' $ CL_VERBOSE=3 check_links ukeworld.com # Output edited to fit Got 101 links for ukeworld.com 85 links are unique Checking 'http://ukeworld.com/?D=A': OKAY Type: text/html Size: N/A Server: Apache/1.3.27 Checking 'http://ukeworld.com/?M=A': OKAY Type: text/html Size: N/A Server: Apache/1.3.27 BAD: 'https://www.ukeworld.com/ppbutton2.gif', in 'ukeworld.com' Verbosity level 3 dumps out the data obtained by the head function from the web server, if any; otherwise it reports the link being tested as “ BAD”. Now look at the script, which is presented in listing 12.4. It begins by importing LWP::Simple, which exports the head function automat- ically. It then checks for the variable CL_VERBOSE in the environment; if it has a num- ber in it, that number is copied to the file-scoped variable $VERBOSE, so the requested verbosity level can conveniently be determined from anywhere within the program. On Line 25, the lwp-request command obtains the list of links found within the current page, and then if tests the True/False value of the array @links to determine whether links were found. Many pages contain multiple links to other pages, so Line 32 filters the duplicates out of @links (we’ll come back to this). 1 #! /usr/local/bin/perl -wl 2 3 use strict; 4 use LWP::Simple; 5 6 my $VERBOSE=0; # file scope 7 defined $ENV{CL_VERBOSE} and $ENV{CL_VERBOSE} =~ /^\d+$/ and 8 $VERBOSE=$ENV{CL_VERBOSE}; # if numeric value, assign 9 10 { # MAIN program 11 foreach my $url ( @ARGV ) { check_link( $url ) }; 12 } 13 14 END { 15 # If propeller was last thing written to screen, 16 # will need \n before shell's upcoming prompt to STDERR 17 print STDERR ""; 18 } Listing 12.4 The check_links script USING MODULES 407 19 20 sub check_link { 21 my ( $url, @links, $link, @h, $counter, $output ); 22 $url=shift; 23 24 # use lwp-request command, based on LWP, to get links 25 if( @links=`lwp-request -o links '$url'` ) { 26 $VERBOSE and 27 print "\nGot ", scalar @links, " links for $url"; 28 } 29 else { 30 warn "$!\n"; # show OS error message 31 } 32 @links=uniquify ( @links ); # eliminate duplicate links 33 $VERBOSE and @links and # if link count > 0, show count 34 print scalar @links, " links are unique"; 35 36 foreach $link ( @links ) { # test each link 37 $link =~ /^(A|LINK)\s+mailto:/i and next; # skip mailto 38 $link =~ s/^\w+\s+//; # strip A/IMG in "A/IMG http://" 39 $link =~ s/\s+//g; # eliminate any remaining WS in link 40 41 $VERBOSE > 1 and printf "\nChecking '$link'"; 42 if ( @h=head $link ) { # run LWP's head() on link 43 if ( $VERBOSE > 1 ) { 44 print ": OKAY"; 45 $VERBOSE > 2 and 46 printf " Type: %s\tSize: %s\tServer: %s\n", 47 $h[0], ( $h[1] or "N/A" ), $h[4]; 48 } 49 else { 50 # Show "propeller" as activity indicator; 51 # printf prints backspace, then one of - \ | / 52 # to STDERR, so stays on screen if output redirected 53 printf STDERR "\b%s", # %s prints next arg as string 54 ('-', '\\', '|', '/')[$counter++ % 4]; 55 } 56 } 57 else { # report links for which "head" request fails 58 $output = "\nBAD: '$link', in '$url'"; 59 $output =~ s|http://||g; # save space by deleting http: 60 } 61 } 62 } Then, head is called on each link in turn (Line 42). For those that yield results, a pro- peller is spun, or the word “ OKAY” is printed, or a detailed report on the link is printed—according to the verbosity level. At levels above 2, head’s output is dis- played after being formatted by printf (Lines 46–47). A logical or is used to substi- tute “ N/A” for a False value in $h[1] to provide printf’s second data argument, because with some web sites, a null string gets stored in that array element. 408 CHAPTER 12 MODULES AND THE CPAN 63 64 sub uniquify { # sort and "uniquify" the arguments 65 my %unique; 66 foreach ( @_ ) { $unique{$_}++; }; 67 return sort keys %unique; 68 } The subroutine used for unique-ification (Lines 64–68) uses the technique intro- duced in section 9.2.6 of registering the items of interest as keys in a hash, and then extracting the (necessarily unique) keys. The code is encapsulated in a subroutine to facilitate later reuse. Next, we’ll revisit an advanced module I wrote, which endows Perl with a new and improved control structure adapted from the Shell. 12.3.3 Shell::POSIX::Select—the menu_ls script Apart from its many other applications demonstrated in section 10.7, the menu- oriented user interface provided by Shell::POSIX::Select can help you com- pose Unix commands, as shown in figure 12.4. The menu_ls program presents you with a series of choices for the ls command’s behavior, translates them into their corresponding command options, and then runs the constructed command. Figure 12.4 Sample run of the menu_ls script USING MODULES 409 NOTE This implementation of select lets you return to previous menus to modify earlier selections. In the session shown, the user initially selected the “regular” listing style from the Style Menu, but had second thoughts about that choice after the File Menu had already appeared. Responding to the prompt with <^D> took her back to the previ- ous menu, where she revised her choice to “long”. Then, after choosing “all files” from the File Menu, she was shown the command and given a chance to approve it before running it. Unlike the examples of the select loop shown in section 10.7, this script (see listing 12.5) ignores the loop variable and focuses on the associated $Reply vari- able instead. As it does in the Shell’s version of select, that variable contains the user’s actual numerical response, which this program uses to index into the array of options (see Lines 17 and 22). For example, a choice for the “long” listing style gets mapped into -l by way of the common index shared by the paired arrays @formats and @fmt_opts (see Lines 6–7). The same holds true for the choice of “all files” and the option of -a, which are related through the @types and @type_opt arrays. (Because the $Reply value reflects the numerical choices of the user from the menu, a 1 needs to be sub- tracted before using it as an index into the 0-based option arrays.) In addition to $Reply, the $Heading, $Prompt, and $Eof variables are also imported on Line 2, to allow for headings and prompts to be associated with menus, and for easy detection of <^D> after a loop has been exited, respectively. 14 Here’s how that works. When the FORMAT loop is entered on Line 16, the module stores the settings of $Heading and $Prompt that are currently in effect (from Lines 14 and 15), and arranges for them to be shown when the loop is reentered, as hap- pened in the sample session when the user exited the TYPE loop via <^D>. 1 #! /usr/bin/perl -wl 2 use Shell::POSIX::Select qw($Reply $Heading $Prompt $Eof); 3 4 # Would be more natural to associate choices with options via a 5 # hash, but this approach better demonstrates $Reply variable 6 @formats = ( 'regular', 'long' ); 7 @fmt_opt = ( '', '-l' ); 8 9 @types = ( 'only non-hidden', 'all files' ); 10 @typ_opt = ( '', '-a' ); 11 12 print "\n COMMAND COMPOSER FOR: ls\n"; 13 14 $Heading="\n**** Style Menu ****"; 14 These features don’t exist in Shell versions of select, but I’ve always felt they should. Listing 12.5 The menu_ls script 410 CHAPTER 12 MODULES AND THE CPAN 15 $Prompt= 'Choose listing style:'; 16 FORMAT: select ( @formats ) { 17 $user_format=$fmt_opt[ $Reply - 1 ]; 18 19 $Heading="\n**** File Menu ****"; 20 $Prompt="Choose files to list:"; 21 TYPE: select ( @types ) { # <^D> restarts FORMAT loop 22 $user_type=$typ_opt[ $Reply - 1 ]; 23 last FORMAT; # leave loops once final choice obtained 24 } 25 $Eof and next; # handle <^D> to TYPE loop 26 } 27 $Eof and exit; # handle <^D> to FORMAT loop 28 29 # Now construct user's command 30 $command="ls $user_format $user_type"; 31 32 # Show command, for educational purposes 33 printf "Press <ENTER> to execute \"$command\" "; 34 # wait for input, then discard 35 defined <STDIN> or print "\n" and exit; 36 37 system $command ; # run the command The purpose of the Eof variable is to resolve uncertainties about why the loop vari- able is empty in the statement immediately following the loop (Line 25). The two possibilities are that the loop was never entered (e.g., due to its list being empty), or that it was exited via <^D>. Testing the $Eof variable for True detects the latter case, allowing the script to respond to the user’s <^D> by reverting to the prior Style Menu (as mentioned above). On the other hand, we don’t want to hold the user hostage, so a <^D> submitted to the FORMAT loop is treated by default as a request to exit the script (Line 27), as is a <^D> response to the following “Press <ENTER>” prompt (Lines 33–35). Finally, if Line 37 is reached, the assembled command is submitted to the OS for execution by system. Although this prototype menu_ls script handles only two of ls’ many options, it nicely demonstrates your ability to write Shell-eclipsing menu-driven programs using the Shell::POSIX::Select module (see chapter 10 for additional examples). We’ll look next at a system-administration application of a module that lets Perl programs emulate aspects of the Unix find command. USING MODULES 411 12.3.4 File::Find—the check_symlinks script A filing system is supposed to provide for reliable storage and retrieval of information. Because problems with file retrieval can have serious consequences, it’s important to monitor computer file systems and take corrective action—for example, by replacing a disk drive that’s going bad—as soon as problems start to appear. One potentially vexing problem 15 on Unix systems is that of broken symbolic links—ones that formerly pointed to stored data but no longer do. To help in iden- tifying them, the script called check_symlinks scans specified directories for sym- bolic links whose target files don’t exist, and reports them in the ls -l style of symlink target. Here’s a sample run that searches two directories on a Linux system: # check_symlinks /etc /lib # Running with root privileges REPORTING BROKEN SYMLINKS UNDER: /etc /etc/X11/xdm/xdm-pid -> /var/run/xdm.pid REPORTING BROKEN SYMLINKS UNDER: /lib /lib/modules/2.4.21/build -> /local/tmp/src/linux-2.4.21 /lib/modules/2.4.19/build -> /usr/src/linux-2.4.19 /lib/modules/2.4.19-4GB/build -> /usr/src/linux-2.4.19.SuSE FILES BROKEN/EXAMINED: 4/6,797 TIME: 0:04 HR:MN START: Sat Jan 28 20:35:48 2006 END: Sat Jan 28 20:39:18 2006 Although this run took only a few minutes, on a disk farm the script could run for days at a time, which is why it was designed to produce such detailed reports of its run times. check_symlinks uses the find function from the standard File::Find mod- ule for its directory-searching and file-finding services, to avoid re-inventing that wheel. For each file that it finds under a specified directory, find calls a user-defined subroutine with $_ set to the current file’s simple name (e.g., motd) and the module’s variable $File::Find::name set to its full name (e.g., /etc/motd). Then the sub- routine is free to process that file as needed. You can see the script in listing 12.6. As shown on Line 24, find needs to be sup- plied the address of the user’s file-handling subroutine, which is obtained by prepend- ing the special \& operator to check_slinks’s name. 16 Line 38 in that subroutine checks whether the current file is a symbolic link, and if so, it gets the name of its target 15 I think it’s partly a longing for the data that may never be seen again, but mostly a feeling of being be- trayed by a trusted ally, that bothers me so about such losses. But I suppose the betrayal angle is just wishful thinking, because most broken symlinks seem to be caused by user error (e.g., rm-ing the target file). 16 The address is needed because the user’s subroutine might not be readily accessible by name from the File::Find namespace, but it can definitely be invoked by address. 412 CHAPTER 12 MODULES AND THE CPAN (Line 40) using the built-in readlink function. If the target file doesn’t exist, the full pathname of the symlink and its target are printed to report the problem (Line 48). It’s important to recognize that check_symlinks, like all scripts using find, has to work within a special constraint. Specifically, because it’s find that calls check_slinks—rather than the user’s script itself—check_slinks can’t use return to send any information back to the script. This leaves the programmer with two options for working with the information that’s only available within check_slinks. He must either deal with it (e.g., print it out) once and for all in that subroutine, or else store it in a place where it will still be accessible (e.g., in a widely scoped variable) after find returns control to the user’s program. 1 #! /usr/bin/perl -wl 2 3 use strict; 4 use File::Find; 5 6 { # Special scope for Main + check_slinks 7 8 my $Bad=0; # file scope; used in sub check_slinks() 9 my $Total=0; # ditto 10 11 my $Usage="Usage: $0 dir1 [dir2 ]\n"; 12 @ARGV or die "$Usage"; 13 14 my $start_time=time; # for run-time calculation at end 15 16 foreach my $startdir( @ARGV ){ 17 -d $startdir and -r _ and -x _ or 18 warn "$0: Bad directory argument: $startdir\n" and 19 next; 20 21 # find broken symlinks in or under $startdir 22 23 print "REPORTING BROKEN SYMLINKS UNDER: \E$startdir"; 24 find \& check_slinks, $startdir; # call the function 25 print ""; # blank line 26 } 27 28 # Print final statistics, including program's run time 29 print "FILES BROKEN/EXAMINED: ", 30 commafy ($Bad), "/", commafy ($Total); 31 show_times ($start_time); 32 33 sub check_slinks { 34 my $isbad=0; # whether current symlink is bad 35 my $target; # where current symlink points 36 my $name=$File::Find::name; # make a shorter name Listing 12.6 The check_symlinks script USING MODULES 413 37 38 if ( -l ) { # if file (in $_) is a sym-link, 39 # find what it's pointing to 40 $target=readlink $_; 41 if (! defined $target or $target eq "") { 42 warn "$0: check_slinks(): bad readlink value", 43 " on \"$name\": $!\n"; 44 $isbad=1; 45 } 46 elsif ( ! -e $target ) { 47 # target missing; broken link, OR NFS down! 48 print "\t$name -> $target"; 49 $isbad=1; 50 } 51 } 52 # $Bad and $Total are still in scope 53 $isbad and $Bad++; # count another bad symlink 54 $Total++; # count another file examined 55 return; # goes back to "find", to be called for next file 56 } 57 58 } # end of special scope for Main + check_slinks 59 60 sub commafy { # insert commas into number strings 61 my $number=shift; 62 63 defined $number or die "$0: commafy(): no argument!"; 64 while ($number =~ s/^(-?\d+)(\d{3})/$1,$2/) { ; } 65 return $number; 66 } 67 I chose to print the details of each bad symlink from within check_slinks, because I knew the script wouldn’t need access to them later. However, I also needed to keep counts of the total number of files examined and those verified as bad, which would be needed later. I handled this by arranging for the scopes of $Bad and $Total to run from Lines 8/9 to Line 58—to include both Main and the check_slinks subroutine—and by incrementing those variables as needed within check_slinks. (Capitalizing the initial letters of those variable names helps me remember that they’re widely-scoped.) These measures allow the print statement to access those variables on Line 30. To enhance the readability of the printed ratio of bad files ( $Bad) to all files examined ( $Total), the commafy subroutine is used to insert commas at appropriate places within those numbers. The while loop on Line 64 of commafy repeats the substitution operator— which does all the work of the loop—until it finds no more three-digit sequences to commafy, which is why no statements are needed in the code block. 414 CHAPTER 12 MODULES AND THE CPAN 68 sub show_times { 69 # argument is program's start time 70 my $stime=shift or 71 die "$0: show_times(): bad argument"; 72 my $etime=time; # current (ending) time 73 my $dtime=$etime - $stime; # elapsed time 74 75 printf "\UTime:%2d:%02d HR:MN ", 76 int ( ( $dtime / 3600 ) + .5 ), 77 ( ( $dtime % 3600 ) / 60 ) + .5; 78 print "\nSTART: ", scalar localtime $stime, 79 " END: ", scalar localtime $etime; 80 } The show_times subroutine prints the program’s start and end times and its run time, which involves converting some large integers returned by the built-in time function (Line 72) into formatted date strings (Lines 78–79) and calculating the elapsed hours and minutes represented by the difference of those integers (Lines 73 and 76–77). Because the commafy and show_times subroutines are not only difficult to write but also likely to be needed again in other scripts, they’re excellent candidates for inclusion with other valued tools in a programmer’s personal utilities module (e. g., Diggitys::Utilities). Now we’ll take a foray into the equally timely topic of web programming, using Perl’s stalwart CGI module to do all the heavy lifting. 12.3.5 CGI—the survey.cgi script In the 1990s, Perl’s strengths in the areas of text parsing, pattern matching, networking, and OS independence led to it being embraced as the language of choice for web appli- cations. Perl became especially popular for Common Gateway Interface ( CGI) programming, in which a program on a server machine receives a request from a browser, handles that request, and then sends the appropriate response to the browser for display. For example, the user might request a graph of last week’s stock prices for Acme Corp. by filling out an on-screen form and then clicking the SUBMIT button. The CGI program on the web server would then 1 Retrieve the five daily stock prices from the stock-quote server; 2 Plot those prices in a pretty graph; 3 Construct a web page that contains that graph, along with a new form for the user’s next request; 4 Send the constructed page back to the browser for display. USING MODULES 415 Further details on the inner workings of the CGI protocol are beyond the scope of this book, but to give you an idea of how CGI applications can be written, we’ll look at a simple example involving a web-based survey. Figure 12.5 shows the web form that is produced by the survey.cgi script after it already has been filled out by the user. When the user clicks the SUBMIT button shown in figure 12.5, a page confirming his answers to the survey questions appears next (see figure 12.6). Figure 12.5 Initial screen of the survey.cgi script Figure 12.6 Confirmation screen from the survey.cgi script [...]... $form.=startform(-method=>'GET', -action=>''); $form.=hr() "\n"; $form.=b('Please enter your first initial and surname:'); $form.=p(); $form.='Initial'; $form.=textfield(-name=>'init', -size=>1, -maxlength=>1); $form.="\n"; $form.='Surname'; $form.=textfield(-name=>'name', -size=>12, -maxlength=>30); $form.="\n" hr() "\n"; $form.=b('Please indicate your age group:'); $form.=p(); $form.=radio_group(-name=>'age',... v3.1.4 for gawk; v4.2.3 for find; and v3.00.16(1) for bash See also UNIX, Unix, GNU, POSIX clobberation This made-up noun describes what happens to a file or variable when its former contents are destroyed by a programmer accidentally writing other data over it See also masking command We distinguish between two kinds of commands A Perl command is formed by interactively typing perl and any desired arguments... shorthand name used in Perl circles for the book entitled Programming Perl, which serves as the printed reference manual for the Perl language classic UNIX utility Sometimes we need to differentiate between the versions of UNIX utilities that have historically been found on UNIX systems, and their more modern POSIX-compliant counterparts that are provided on modern Unix systems We identify the former... web pages for broken (hyper-)links (check_links, section 12.3.2), to help users construct appropriate invocations of Unix commands (menu_ls, section 12.3.3), and to check Unix file systems for broken symbolic links (check_symlinks, section 12.3.4) 19 424 For additional information on creating custom modules, see Writing Perl Modules for CPAN, by Sam Tregar (Apress, 2002) CHAPTER 1 2 MODULES AND THE CPAN... interactively typing perl and any desired arguments to the Shell A Unix command is similar, but uses a program other than perl (echo, grep, ls, etc.): $ perl –wl –e 'print "Hello world!";' $ grep 'Waldo' hiding_places # Perl command # Unix command See also script, argument 1 I taught for Western Electric’s UNIX University” at the time and made many marginal notes in my well-worn personal copy of that... matches are searched for simultaneously, as 436 in \bERROR \b|\bWARNING\b, each of the alternative parts of the regex qualifies as a separate pattern See also regex Perl, perl The word Perl refers to the language itself, whereas perl refers to the special interpreter program that’s needed to run programs written in the Perl language Perlistan, Perlistani Perlistan is the fabled land of the Perl- speaking JAPHs,... formfeed, space, and tab characters YAPC This acronym stands for “Yet Another Perl Conference,” which is the name for a collection of low-priced grassroots events held around the world for the benefit of those who either can’t afford the expense of the more elaborately staged conferences or just prefer the company of students and geeks to corporate IT types See http://yapc.pm.org for additional information... variable $form; gets printed on screen, if # run normally, or gets sent to browser, in CGI mode if ( ! param() ) { # if no parameters, we're displaying form my $form=header(); # prepare HTTP header for server $form.=start_html(-title=>'CGI Module Demo', -BGCOLOR=>'ivory'); $form.=h1('National Junk Food Survey'); USING MODULES # action='' means this script will handle the submitted form $form.=startform(-method=>'GET',... contrast, the Shell counterpart to a function (e.g., echo) is called a command 434 See operator for an explanation of the purposeful lack of differentiation between the terms function and operator in this book See also subroutine, operator, command GNU GNU (see http://www.gnu.org) stands for Gnu’s Not UNIX (really!) and is the brand name for the software produced by the Free Software Foundation, headed by... AT&T ’s UNIX System V Release 0, as documented in the UNIX System User’s Manual, Release 5.0 published by Western Electric in June, 1982.1 Our reference point for POSIX utilities is the set provided in the directory /usr/xpg4/bin of Solaris 10. 2 The GNU utilities we refer to are those distributed with SuSE Linux version 10, which are v2.5.1 for grep, egrep, and fgrep; v4.1.4 for sed; v3.1.4 for gawk; . handle the submitted form 20 $form.=startform(-method=>'GET', -action=>''); 21 $form.=hr() . "
"; 22 23 $form.=b('Please enter your first initial and. restarts FORMAT loop 22 $user_type=$typ_opt[ $Reply - 1 ]; 23 last FORMAT; # leave loops once final choice obtained 24 } 25 $Eof and next; # handle <^D> to TYPE loop 26 } 27 $Eof and exit; # handle. and exit; # handle <^D> to FORMAT loop 28 29 # Now construct user's command 30 $command="ls $user_format $user_type"; 31 32 # Show command, for educational purposes 33 printf