Professional Perl Programming (Wrox, 2001), Part 7

Chapter 18: Text Processing and Document Generation

If all is well then it will return:

    poddyscript.pl pod syntax OK.

Otherwise it will produce a list of problems, which we can then go and fix, for example:

    *** WARNING: file does not start with =head at line N in file poddyscript.pl

This warning indicates that we have started pod documentation with something other than a =head1 or =head2, which the checker considers to be suspect. Likewise:

    *** WARNING: No numeric argument for =over at line N in file poddyscript.pl
    *** WARNING: No items in =over (at line 17) / =back list at line N in file poddyscript.pl

This indicates that we have an =over ... =back pair that not only lacks a number after the =over, but does not even contain any items. The first is probably an omission. The second indicates that we might have bunched up our items so they all run into the =over token, like this:

    =over
    =item item one
    =item item two

    =back

If we had left out the blank line before the =back we would instead have got:

    *** ERROR: =over on line N without closing =back at line EOF in file poddyscript.pl

In short, podchecker is a useful tool and we should use it if we plan to write pod of any size in our Perl scripts.

The module that implements podchecker is called Pod::Checker, and we can use it with either filenames or filehandles supplied for the first two arguments:

    # function syntax
    $ok = podchecker($podfile, $checklog, %options);

    # object syntax
    $checker = new Pod::Checker %options;
    $checker->parse_from_file($podpath, $checklog);

Both file arguments can be either filenames or filehandles. By default, the pod file defaults to STDIN and the check log to STDERR. The function returns the number of syntax errors found (or -1 if the file contains no pod at all), so a very simple checker script could be:

    use Pod::Checker;
    # zero errors means the pod is clean
    print podchecker() == 0 ? "OK" : "Fail";

The options hash, if supplied, allows one option to be defined: enable or disable the printing of warnings.
The default is on, so we can get a verification check without warnings in the report, using STDIN and STDERR:

    $ok = podchecker(\*STDIN, \*STDERR, -warnings => 0);

The actual podchecker script is more advanced than this, of course, but not by all that much.

Programming pod

Perl provides a number of modules for processing pod documentation – we mentioned Pod::Checker just a moment ago. These modules form the basis for all the pod utilities, some of which are not much more than simple command-line wrappers for the associated module. Most of the time we do not need to process pod programmatically, but in case we do, here is a list of the pod modules supplied by Perl and what each of them does:

Module               Action
Pod::Checker         The basis of the podchecker utility. See above.
Pod::Find            Search for and return a hash of pod documents. See 'Locating pods' below.
Pod::Functions       A categorized summary of Perl's functions, exported as a hash.
Pod::Html            The basis for the pod2html utility.
Pod::Man             The basis for both the pod2man and the functionally identical pod2roff utilities.
Pod::Parser          The pod parser. This is the basis for all the translation modules and most of the others too. New parsers can be implemented by inheriting from this module.
Pod::ParseUtils      A module containing utility subroutines for retrieving information about and organizing the structure of a parsed pod document, as created by Pod::InputObjects.
Pod::InputObjects    The implementation of the pod syntax, describing the nature of paragraphs and so on. In-memory pod documents can be created on the fly using the methods in this module.
Pod::Plainer         A compatibility module for converting new style pod into old style pod.
Pod::Select          A subclass of Pod::Parser and the basis of the podselect utility, Pod::Select extracts selected parts of pod documents by searching for their heading titles.
                     Any translator that inherits from Pod::Select rather than Pod::Parser will be able to support the Pod::Usage module automatically.
Pod::Text            The basis of the pod2text utility.
Pod::Text::Color     Convert pod to text using ANSI color sequences. The basis of the -color option to pod2text. Subclassed from Pod::Text. This uses Term::ANSIColor, which must be installed; see Chapter 15.
Pod::Text::Termcap   Convert pod to text using escape sequences suitable for the current terminal. Subclassed from Pod::Text. Requires termcap support; see Chapter 15.
Pod::Usage           The basis of the pod2usage utility; this uses Pod::Select to extract usage-specific information from pod documentation by searching for specific sections, for example, NAME, SYNOPSIS.

Using Pod Parsers

Translator modules, which is to say any module based directly or indirectly on Pod::Parser, may be used programmatically by creating a parser object and then calling one of the parsing methods:

    parse_from_filehandle($fh, %options);

Or:

    parse_from_file($infile, $outfile, %options);

For example, assuming we have Term::ANSIColor installed, we can create ANSIColor text documents using this short script:

    #!/usr/bin/perl
    # parseansi.pl
    use warnings;
    use strict;

    use Pod::Text::Color;

    my $parser = new Pod::Text::Color(
        width    => 56,
        loose    => 1,
        sentence => 1,
    );

    if (@ARGV) {
        # '-' sends the output to standard output
        $parser->parse_from_file($_, '-') foreach @ARGV;
    } else {
        $parser->parse_from_filehandle(\*STDIN);
    }

We can generate HTML pages, plain text documents, and manual pages using exactly the same process from their respective modules.

Writing a pod Parser

Writing our own pod parser is surprisingly simple. Most of the hard work is done for us by Pod::Parser, so all we have to do is override the methods we need to replace in order to generate the kind of document we are interested in.
In particular, there are four methods we may want to override:

    command – Render and output pod commands.
    verbatim – Render and output verbatim paragraphs.
    textblock – Render and output regular (non-verbatim) paragraphs.
    interior_sequence – Return rendered interior sequence.

By overriding these and other methods we can customize the document that the parser produces. Note that the first three methods display their result, whereas interior_sequence returns it. Here is a short example of a pod parser that turns pod documentation into an XML document (albeit without a DTD):

    #!/usr/bin/perl
    # parser.pl
    use warnings;
    use strict;

    {
        package My::Pod::Parser;

        use Pod::Parser;
        our @ISA = qw(Pod::Parser);

        # =head1, =item and other command paragraphs
        sub command {
            my ($parser, $cmd, $para, $line) = @_;
            my $fh = $parser->output_handle;

            $para =~ s/[\n]+$//;    # strip trailing newlines
            my $output = $parser->interpolate($para, $line);
            print $fh "<pod:$cmd>$output</pod:$cmd>\n";
        }

        # indented (verbatim) paragraphs, passed through untouched
        sub verbatim {
            my ($parser, $para, $line) = @_;
            my $fh = $parser->output_handle;

            $para =~ s/[\n]+$//;
            print $fh "<pod:verbatim>\n$para\n</pod:verbatim>\n";
        }

        # ordinary text paragraphs
        sub textblock {
            my ($parser, $para, $line) = @_;
            my $fh = $parser->output_handle;

            print $fh $parser->interpolate($para, $line);
        }

        # interior sequences such as B<...> and L<...>; returns, does not print
        sub interior_sequence {
            my ($parser, $cmd, $arg) = @_;

            return "<pod:int cmd=\"$cmd\">$arg</pod:int>";
        }
    }

    my $parser = new My::Pod::Parser();

    if (@ARGV) {
        $parser->parse_from_file($_) foreach @ARGV;
    } else {
        $parser->parse_from_filehandle(\*STDIN);
    }

To implement this script we need the output filehandle (since the parser may be called with a second argument), which we can get from the output_handle method. We also take advantage of Pod::Parser to do the actual rendering work by using the interpolate method, which in turn calls our interior_sequence method.
Pod::Parser provides plenty of other methods too, some of which we can override as well as or instead of the ones we used in this parser. For a complete list of them, see:

    > perldoc Pod::Parser

The Pod::Parser documentation also covers more methods that we might want to override, such as begin_input, end_input, preprocess_paragraph, and so on. Each of these gives us the ability to customize the parser in increasingly fine-grained ways.

We have placed the parser package inside the script in this instance, though we could equally have had it in a separate module file. To see the script in action we can feed it with any piece of Perl documentation – the pod documentation itself, for example. On a typical UNIX installation of Perl 5.6, we can do that with:

    > perl parser.pl /usr/lib/perl5/5.6.0/pod/perlpod.pod

This generates an XML version of perlpod that starts like this:

    <pod:head1>NAME</pod:head1>
    perlpod - plain old documentation
    <pod:head1>DESCRIPTION</pod:head1>
    A pod-to-whatever translator reads a pod file paragraph by paragraph, and translates it to the appropriate output format. There are three kinds of paragraphs: <pod:int cmd="L">verbatim|/"Verbatim Paragraph"</pod:int>, <pod:int cmd="L">command|/"Command Paragraph"</pod:int>, and <pod:int cmd="L">ordinary text|/"Ordinary Block of Text"</pod:int>.
    <pod:head2>Verbatim Paragraph</pod:head2>
    A verbatim paragraph, distinguished by being indented (that is, it starts with space or tab). It should be reproduced exactly, with tabs assumed to be on 8-column boundaries. There are no special formatting escapes, so you can't italicize or anything like that. A \ means \, and nothing else.
    <pod:head2>Command Paragraph</pod:head2>
    All command paragraphs start with "=", followed by an identifier, followed by arbitrary text that the command can use however it pleases.
    Currently recognized commands are
    <pod:verbatim>
    =head1 heading
    =head2 heading
    =item text
    =over N
    =back
    =cut
    =pod
    =for X
    =begin X
    =end X
    </pod:verbatim>

By comparing this with the original document we can see how the parser is converting pod tokens into XML tags.

Locating pods

The UNIX-specific Pod::Find module searches for pod documents within a list of supplied files and directories. It provides one subroutine of importance, pod_find, which is not imported by default. This subroutine takes one main argument – a reference to a hash of options including default search locations. Subsequent arguments are additional files and directories to look in. The following script implements a more or less fully-featured pod search based around Pod::Find and Getopt::Long, which we cover in detail in Chapter 14.

    #!/usr/bin/perl
    # findpod.pl
    use warnings;
    use strict;

    use Pod::Find qw(pod_find);
    use Getopt::Long;

    # default options
    my $verbose = undef;
    my $include = undef;
    my $scripts = undef;
    my $display = 1;

    # allow files/directories and options to mix
    Getopt::Long::Configure('permute');

    # get options
    GetOptions(
        'verbose!' => \$verbose,
        'include!' => \$include,
        'scripts!' => \$scripts,
        'display!' => \$display,
    );

    # if no directories specified, default to @INC
    $include = 1 if !defined($include) and !(@ARGV or $scripts);

    # perform scan
    my %pods = pod_find({
        -verbose => $verbose,
        -inc     => $include,
        -script  => $scripts,
        -perl    => 1
    }, @ARGV);

    # display results if required
    if ($display) {
        if (%pods) {
            foreach (sort keys %pods) {
                print "Found '$pods{$_}' in $_\n";
            }
        } else {
            print "No pods found\n";
        }
    }

We can invoke this script with no arguments to search @INC, or pass it a list of directories and files to search.
It also supports four options to enable verbose messages, disable the final report, and enable Pod::Find's two default search locations. Here is one way we can use it, assuming we call it findpod.pl:

    > perl findpod.pl -i -v /my/perl/lib 2> dup.log

This command tells the script to search @INC in addition to /my/perl/lib (-i), produce extra messages during the scan (-v), and redirect error output to dup.log. This will capture details of any duplicate modules that the module finds during its scan. If we only want to see duplicate modules, we can disable the output and view the error output on screen with:

    > perl findpod.pl -i -nodisplay /my/perl/lib

The options passed in the hash reference to pod_find are all Boolean and all default to 0 (off). They have the following meanings:

Option      Action
-verbose    Print out progress during scan, reporting all files scanned that did not contain pod information.
-inc        Scan all the paths contained in @INC.
-script     Search the installation directory and subdirectories for pod files. If Perl was installed as /usr/bin/perl then this will be /usr/bin, for example.
-perl       Apply Perl naming conventions for finding likely pod files. This strips likely Perl file extensions (.pod, .pm, etc.), skips over numeric directory names that are not the current Perl release, and so on. Both -inc and -script imply -perl.

The hash returned by pod_find contains the file in which each pod document was found as the key, and the document title (usually the module package name) as the value. This is the reverse arrangement to the contents of the %INC hash, but contains the same kinds of keys and values.

Reports – The 'r' in Perl

Reports are a potentially useful but often overlooked feature of Perl that dates back to the earliest versions of the language. In short, they provide a way to generate structured text such as tables or forms using a special layout description called a format.
Superficially similar in intent to the printf and sprintf functions, formats provide a different way to lay out text on a page or screen, with an entirely different syntax geared specifically towards this particular goal. The particular strength of formats comes from the fact that we can describe layouts in physical terms, making it much easier to see how the resulting text will look and making it possible to design page layouts visually rather than resorting to character counting with printf.

Formats and the Format Datatype

Intriguingly, formats are an entirely separate data type, distinct from scalars, arrays, hashes, typeglobs, and filehandles. Like filehandles, they have no prefix or other syntax to express themselves and as a consequence often look like filehandles, which can occasionally be confusing.

Formats are essentially the compiled form of a format definition: a series of formatting or picture lines containing literal text and placeholders, interspersed with data lines that describe the information used to fill the placeholders, and comment lines. As a simple example, here is a format definition that defines a single picture line consisting mainly of literal text and a single placeholder, followed by a data line that fills that placeholder with some more literal text:

    # this is the picture line
    This is a @<<<<< justified field
    # this is the data line
    "left"

To turn a format definition into a format we need to use the format function, which takes a format name and a multi-line format definition, strongly reminiscent of a here document, and turns it into a compiled format. A single full stop on its own line defines the end of the format. To define the very simple format example above we would write something like this:

    format MYFORMAT =
    This is a @<<<<< justified field
    "left"
    .
Note that the trailing period is very important, as it is the end token that defines the end of the implicit here document. A format definition will happily consume the entire contents of a source file if left unchecked.

To use a format we use the write function on the filehandle with the same name as the format. For the MYFORMAT example above we would write:

    # print format definition to filehandle 'MYFORMAT'
    write MYFORMAT;

This requires that we actually have an open filehandle called MYFORMAT and want to use the format to print to it. More commonly we want to print to standard output, which we can do by either defining a format called STDOUT, or assigning a format name to the special variable $~ ($FORMAT_NAME with the English module). In this case we can omit the filehandle and write will use the currently selected output filehandle, just like print:

    $~ = 'MYFORMAT';
    write;

We can also use methods from the IO:: family of modules, if we are using them. Given an IO::Handle-derived filehandle called $fh, we can assign and use a format on it like this:

    $fh->format_name('MYFORMAT');
    $fh->format_write();

We'll return to the subject of assigning formats a little later on.

The write function (or its IO::Handle counterpart format_write) generates filled-out formats by combining the picture lines with the current values of the items in the data lines to fill in any placeholder present, in a process reminiscent of, but entirely unconnected to, interpolation. Once it has finished filling out, it sends the results to the filehandle.

If we do not want to print output we can instead make use of the formline function. This takes a single picture line and generates output from it into the special variable $^A. It is the internal function that write uses to generate its output, and we will see a little more of how to use it later.
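As a minimal sketch of how formline and the $^A accumulator fit together, the following fills one picture line and collects the result in a string instead of printing it directly. The subroutine name fill_picture and the sample picture are hypothetical, chosen only for illustration; formline and $^A are the built-ins described above.

```perl
#!/usr/bin/perl
# formline_demo.pl - hypothetical script name, for illustration only
use warnings;
use strict;

# Fill a single picture line and return the result as a string.
# fill_picture is a made-up helper name, not a Perl built-in.
sub fill_picture {
    my ($picture, @values) = @_;

    $^A = '';                      # start with an empty accumulator
    formline($picture, @values);   # append the filled-out line to $^A
    my $text = $^A;                # take a copy of the result
    $^A = '';                      # leave the accumulator clean
    return $text;
}

# The picture is single-quoted so that '@' is not interpolated as an array.
my $line = fill_picture('Name: @<<<<<<<< Qty: @>>>' . "\n", 'widget', 42);
print $line;
```

Because formline appends to $^A rather than replacing it, the sketch clears the accumulator both before and after use; forgetting to do so is an easy way to end up with stale text in later output.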
There is, strangely, no string equivalent of write in the same way that printf has sprintf, but it is possible to create one using formline.

Format picture lines are usually written as static pieces of text, which makes them impossible to adjust to cater for different circumstances like calculated field widths. As an alternative, we can build the format inside a string and then eval it to create the format, which allows us to interpolate variables into the format at the time it is evaluated. Here is an example that creates and uses a dynamically calculated format associated with the STDOUT filehandle:

    #!/usr/bin/perl
    # evalformat.pl
    use warnings;
    use strict;

    # list of values for field
    my @values = qw(first second third fourth fifth sixth penultimate ultimate);

    # determine maximum width of field
    my $width = 0;
    foreach (@values) {
        my $newwidth = length $_;
        $width = $newwidth if $newwidth > $width;
    }

    # create a format string with calculated width using '$_'
    my $definition = "This is the \@" . ('<' x ($width - 1)) . " line\n" .
        '$_' . "\n.\n";

    # define the format through interpolation
    eval "format STDOUT =\n$definition";

    # print out the field values using the defined format
    write foreach @values;

The advantage of this approach is that it allows us to be more flexible, as well as calculate the size of fields on the fly. The disadvantage is that we must take care to interpolate the \n newlines, but not placeholders and especially not variables in the data lines, which can lead to a confusing combination of interpolated and non-interpolated strings. This can make formats very hard to read if we are not very careful.

Formats and Filehandles

Formats are intimately connected with filehandles, and not just because they often look like them. Formats work by being directly associated with filehandles, so that when we come to use them all we have to do is write to the filehandle and have the associated format automatically triggered.
It might seem strange that we associate a format with a filehandle and then write to the filehandle, rather than specifying which format we want to use when we do the writing, but there is a certain logic behind this mechanism. There are in fact two formats that may be associated with a filehandle; the main one is the one that is used when we write, but we can also have a top-of-page format that is used whenever Perl runs out of room on the current page and is forced to start a new one. Since this is associated with the filehandle, Perl can use it automatically when we use write rather than needing to be told.

Defining the Top-of-Page Format

Perl allows two formats to be associated with a filehandle. The main format is used whenever we issue a write statement. The top-of-page format, if defined, is issued at the start of the first page and at the top of each new page. When a new page starts is determined by the special variables $= (length of page) and $- (the number of lines left). Each time we use write the value of $- decreases. When there is no longer sufficient room to fit the results of the next write, a new page is started, a new top-of-page format is written and only then is the result of the last write issued.

The main format is automatically associated with the filehandle of the same name, so that the format MYFORMAT is automatically used when we use write on the filehandle MYFORMAT. The top-of-page format is associated in a similar way, by giving it the name of the filehandle with the text _TOP appended. For instance, to assign a main and top-of-page format to the filehandle MYFORMAT we would use something like this:

    format MYFORMAT =
    main format definition
    .

    # define a format that gives the current page number
    format MYFORMAT_TOP =
    This is page @<<<
    $%
    .
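The paging mechanism can be sketched as follows. This is an illustrative example rather than anything from the original text: the filehandle name REPORT, the in-memory buffer, and the three-line page length are all assumptions made to keep the output small. It relies on $% (the current page number), $= (lines per page) and $- (lines left), all of which apply to the currently selected filehandle.

```perl
#!/usr/bin/perl
# pagedemo.pl - hypothetical script name, for illustration only
use warnings;
use strict;

my $item;            # filled in before each write
my $report = '';     # in-memory buffer standing in for a real output file

# Main format: one output line per write.
format REPORT =
Item: @<<<<<<<<
$item
.

# Top-of-page format: associated with REPORT through the _TOP suffix.
format REPORT_TOP =
-- Page @< --
$%
.

open(REPORT, '>', \$report) or die "Cannot open in-memory file: $!";

# $=, $- and $% refer to the selected filehandle, so select REPORT first.
my $old = select(REPORT);
$= = 3;              # 3 lines per page: 1 header line plus 2 item lines

foreach (qw(one two three four five)) {
    $item = $_;
    write;           # a new page starts when the record no longer fits in $-
}

select($old);
close(REPORT);

print $report;
```

With two items fitting on each page, the five items spread over three pages, each introduced by the top-of-page format. Pages after the first are preceded by the contents of $^L, which is a form feed by default.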
Assigning Formats to Standard Output

Since standard output is the filehandle most usually associated with formats, we can omit the format name when defining formats. Here is a pair of formats defined explicitly for standard output:

    format STDOUT =
    The magic word is "@<<<<<<<<";
    $word
    .

    format STDOUT_TOP =
    Page @>
    $%
    .

We can however omit STDOUT for the main format and simply write:

    format =
    The magic word is "@<<<<<<<<";
    $word
    .

This works because standard output is the default output filehandle. If we change the filehandle with select then format creates a format with the same name as that filehandle instead. The write function also allows us to omit the filehandle; to write out the formats assigned to standard output (or whatever filehandle is currently selected) we can simply put:

    write;

[...]

... object-oriented programming. However, during the course of this chapter we hope to show that by learning the basics of how Perl implements objects, a programmer can wield Perl in a highly effective way to implement object-oriented programs.

Chapter 19

In this chapter we introduce object-oriented programming from the Perl perspective ...

... and destroyed, and so on. Perl does not have any particular perspective, which makes it both extremely flexible and highly disconcerting to programmers used to a different object-oriented style. Because Perl does not dictate how object-oriented programming should be done, it can leave programmers who expect a more rigorous framework confused and aimless, which is one reason why Perl sometimes has a bad ...
... that several picture lines can appear one after the other, as this static top-of-page format illustrates:

    format STATIC_TOP =
    This header was generated courtesy of Perl formatting
    See Chapter 18 of Professional Perl for details
    .

Placeholders are defined by either an @ or ...

... on the continuation to truncate the text. Indeed, if we try to use it with a normal @ placeholder Perl will return a syntax error since this would effectively be an infinite loop that repeats the first line. Since write cannot generate infinite quantities of text, Perl prevents us from trying.

Page Control

Perl's reporting system uses several special variables to keep track of line and page numbering. We ...

... through them come from. In Perl, an object class is just a package, and an object instance is just a reference that knows its class and points to the data that defines the state of that particular instance. Perl was not originally an object-oriented language; only from version 5 did it acquire the necessary features (symbolic references and packages) to implement objects. As a result, Perl's object-oriented ...

... these advantages more easily. In order to appreciate how Perl implements and provides for object-oriented programming, therefore, a basic grasp of object-oriented concepts is necessary.

Object Concepts

Since this is not a treatise on object orientation, we will not dwell on the fundamentals of object orientation in detail. Indeed, one of the advantages of Perl's approach is that we do not need to pay nearly ...
... nearly so much attention to them as we often do in other languages; Perl's hands-on approach means that we can strip away a lot of the jargon that object orientation often brings with it. However, several concepts are key to any kind of object-oriented programming, so here is a short discussion of the most important ones, along with Perl's perspective on them:

Classes

An object class provides the implementation ...

... are destroyed. Perl implements object classes with packages. In fact, a package is an object class by another name. This basic equivalence is the basis for much of Perl's simple and obvious approach to objects in general. A class method is just a subroutine that takes a package name as its first argument, and an object method is a subroutine that takes an object name as its first argument. Perl automatically ...

... affect the object's state. Objects may contain, within themselves, different individual values called object attributes (or occasionally instance attributes).

Object-oriented Perl

Perl implements objects through references; the object's state is held by whatever it is that the reference points to, which is up to the object's class ...

... makes up for it by being blindingly simple to understand. Inheritance in Perl can also be dynamic, since @ISA has all the properties of a regular Perl array variable, so it can be modified during the course of a program's execution to add new parent classes, remove existing ones, entirely replace the parent class(es), or reorder them.
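The ideas in these fragments (a class is a package, an object is a reference that knows its class, and @ISA is an ordinary array) can be sketched in a few lines. The package names Animal, Loud and Dog and the method names are invented for this illustration and do not come from the original text.

```perl
#!/usr/bin/perl
# isademo.pl - hypothetical example; all package names are made up
use warnings;
use strict;

{
    package Animal;

    # class method: receives the package name as its first argument
    sub new {
        my ($class, %args) = @_;
        my $self = { name => $args{name} };   # object state lives in a hash
        return bless $self, $class;           # the reference now knows its class
    }

    # object method: receives the object as its first argument
    sub speak {
        my $self = shift;
        return "$self->{name} makes a generic noise";
    }
}

{
    package Loud;

    sub speak {
        my $self = shift;
        return uc("$self->{name} shouts");
    }
}

{
    package Dog;
    our @ISA = ('Animal');                    # Dog inherits from Animal
}

my $dog    = Dog->new(name => 'Rex');
my $before = $dog->speak;                     # found in Animal via @ISA

# @ISA is a regular array, so the inheritance tree can change at run time
unshift @Dog::ISA, 'Loud';
my $after  = $dog->speak;                     # Loud is now searched first

print "$before\n$after\n";
```

The same object answers the speak call differently before and after the unshift, because method lookup consults @ISA each time the inheritance tree changes. Changing @ISA at run time is rarely a good idea in production code, but it shows how little machinery sits behind Perl's inheritance.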
