Inside Perl

In this chapter, we will look at how Perl actually works: the internals of the Perl interpreter. First, we will examine what happens when Perl is built, the configuration process, and what we can learn from it. Next, we will go through the internal data types that Perl uses, which will help us when we write extensions to Perl. From there, we will get an overview of what goes on when Perl compiles and interprets a program. Finally, we will dive into the experimental world of the Perl compiler: what it is, what it does, and how we can write our own compiler tools with it.

To get the most out of this chapter, we should obtain a copy of the Perl source code. Either version, stable or development, is fine, and both can be obtained from our local CPAN mirror.

Analyzing the Perl Binary - 'Config.pm'

If Perl has been built on our computer, the configuration stage will have asked us a number of questions about how we wanted to build it. For instance, one question would have been whether to build Perl with or without threading. The configuration process will also have poked around the system, determining its capabilities. This information is stored in a file named config.sh, which the installation process encapsulates in the module Config.pm.

The idea behind this is that extensions to Perl can use this information when they are being built, but it also means that we, as programmers, can examine the capabilities of the current Perl and determine whether or not we can take advantage of features, such as threading, provided by the Perl binary executing our code.
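As a minimal sketch of examining the current Perl's capabilities, the following reads a few build-time variables from the %Config hash that the Config module exports. The helper name feature_report and the particular keys queried are illustrative assumptions, not part of Config itself; options that were switched off at build time are assumed to come back as Perl's undef, so they are printed as 'undef'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;    # exports the read-only %Config hash

# Hypothetical helper: format selected build-time settings as name=value
# strings, showing 'undef' for settings that have no defined value.
sub feature_report {
    my @keys = @_;
    return map {
        my $value = $Config{$_};
        sprintf '%s=%s', $_, defined $value ? $value : 'undef';
    } @keys;
}

# The values printed depend entirely on how this particular Perl was built.
print "$_\n" for feature_report(qw(archname usethreads useithreads ivsize ptrsize));
```

Returning strings rather than printing inside the helper keeps it easy to reuse, for example in a build script that has to decide which optional features to enable.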
'perl -V'

The most common use of the Config module is actually made by Perl itself: perl -V, which produces a little report on the Perl binary. It is actually implemented as the following program:

    #!/usr/bin/perl
    # config.pl
    use warnings;
    use strict;
    use Config qw(myconfig config_vars);

    print myconfig();
    $" = "\n    ";
    my @env = map {"$_=\"$ENV{$_}\""} sort grep {/^PERL/} keys %ENV;
    print "  \%ENV:\n    @env\n" if @env;
    print "  \@INC:\n    @INC\n";

When this script is run we will get something resembling the following, depending on the specification of the system of course:

    > perl config.pl
    Summary of my perl5 (revision 5.0 version 7 subversion 0) configuration:
      Platform:
        osname=linux, osvers=2.2.16, archname=i686-linux
        uname='linux deep-dark-truthful-mirror 2.4.0-test9 #1 sat oct 7 21:23:59 bst 2000 i686 unknown '
        config_args='-d -Dusedevel'
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
        useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef
      Compiler:
        cc='cc', ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
        optimize='-g',
        cppflags='-fno-strict-aliasing -I/usr/local/include'
        ccversion='', gccversion='2.95.2 20000220 (Debian GNU/Linux)', gccosandvers=''
        intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=4, usemymalloc=n, prototype=define
      Linker and Libraries:
        ld='cc', ldflags =' -L/usr/local/lib'
        libpth=/usr/local/lib /lib /usr/lib
        libs=-lnsl -ldb -ldl -lm -lc -lcrypt -lutil
        perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
        libc=/lib/libc-2.1.94.so, so=so, useshrplib=false, libperl=libperl.a
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
        cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
      @INC:
        lib
        /usr/local/lib/perl5/5.7.0/i686-linux
        /usr/local/lib/perl5/5.7.0
        /usr/local/lib/perl5/site_perl/5.7.0/i686-linux
        /usr/local/lib/perl5/site_perl/5.7.0
        /usr/local/lib/perl5/site_perl

How It Works

Most of the output is generated by the myconfig function in Config. It produces a list of the variables discovered by the Configure process when Perl was built. This is split up into four sections: Platform, Compiler, Linker and Libraries, and Dynamic Linking.

Platform

The first section, Platform, tells us a little about the computer Perl was built on, as well as some of the choices we made at compile time. This particular machine is running Linux 2.4.0-test9, and the arguments -d -Dusedevel were passed to Configure during the question and answer section. (We will see what these arguments do when we come to looking at how Perl is built.) hint=recommended means that the configure program accepted the recommended hints for how a Linux system behaves. We built the POSIX module, and we have a struct sigaction in our C library.

Next comes a series of choices about the various flavors of Perl we can compile: usethreads is turned off, meaning this version of Perl has no threading support. Perl has two types of threading support. See Chapters 1 and 22 for information regarding the old Perl 5.005 threads, which allow us to create and destroy threads in our Perl program, inside the Perl interpreter. This enables us to share data between threads, and to lock variables and subroutines against being changed or entered by other threads. This is the use5005threads option above. The other model, which came with version 5.6.0, is called interpreter threads or ithreads.
In this model, instead of having two threads sharing an interpreter, the interpreter itself is cloned, and each clone runs its own portion of the program. This means that, for instance, we can simulate fork on systems such as Windows by cloning the interpreter and having each interpreter perform separate tasks. Interpreter threads are only really production quality on Win32; on all other systems they are still experimental. Allowing multiple interpreters inside the same binary is called multiplicity.

The next two options refer to the IO subsystem. Perl can use an alternative input/output library called sfio (http://www.research.att.com/sw/tools/sfio) instead of the usual stdio if it is available. There is also a separate PerlIO being developed, which is specific to Perl. Next, there is support for files over 2 GB if our operating system supports them, and support for the SOCKS firewall proxy, although the core does not use this yet. Finally, there is a series of 64-bit and long double options.

Compiler

The Compiler section tells us about the C environment. Looking at the output, we are informed of the compiler we used and the flags we passed to it, the version of GCC used to compile Perl, and the sizes of C's types and Perl's internal types. usemymalloc refers to the choice of Perl's supplied memory allocator rather than the default C one. The next section is not very interesting, but it tells us which libraries we used to link Perl.

Linker and Libraries

The only thing of particular note in this section is useshrplib, which allows us to build Perl as a shared library. This is useful if we have a large number of embedded applications, and it means we get to impress our friends by having a 10K Perl binary.
By placing the Perl interpreter code in a separate library, Perl and other programs that embed a Perl interpreter can be made a lot smaller, since they can share the code instead of each having to contain their own copy.

Dynamic Linking

When we use XS modules (for more information on XS see Chapter 21), Perl needs to get at the object code provided by the XS. This object code is placed into a shared library, which Perl dynamically loads at run time when the module is used. The Dynamic Linking section determines how this is done. Different operating systems have a number of different models for dynamic linking, and Perl has to select the correct one here. dlsrc is the file that contains the source code for the chosen implementation. d_dlsymun tells us whether or not we have to add underscores to dynamically loaded symbols. This is because some systems use different naming conventions for functions loaded at run time, and Perl has to cater to each convention.

The documentation for the Config module contains explanations for these and other configuration variables accessible from the module. It gets this documentation from Porting/Glossary in the Perl source kit.

What use is this? Well, for instance, we can tell if we have a threaded Perl or whether we have to use fork:

    use Config;

    # usethreads may be undefined, so supply a default before comparing
    if (($Config{usethreads} || '') eq "define") {
        # we have threads.
        require MyApp::Threaded;
    } else {
        # make do with forking
        require MyApp::Fork;
    }

Note that Config gives us a hash, %Config, which contains all the configuration variables.

Under the Hood

Now it is time to really get to the deep material. Let us first look around the Perl source, before taking an overall look at the structure and workings of the Perl interpreter.

Around the Source Tree

The Perl source is composed of around 2190 files in 186 directories. To be really familiar with the source, we need to know where we can expect each part of it to be found, so it is worth taking some time to look at the important sections of the source tree.
There are also several informational files in the root of the tree:

❑ Changes* - a very comprehensive list of every change that has been made to the Perl source since Perl 5.000
❑ Todo* - lists the changes that haven't been made yet: bugs to be fixed, ideas to try out, and so on
❑ MANIFEST - tells us what each file in the source tree does
❑ AUTHORS and MAINTAIN - tell us who is 'looking after' various parts of the source
❑ Copying and Artistic - the two licenses under which we receive Perl

Documentation

The bulk of the Perl documentation lives in the pod/ directory. Platform-specific notes can be found as README.* in the root of the source tree.

Core modules

The standard library of modules shipped with Perl is distributed around two directories: pure-Perl modules that require no additional treatment are placed in lib/, and the XS modules are each given their own subdirectory in the ext/ directory.

Regression tests

When a change to Perl is made, the Perl developers will run a series of tests to ensure that this has not introduced any new bugs or reopened old ones; Perl will also encourage us to run the tests when we build a new Perl on our system. These regression tests are found in the t/ directory.

Platform-specific code

Some platforms require a certain amount of special treatment. They do not provide some system calls that Perl needs, for instance, or there is some difficulty in getting them to use the standard build process. (See Building Perl.) These platforms have their own subdirectories: apollo/, beos/, cygwin/, djgpp/, epoc/, mint/, mpeix/, os2/, plan9/, qnx/, vmesa/, vms/, vos/, and win32/. Additionally, the hints/ subdirectory contains a series of shell scripts, which communicate platform-specific information to the build process.

Utilities

Perl comes with a number of utilities scattered around: perldoc and the pod translators, s2p, find2perl, a2p, and so on.
(There is a full list, with descriptions, in the perlutil documentation of Perl 5.7 and above.) These are usually kept in utils/ and x2p/, although the pod translators have escaped to pod/.

Helper Files

The root directory of the source tree contains several program files that assist the installation of Perl (installhtml, installman, installperl), some that help out during the build process (for instance, cflags, makedepend, and writemain), and some that automate the generation of some of the source files. In this latter category, embed.pl is most notable, as it generates all the function prototypes for the Perl source and creates the header files necessary for embedding Perl in other applications. It also extracts the API documentation embedded in the source code files.

Eagle-eyed readers may have noticed that we have left something out of that list: the core source to Perl itself! The files *.c and *.h in the root directory of the source tree make up the Perl binary, and we can group them according to what they do:

Data Structures

A reasonable amount of the Perl source is devoted to managing the various data structures Perl requires; we will examine these structures in 'Internal Variable Types' later on. The files that manage these structures (av.c, av.h, cv.h, gv.c, gv.h, hv.c, hv.h, op.c, op.h, sv.c, and sv.h) also contain a wide range of helper functions, which make it considerably easier to manipulate them. See perlapi for a taste of some of the functions and what they do.

Parsing

The next major functional group in the Perl source code is the part that turns our Perl program into a machine-readable data structure. The files that take responsibility for this are toke.c and perly.y, the lexer and the parser.
PP Code

Once we have told Perl that we want to print 'hello world' and the parser has converted those instructions into a data structure, something actually has to implement the functionality. If we wonder where, for instance, the print statement is, we need to look at what is called the PP code. (PP stands for push-pop, for reasons that will become apparent later.) The PP code is split across four source files: pp_hot.c contains 'hot' code, which is used very frequently; pp_sys.c contains operating-system-specific code, such as network functions or functions that deal with the system databases (getpwent and friends); pp_ctl.c takes care of control structures such as while, eval, and so on; and pp.c implements everything else.

Miscellaneous

Finally, the remaining source files contain various utility functions to make the rest of the coding easier: utf8.c contains functions that manipulate data encoded in UTF8; malloc.c contains a memory management system; and util.c and handy.h contain some useful definitions for such things as string manipulation, locales, error messages, environment handling, and the like.

Building Perl

Perl builds on a mind-boggling array of different platforms, and so has to undergo a very rigorous configuration process to determine the characteristics of the system it is being built on. There are two major systems for doing this kind of probing: the GNU project's autoconf is used by the vast majority of free software, but Perl uses an earlier and less common system called metaconfig.

'metaconfig' Rather than 'autoconf'?

Porting/pumpkin.pod explains that both systems were equally useful, but one major reason for choosing metaconfig is that it can generate interactive configuration programs.
The user can override the defaults easily. Other reasons are that autoconf, at the time, affected the licensing of software that used it, and that metaconfig builds up its configuration programs from a collection of modular units. We can add our own units, and metaconfig will make sure that they are called in the right order.

The program Configure in the root of the Perl source tree is a UNIX shell script, which probes our system for various capabilities. The configuration for Windows is already done for us, and an NMAKE file can be found in the win32/ directory.

On the vast majority of systems, we should be able to type ./Configure -d and then let Configure do its stuff. The -d option chooses sensible defaults instead of prompting us for answers. If we're using a development version of the Perl sources, we'll have to say ./Configure -Dusedevel -d to let Configure know that we are serious about it. Configure asks if we are sure we want to use a development version, and the default answer chosen by -d is 'no'; -Dusedevel overrides this answer. We may also want to add the -DDEBUGGING flag to turn on special debugging options, if we are planning on looking seriously at how Perl works.

When we start running Configure, we should see something like this:

    > ./Configure -d -Dusedevel
    Sources for perl5 found in "/home/simon/patchbay/perl".
    Beginning of configuration questions for perl5.
    Checking echo to see how to suppress newlines
    using -n.
    The star should be here-->*
    First make sure the kit is complete:
    Checking

And eventually, after a few minutes, we should see this:

    Creating config.sh
    If you'd like to make any changes to the config.sh file before I begin
    to configure things, do it as a shell escape now (e.g. !vi config.sh).
    Press return or use a shell escape to edit config.sh:

After we press return, Configure creates the configuration files and fixes the dependencies for the source files. We then type make to begin the build process.
Perl builds itself in various stages. First, a Perl interpreter called miniperl is built; this is just like the eventual Perl interpreter, but it does not have any of the XS modules, notably DynaLoader, built into it. The DynaLoader module is special because it is responsible for coordinating the loading of all the other XS modules at run time; this is done through DLLs, shared libraries, or the local equivalent on our platform. Since we cannot load modules dynamically without DynaLoader, it must be built statically into Perl: if it were built as a DLL or shared library, what would load it? If there is no such dynamic loading system, all of the XS extensions must be linked statically into Perl.

miniperl then generates the Config module from the configuration files generated by Configure, and processes the XS files for the extensions that we have chosen to build; when this is done, make returns to the process of building them. The XS extensions that are being linked in statically, such as DynaLoader, are linked to create the final Perl binary.

Then the tools, such as the pod translators, perldoc, perlbug, perlcc, and so on, are generated; these must be created from templates to fill in the eventual path of the Perl binary when installed. The sed-to-perl and awk-to-perl translators are created, and then the manual pages are processed. Once this is done, Perl is completely built and ready to be installed; the installperl program looks after installing the binary and the library files, and installman and installhtml install the documentation.

How Perl Works

Perl is a byte-compiled language, and the perl binary is a byte-compiling interpreter. This means that Perl, unlike the shell, does not execute each line of our program as it reads it. Rather, it reads in the entire file, compiles it into an internal representation, and then executes the instructions.
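One way to observe that the whole file is compiled before any ordinary statement runs is a BEGIN block, which Perl executes as soon as the block itself has been compiled. The following is only an illustrative sketch (the @events array and its messages are made up for this example): although the BEGIN block appears in the middle of the file, its entry is recorded first, because the surrounding statements do not run until compilation of the whole file has finished.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our @events;    # package variable, so the BEGIN block and run-time code share it

push @events, 'run: first statement';

# Executed during compilation, before any of the ordinary statements above
# or below have had a chance to run.
BEGIN { push @events, 'compile: BEGIN block' }

push @events, 'run: last statement';

print "$_\n" for @events;
# Prints:
#   compile: BEGIN block
#   run: first statement
#   run: last statement
```

A line-by-line interpreter such as the shell would have recorded the first statement before it ever reached the block in the middle of the file.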
There are three major phases by which it does this: parsing, compiling, and interpreting.

Parsing

Strictly speaking, parsing is only a small part of what we are talking about here, but the term is casually used to mean the process of reading and 'understanding' our program file. First, Perl must process the command-line options and open the program file. It then shuttles extensively between two routines: yylex in toke.c, and yyparse in perly.y. The job of yylex is to split up the input into meaningful parts (tokens) and determine what 'part of speech' each represents. toke.c is a notoriously fearsome piece of code, and it can sometimes be difficult to see how Perl is pulling out and identifying tokens; the lexer, yylex, is assisted by a sublexer (in the functions S_sublex_start, S_sublex_push, and S_sublex_done), which breaks apart double-quoted string constructions, and by a number of scanning functions that find, for instance, the end of a string or a number.

Once this is completed, Perl has to try to work out how these 'parts of speech' form valid 'sentences'. It does this by means of a grammar, which tells it how various tokens can be combined into 'clauses'. This is much the same as it is in English: say we have an adjective and a noun, 'pink giraffes'. We could call that a 'noun phrase'. So, here is one rule in our grammar:

    adjective + noun => noun phrase

[...]

... on a very simple program:

    > perl -MO=Terse -e '$a = $b + $c'
    LISTOP (0x8178b90) leave
        OP (0x8178bb8) enter
        COP (0x8178b58) nextstate
        BINOP (0x8178b30) sassign
            BINOP (0x8178b08) add [1]
                UNOP (0x81789e8) null [15]
                    SVOP (0x80fbed0) gvsv  GV (0x80fa098) *b
                UNOP (0x8178ae8) null [15]
                    SVOP (0x8178a08) gvsv  GV (0x80f0070) *c
            UNOP (0x816b4b0) null [15]
                SVOP (0x816dd40) gvsv  GV (0x80fa02c) *a
    -e syntax OK

This...
... by passing the exec parameter to the compiler:

    > perl -MO=Terse,exec -e '$a = $b + $c'
    OP (0x80fcf30) enter
    COP (0x80fced0) nextstate
    SVOP (0x80fc1d0) gvsv  GV (0x80fa094) *b
    SVOP (0x80fcda0) gvsv  GV (0x80f0070) *c
    BINOP (0x80fce80) add [1]
    SVOP (0x816b980) gvsv  GV (0x80fa028) *a
    BINOP (0x80fcea8) sassign
    LISTOP (0x80fcf08) leave
    -e syntax OK

... hard-core Perl hackers trying to understand something about the internals, but it can be quite overwhelming at first sight:

    > perl -MO=Debug -e '$a = $b + $c'
    LISTOP (0x8183c30)
        op_next         0x0
        op_sibling      0x0
        op_ppaddr       PL_ppaddr[OP_LEAVE]
        op_targ         0
        op_type         178
        op_seq          6437
        op_flags        13
        op_private      64
        op_first        0x8183c58
        op_last         0x81933c8
        op_children     3
    OP (0x8183c58)
        op_next         0x8183bf8
        op_sibling      0x8183bf8
        op_ppaddr...

...

    Elt No. 0  0x811474c
    SV = IV(0x80ffb74) at 0x811474c
      REFCNT = 1
      FLAGS = (IOK,pIOK,IsUV)
      UV = 1
    Elt No. 1  0x8114758
    SV = RV(0x810acbc) at 0x8114758
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x80f69a4
      SV = PVAV(0x81133b0) at 0x80f69a4
        REFCNT = 1
        FLAGS = ()
        IV = 0
        NV = 0
        ARRAY = 0x81030b0
        FILL = 1
        MAX = 1
        ARYLEN = 0x0
        FLAGS = (REAL)
        Elt No. 0
        SV = PV(0x80f6b74) at 0x80f67b8
          REFCNT = 1
          FLAGS = (POK,pPOK)
          PV = 0x80fa6d0 "two"\0
    ...

... example:

    > perl -MDevel::Peek -e '$a = 1; Dump($a); $a.="2"; Dump($a); $a += 0.5; Dump($a)'
    SV = IV(0x80fac44) at 0x8104630
      REFCNT = 1
      FLAGS = (IOK,pIOK,IsUV)
      UV = 1
    SV = PVIV(0x80f06f8) at 0x8104630
      REFCNT = 1
      FLAGS = (POK,pPOK)
      IV = 1
      PV = 0x80f3e08 "12"\0
      CUR = 2
      LEN = 3
    SV = PVNV(0x80f0d68) at 0x8104630
      REFCNT = 1
      FLAGS = (NOK,pNOK)
      IV = 1
      NV = 12.5
      PV = 0x80f3e08 "12"\0
      CUR = 2
      LEN = 3

...
... string concatenation on an integer, like this:

    > perl -MDevel::Peek -e '$a = 1; Dump($a); $a.="2"; Dump($a)'
    SV = IV(0x8132fe4) at 0x8132214
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 1
    SV = PVIV(0x8128c30) at 0x8132204
      REFCNT = 1
      FLAGS = (POK,pPOK)
      IV = 1
      PV = 0x8133e38 "12"\0
      CUR = 2
      LEN = 3

Notice how our SV starts...

...

      CUR = 3
      LEN = 4
    Elt No. 1
    SV = RV(0x810acb4) at 0x80fdd24
      REFCNT = 1
      FLAGS = (ROK)
      RV = 0x81077d8

We mentioned earlier in the chapter how Perl can hold both a string and an integer value for the same variable. With Devel::Peek we can see how:

    > perl -MDevel::Peek -e '$a="2701"; $a*1; Dump($a)'

Or on Windows:

    > perl -MDevel::Peek -e "$a=q(2701); $a*1; Dump($a)"

... programming, the subject of Chapter 21, where Perl and C are being bound together and we need to examine the arguments passed by Perl code to C library functions. For example, this is what Devel::Peek has to say about the literal number 6:

    > perl -MDevel::Peek -e "Dump(6)"
    SV = IV(0x80ffb48) at 0x80f6938
      REFCNT = 1
      FLAGS = (IOK,READONLY,pIOK,IsUV)
      UV = 6

... module):

    > perl -MDevel::Peek -e '$a = "A Simple Scalar"; Dump($a)'
    SV = PV(0x813b564) at 0x8144ee4
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x81471a8 "A Simple Scalar"\0
      CUR = 15
      LEN = 16

What does this tell us? This SV is stored at memory location 0x8144ee4; the location can vary on different computers. The particular type of SV is a PV, which is itself a structure; that structure starts at location 0x813b564...
... NV (floating-point) values, and the corresponding POK and NOK flags set:

    SV = PVNV(0x80f7648) at 0x8109b84
      REFCNT = 1
      FLAGS = (NOK,POK,pNOK,pPOK)
      IV = 0
      NV = 2701
      PV = 0x81022e0 "2701"\0
      CUR = 4
      LEN = 5

It is interesting to see that Perl actually produced a floating-point value here and not an integer: a window into Perl's inner processes.

As a final example, if we reassign $a in the process of converting ...
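The way a scalar picks up a numeric value alongside its string value can also be checked from within a program, without reading a Devel::Peek dump. The sketch below is an assumption-laden illustration rather than anything from the Perl distribution's own examples: it uses the core B module's svref_2object function and its SVf_IOK, SVf_NOK, and SVf_POK constants to read the public flags, and the helper name public_flags is hypothetical. Whether the cached number shows up as IOK or NOK depends on the Perl version, so no exact 'after' result is claimed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B ();    # core module giving access to Perl's internal structures

# Hypothetical helper: list which public "value is valid" flags are set
# on the scalar that $ref points to.
sub public_flags {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    my @names;
    push @names, 'IOK' if $flags & B::SVf_IOK();
    push @names, 'NOK' if $flags & B::SVf_NOK();
    push @names, 'POK' if $flags & B::SVf_POK();
    return join ',', @names;
}

my $value  = "2701";
my $before = public_flags(\$value);    # only the string slot is valid so far

my $product = $value * 1;              # numeric use caches a number in $value
my $after   = public_flags(\$value);   # string flag plus a numeric flag

print "before: $before\n";
print "after:  $after\n";
```

Reading the flags through B avoids parsing the text that Dump writes to standard error, at the cost of showing far less detail than the full dump.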