Advanced Perl Programming, 2nd Edition
By Simon Cozens
Publisher: O'Reilly
Pub Date: June 2005
ISBN: 0-596-00456-7
Pages: 304

With a worldwide community of users and more than a million dedicated programmers, Perl has proven to be the most effective language for the latest trends in computing and business. Every programmer must keep up with the latest tools and techniques. This updated version of Advanced Perl Programming from O'Reilly gives you the essential knowledge of the modern Perl programmer. Whatever your current level of Perl expertise, this book will help you push your skills to the next level and become a more accomplished programmer.

O'Reilly's most high-level Perl tutorial to date, Advanced Perl Programming, Second Edition teaches you all the complex techniques for production-ready Perl programs. This completely updated guide clearly explains concepts such as introspection, overriding built-ins, extending Perl's object-oriented model, and testing your code for greater stability. Other topics include complex data structures, parsing, templating toolkits, working with natural language data, Unicode, and interaction with C and other languages.

In addition, this guide demystifies once-complex topics like object-relational mapping and event-based development, arming you with everything you need to completely upgrade your skills.

Praise for the Second Edition:

"Sometimes the biggest hurdle to problem solving isn't the subject itself but rather the sheer number of modules Perl provides. Advanced Perl Programming walks you through Perl's TMTOWTDI ("There's More Than One Way To Do It") forest, explaining and comparing the best modules for each task so you can intelligently apply them in a variety of situations."
Rocco Caputo, lead developer of POE

"It has been said that sufficiently advanced Perl code is indistinguishable from magic. This book of spells goes a long way to unlocking those secrets.
It has the power to transform the most humble programmer into a Perl wizard."
Andy Wardley

"The information here isn't theoretical. It presents tools and techniques for solving real problems cleanly and elegantly."
Curtis 'Ovid' Poe

"Advanced Perl Programming collects hard-earned knowledge from some of the best programmers in the Perl community, and explains it in a way that even novices can apply immediately."
chromatic, Editor of Perl.com

Table of Contents

Copyright
Preface
Audience
Contents
Conventions Used in This Book
Using Code Examples
We'd Like to Hear from You
Safari® Enabled
Acknowledgments
Chapter 1. Advanced Techniques
  Section 1.1. Introspection
  Section 1.2. Messing with the Class Model
  Section 1.3. Unexpected Code
  Section 1.4. Conclusion
Chapter 2. Parsing Techniques
  Section 2.1. Parse::RecDescent Grammars
  Section 2.2. Parse::Yapp
  Section 2.3. Other Parsing Techniques
  Section 2.4. Conclusion
Chapter 3. Templating Tools
  Section 3.1. Formats and Text::Autoformat
  Section 3.2. Text::Template
  Section 3.3. HTML::Template
  Section 3.4. HTML::Mason
  Section 3.5. Template Toolkit
  Section 3.6. AxKit
  Section 3.7. Conclusion
Chapter 4. Objects, Databases, and Applications
  Section 4.1. Beyond Flat Files
  Section 4.2. Object Serialization
  Section 4.3. Object Databases
  Section 4.4. Database Abstraction
  Section 4.5. Practical Uses in Web Applications
  Section 4.6. Conclusion
Chapter 5. Natural Language Tools
  Section 5.1. Perl and Natural Languages
  Section 5.2. Handling English Text
  Section 5.3. Modules for Parsing English
  Section 5.4. Categorization and Extraction
  Section 5.5. Conclusion
Chapter 6. Perl and Unicode
  Section 6.1. Terminology
  Section 6.2. What Is Unicode?
  Section 6.3. Unicode Transformation Formats
  Section 6.4. Handling UTF-8 Data
  Section 6.5. Encode
  Section 6.6. Unicode for XS Authors
  Section 6.7. Conclusion
Chapter 7. POE
  Section 7.1. Programming in an Event-Driven Environment
  Section 7.2. Top-Level Pieces: Components
  Section 7.3. Conclusion
Chapter 8. Testing
  Section 8.1. Test::Simple
  Section 8.2. Test::More
  Section 8.3. Test::Harness
  Section 8.4. Test::Builder
  Section 8.5. Test::Builder::Tester
  Section 8.6. Keeping Tests and Code Together
  Section 8.7. Unit Tests
  Section 8.8. Conclusion
Chapter 9. Inline Extensions
  Section 9.1. Simple Inline::C
  Section 9.2. More Complex Tasks with Inline::C
  Section 9.3. Inline:: Everything Else
  Section 9.4. Conclusion
Chapter 10. Fun with Perl
  Section 10.1. Obfuscation
  Section 10.2. Just Another Perl Hacker
  Section 10.3. Perl Golf
  Section 10.4. Perl Poetry
  Section 10.5. Acme::*
  Section 10.6. Conclusion
Colophon
About the Author
Colophon
Index

Advanced Perl Programming, Second Edition
by Simon Cozens

Copyright © 2005, 1997 O'Reilly Media, Inc. All rights reserved. Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Allison Randal
Production Editor: Darren Kelly
Cover Designer: Edie Freedman
Interior Designer: David Futato
Production Services: nSight, Inc.

Printing History:
August 1997: First Edition.
June 2005: Second Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Advanced Perl Programming, the image of a black leopard, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 0-596-00456-7 [M]

Preface

It was all Nathan Torkington's fault. Our Antipodean programmer, editor, and O'Reilly conference supremo friend asked me to update the original Advanced Perl Programming way back in 2002.

The Perl world had changed drastically in the five years since the publication of the first edition, and it continues to change. Particularly, we've seen a shift away from techniques and toward resources: from doing things yourself with Perl to using what other people have done with Perl. In essence, advanced Perl programming has become more a matter of knowing where to find what you need on the CPAN,[*] rather than a matter of knowing what to do.

[*] The Comprehensive Perl Archive Network (http://www.cpan.org) is the primary resource for user-contributed Perl code.

Perl changed in other ways, too: the announcement of Perl 6 in 2000 ironically caused a renewed interest in Perl 5, with people stretching Perl in new and interesting directions to implement some of the ideas and blue-skies thinking about Perl 6. Contrary to what we all thought back then, far from killing off Perl 5, Perl 6's development has made it stronger and ensured it will be around longer.

So it was in this context that it made sense to update Advanced Perl Programming to reflect the changes in Perl and in the CPAN. We also wanted the new edition to be more in the spirit of Perl: to focus on how to achieve practical tasks with a minimum of fuss. This is why we put together chapters on parsing techniques, on dealing with natural language documents, on testing your code, and so on.
But this book is just a beginning; however tempting it was to try to get down everything I ever wanted to say about Perl, it just wasn't possible. First, because Perl usage covers such a wide spread: on the CPAN, there are ready-made modules for folding DNA sequences, paying bills online, checking the weather, and playing poker. And more are being added every day, faster than any author can keep up. Second, as we've mentioned, because Perl is changing. I don't know what the next big advance in Perl will be; I can only take you through some of the more important techniques and resources available at the moment.

Hopefully, though, at the end of this book you'll have a good idea of how to use what's available, how you can save yourself time and effort by using Perl and the Perl resources available to get your job done, and how you can be ready to use and integrate whatever developments come down the line.

In the words of Larry Wall, may you do good magic with Perl!

Audience

If you've read Learning Perl and Programming Perl and wonder where to go from there, this book is for you. It'll help you climb to the next level of Perl wisdom. If you've been programming in Perl for years, you'll still find numerous practical tools and techniques to help you solve your everyday problems.

Contents

Chapter 1, Advanced Techniques, introduces a few common tricks advanced Perl programmers use with examples from popular Perl modules.

Chapter 2, Parsing Techniques, covers parsing irregular or unstructured data with Parse::RecDescent and Parse::Yapp, plus parsing HTML and XML.

Chapter 3, Templating Tools, details some of the most common tools for templating and when to use them, including formats, Text::Template, HTML::Template, HTML::Mason, and the Template Toolkit.

Chapter 4, Objects, Databases, and Applications, explains various ways to efficiently store and retrieve complex data using objects, a concept commonly called object-relational mapping.
Chapter 5, Natural Language Tools, shows some of the ways Perl can manipulate natural language data: inflections, conversions, parsing, extraction, and Bayesian analysis.

Chapter 6, Perl and Unicode, reviews some of the problems and solutions to make the most of Perl's Unicode support.

Chapter 7, POE, looks at the popular Perl event-based environment for task scheduling, multitasking, and non-blocking I/O code.

Chapter 8, Testing, covers the essentials of testing your code.

Chapter 9, Inline Extensions, talks about how to extend Perl by writing code in other languages, using the Inline::* modules.

Chapter 10, Fun with Perl, closes on a lighter note with a few recreational (and educational) uses of Perl.

Conventions Used in This Book

The following typographical conventions are used in this book:

Plain text
  Indicates menu titles, menu options, menu buttons, and keyboard accelerators (such as Alt and Ctrl).

Italic
  Indicates new terms, URLs, email addresses, filenames, file extensions, pathnames, directories, and Unix utilities.

Constant width
  Indicates commands, options, switches, variables, attributes, keys, functions, classes, namespaces, methods, modules, parameters, values, XML tags, HTML tags, the contents of files, or the output from commands.

Constant width bold
  Shows commands or other text that should be typed literally by the user.

Constant width italic
  Shows text that should be replaced with user-supplied values.

This icon signifies a tip, suggestion, or general note.

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission.
Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Advanced Perl Programming, Second Edition by Simon Cozens. Copyright 2005 O'Reilly Media, Inc. 0-596-00456-7."

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

We'd Like to Hear from You

Please address comments and questions concerning this book to the publisher:

O'Reilly Media
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international or local)
(707) 829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

http://www.oreilly.com/catalog/advperl2/

To comment or ask technical questions about this book, send email to:

bookquestions@oreilly.com

For more information about our books, conferences, Resource Centers, and the O'Reilly Network, see our web site at:

http://www.oreilly.com

[...]

... (print) Working with Perl at the op level requires a great deal of practice and knowledge of the Perl internals, but can lead to extremely useful tools like Devel::Cover, an op-level profiler and coverage analysis tool.

1.2 Messing with the Class Model

Perl's style of object orientation is often maligned, but its sheer simplicity allows the advanced Perl programmer to extend Perl's behavior in...
would set up a recursive relationship. Perl doesn't like that. In only 11 lines of code we've extended the way Perl's OO system works with a new concept borrowed from another language. Perl's model may not be terribly advanced, but it's astonishingly flexible.

1.3 Unexpected Code

The final set of advanced techniques in this chapter covers anything where Perl code runs at a time that might not...

the ability to inspect some of the properties of the Perl bytecode tree to determine properties of the program. The second idea we'll look at is the class model. Writing object-oriented programs and modules is sometimes regarded as advanced Perl, but I would categorize it as intermediate. As this is an advanced book, we're going to learn how to subvert Perl's object-oriented model to suit our goals. Finally,...

Techniques

Once you have read the Camel Book (Programming Perl), or any other good Perl tutorial, you know almost all of the language. There are no secret keywords, no other magic sigils that turn on Perl's advanced mode and reveal hidden features. In one sense, this book is not going to tell you anything new about the Perl language. What can I tell you, then? I used to be a student of music. Music is very...

running code in place of operators in the case of overloading, some advanced uses of tying, and controlling when code runs using named blocks and eval. These three areas, together with the special case of Perl XS programming (which we'll look at in Chapter 9 on Inline), delineate the fundamental techniques from which all advanced uses of Perl are made up.

1.1 Introspection

First, though, introspection...
can get and set the values directly. In Perl terms, the symbol table maps to a reference to $a.

Figure 1-1. Consulting the symbol table, take 1

You may have noticed that a symbol table is something that maps names to storage, which sounds a lot like a Perl hash. In fact, you'd be ahead of the game, since the Perl symbol table is indeed implemented using an ordinary Perl hash. You may also have noticed, however,...

symbolic references should only be used by people who know what they're doing. We use no strict 'refs'; to tell Perl that we're planning on doing good magic with symbolic references. Many advanced uses of Perl need to do some of the things that strict prevents the uninitiated from doing. As an initiated Perl user, you will occasionally have to turn strictures off. This isn't something to take lightly, but don't...

you need to know a little about the Perl virtual machine. Like almost all VM technologies, Perl 5 is a software CPU that executes a stream of instructions. Many of these operations will involve putting values on or taking them off a stack; unlike a real CPU, which uses registers to store intermediate results, most software CPUs use a stack model. Perl code enters the perl interpreter, gets translated into...

together. I've said that there are no secret switches to turn on advanced features in Perl, and this means that everyone starts on a level playing field, in just the same way that Johann Sebastian Bach and a little kid playing with a xylophone have precisely the same raw materials to work with. The key to producing advanced Perl, or advanced music, depends on two things: knowledge of techniques and experience of...
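The excerpts above describe the symbol table as an ordinary hash and mention turning off strict 'refs' for symbolic references. The following is a minimal sketch of both ideas, not code from the book: the variable $answer, the accessor names get_name and get_colour, and the sample hash are hypothetical names chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical package variable, used only for this illustration.
our $answer = 42;

# The symbol table for package "main" is the hash %main::,
# keyed by symbol name without the sigil.
my $has_answer = exists $main::{answer} ? 1 : 0;

# Each value is a typeglob; its SCALAR slot is a reference to $answer.
my $ref = *{ $main::{answer} }{SCALAR};
$$ref = 43;    # changes $main::answer through the symbol table

# Symbolic references: install subs under names computed at run time.
for my $field (qw(name colour)) {    # hypothetical accessor names
    no strict 'refs';                # strictures relaxed in this block only
    *{"main::get_$field"} = sub { my $self = shift; return $self->{$field} };
}

my $obj = { name => 'camel', colour => 'brown' };
print "answer=$answer has=$has_answer name=", get_name($obj), "\n";
```

Keeping no strict 'refs' inside the smallest possible block means the rest of the program still gets the protection of strictures.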
with it, I'd say that adding things to UNIVERSAL is a useful technique for the armory of any advanced Perl hacker.

1.2.2 Dynamic Method Resolution

If you're still convinced that Perl's OO system is not the sort of thing that you want, then the time has come to write your own. Damian Conway's Object Oriented Perl is full of ways to construct new forms of objects and object dispatch. We've seen the fundamental...
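Adding a method to UNIVERSAL, as mentioned in the excerpt above, can be sketched as follows. This is an illustrative example under stated assumptions rather than the book's own code: the method name describe and the class Camel are hypothetical.

```perl
use strict;
use warnings;

# Every class implicitly inherits from UNIVERSAL, so a sub defined
# there can be called as a method on any object or class name.
# "describe" is a hypothetical method name chosen for this sketch.
sub UNIVERSAL::describe {
    my $self  = shift;
    my $class = ref($self) || $self;
    return "instance of $class";
}

package Camel;    # hypothetical class with no describe method of its own

sub new { return bless {}, shift }

package main;

my $camel       = Camel->new;
my $description = $camel->describe;    # resolved through UNIVERSAL
my $can         = Camel->can('describe') ? 1 : 0;

print "$description (can=$can)\n";
```

Because the method becomes visible in every class in the program, including ones from third-party modules, a distinctive name helps avoid collisions.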