Advanced Perl Programming, 2nd Edition
By Simon Cozens
Publisher: O'Reilly
Pub Date: June 2005
ISBN: 0-596-00456-7
Pages: 304

With a worldwide community of users and more than a million dedicated programmers, Perl has proven to be the most effective language for the latest trends in computing and business. Every programmer must keep up with the latest tools and techniques. This updated version of Advanced Perl Programming from O'Reilly gives you the essential knowledge of the modern Perl programmer. Whatever your current level of Perl expertise, this book will help you push your skills to the next level and become a more accomplished programmer.

O'Reilly's most high-level Perl tutorial to date, Advanced Perl Programming, Second Edition teaches you all the complex techniques for production-ready Perl programs. This completely updated guide clearly explains concepts such as introspection, overriding builtins, extending Perl's object-oriented model, and testing your code for greater stability.

Other topics include:
    Complex data structures
    Parsing
    Templating toolkits
    Working with natural language data
    Unicode
    Interaction with C and other languages

In addition, this guide demystifies once-complex topics like object-relational mapping and event-based development, arming you with everything you need to completely upgrade your skills.

Praise for the Second Edition:

"Sometimes the biggest hurdle to problem solving isn't the subject itself but rather the sheer number of modules Perl provides. Advanced Perl Programming walks you through Perl's TMTOWTDI ("There's More Than One Way To Do It") forest, explaining and comparing the best modules for each task so you can intelligently apply them in a variety of situations."
Rocco Caputo, lead developer of POE

"It has been said that sufficiently advanced Perl code is indistinguishable from magic. This book of spells goes a long way to unlocking those secrets. It has the power to transform the most humble programmer into a Perl wizard."
Andy Wardley

"The information here isn't theoretical. It presents tools and techniques for solving real problems cleanly and elegantly."
Curtis 'Ovid' Poe

"Advanced Perl Programming collects hard-earned knowledge from some of the best programmers in the Perl community, and explains it in a way that even novices can apply immediately."
chromatic, Editor of Perl.com

Table of Contents

Copyright
Preface
    Audience
    Contents
    Conventions Used in This Book
    Using Code Examples
    We'd Like to Hear from You
    Safari® Enabled
    Acknowledgments
Chapter 1 Advanced Techniques
    Section 1.1 Introspection
    Section 1.2 Messing with the Class Model
    Section 1.3 Unexpected Code
    Section 1.4 Conclusion
Chapter 2 Parsing Techniques
    Section 2.1 Parse::RecDescent Grammars
    Section 2.2 Parse::Yapp
    Section 2.3 Other Parsing Techniques
    Section 2.4 Conclusion
Chapter 3 Templating Tools
    Section 3.1 Formats and Text::Autoformat
    Section 3.2 Text::Template
    Section 3.3 HTML::Template
    Section 3.4 HTML::Mason
    Section 3.5 Template Toolkit
    Section 3.6 AxKit
    Section 3.7 Conclusion
Chapter 4 Objects, Databases, and Applications
    Section 4.1 Beyond Flat Files
    Section 4.2 Object Serialization
    Section 4.3 Object Databases
    Section 4.4 Database Abstraction
    Section 4.5 Practical Uses in Web Applications
    Section 4.6 Conclusion
Chapter 5 Natural Language Tools
    Section 5.1 Perl and Natural Languages
    Section 5.2 Handling English Text
    Section 5.3 Modules for Parsing English
    Section 5.4 Categorization and Extraction
    Section 5.5 Conclusion
Chapter 6 Perl and Unicode
    Section 6.1 Terminology
    Section 6.2 What Is Unicode?
    Section 6.3 Unicode Transformation Formats
    Section 6.4 Handling UTF-8 Data
    Section 6.5 Encode
    Section 6.6 Unicode for XS Authors
    Section 6.7 Conclusion
Chapter 7 POE
    Section 7.1 Programming in an Event-Driven Environment
    Section 7.2 Top-Level Pieces: Components
    Section 7.3 Conclusion
Chapter 8 Testing
    Section 8.1 Test::Simple
    Section 8.2 Test::More
    Section 8.3 Test::Harness
    Section 8.4 Test::Builder
    Section 8.5 Test::Builder::Tester
    Section 8.6 Keeping Tests and Code Together
    Section 8.7 Unit Tests
    Section 8.8 Conclusion
Chapter 9 Inline Extensions
    Section 9.1 Simple Inline::C
    Section 9.2 More Complex Tasks with Inline::C
    Section 9.3 Inline:: Everything Else
    Section 9.4 Conclusion
Chapter 10 Fun with Perl
    Section 10.1 Obfuscation
    Section 10.2 Just Another Perl Hacker
    Section 10.3 Perl Golf
    Section 10.4 Perl Poetry
    Section 10.5 Acme::*
    Section 10.6 Conclusion
Colophon
    About the Author
    Colophon
Index

Advanced Perl Programming, Second Edition
by Simon Cozens

Copyright © 2005, 1997 O'Reilly Media, Inc. All rights reserved.
Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Allison Randal
Production Editor: Darren Kelly
Cover Designer: Edie Freedman
Interior Designer: David Futato
Production Services: nSight, Inc.

Printing History:
    August 1997: First Edition
    June 2005: Second Edition

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Advanced Perl Programming, the image of a black leopard, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 0-596-00456-7

[M]

Preface

It was all Nathan Torkington's fault. Our Antipodean programmer, editor, and O'Reilly conference supremo friend asked me to update the original Advanced Perl Programming way back in 2002.

The Perl world had changed drastically in the five years since the publication of the first edition, and it continues to change. Particularly, we've seen a shift away from techniques and toward resources: from doing things yourself with Perl to using what other people have done with Perl. In essence, advanced Perl programming has become more a matter of knowing where to find what you need on the CPAN,[*] rather than a matter of knowing what to do.

[*] The Comprehensive Perl Archive Network (http://www.cpan.org) is the primary resource for user-contributed Perl code.

Perl changed in other ways, too: the announcement of Perl 6 in 2000 ironically caused a renewed interest in Perl 5, with people stretching Perl in new and interesting directions to implement some of the ideas and blue-skies thinking about Perl 6. Contrary to what we all thought back then, far from killing off Perl 5, Perl 6's development has made it stronger and ensured it will be around longer.

So it was in this context that it made sense to update Advanced Perl Programming to reflect the changes in Perl and in the CPAN. We also wanted the new edition to be more in the spirit of Perl: to focus on how to achieve practical tasks with a minimum of fuss. This is why we put together chapters on parsing techniques, on dealing with natural language documents, on testing your code, and so on. But this book is just a
beginning; however tempting it was to try to get down everything I ever wanted to say about Perl, it just wasn't possible. First, because Perl usage covers such a wide spread: on the CPAN, there are ready-made modules for folding DNA sequences, paying bills online, checking the weather, and playing poker. And more are being added every day, faster than any author can keep up. Second, as we've mentioned, because Perl is changing. I don't know what the next big advance in Perl will be; I can only take you through some of the more important techniques and resources available at the moment.

Hopefully, though, at the end of this book you'll have a good idea of how to use what's available, how you can save yourself time and effort by using Perl and the Perl resources available to get your job done, and how you can be ready to use and integrate whatever developments come down the line. In the words of Larry Wall, may you do good magic with Perl!

Audience

If you've read Learning Perl and Programming Perl and wonder where to go from there, this book is for you. It'll help you climb to the next level of Perl wisdom. If you've been programming in Perl for years, you'll still find numerous practical tools and techniques to help you solve your everyday problems.

YAML sessions, POE SGF (Smart Game Format) shutdown methods, Test::Class singleton method, applying to object rather than class skips, testing and Sleepycat Berkeley DB Smart Game Format (SGF) source filters spam, NLP for preventing SpamAssassin special variables $!
(special variable) formats and splitters, Lingua::EN::Splitter SPOPS Maypole and relational mappers sprintf, getting formatted output SQLite square brackets ([ ]) stack, handling in Inline::C startup methods, Test::Class stemming words, natural language tools stopwords Lingua::EN::StopWords natural language tools Storable module storage, symbol table mapping to storing/retrieving data stringification operator (" ") strings encoding overloading and traversing Struct, Inline::Struct structured data parsing wrapping subclassing, singleton method and subroutines creating with globs wrappers around subrules, matching portions of data stream substr, getting formatted output summarization, document categorization Summarization, Lingua::EN::Summarize symbol table finding globs in looking up variable names in

Index [SYMBOL] [A] [B] [C] [D] [E] [F] [G] [H] [I] [J] [K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W] [X] [Y] [Z]

tagging documents, Lingua::EN::Tagger taint mode, Text::Template Tangram classes and schema create, read, update, delete overview of Tangram::Cursor Tangram::Expr Tangram::Filter Tangram::Storage TANGRAM_TRACE TAP (Test Anything Protocol) TCP POE::Component::Client::TCP POE::Component::Server::TCP Template Toolkit Class::DBI and components and macros filters overview of plugins RSS Aggregator Template::Plugin::Autoformat Template::Plugin::XML::RSS Template::Plugin::XML::Simple templating tools AxKit formats and Text::Autoformat terminal symbol, parsing for Test Anything Protocol (TAP) Test::Builder Test::Builder::Tester Test::Class overview of startup and shutdown methods testing Apache, DBI, and other complex environments Test::Fuzzy Test::Harness Test::Inline Test::MockObject Test::More automated testing with overview of skips and redos Test::Simple Test::Tutorial testing Apache automated DBI keeping tests and code together overview of Pod::Tests skips and redos Test::Builder Test::Builder::Tester Test::Class Test::Harness Test::MockObject
Test::More Test::Simple unit tests TeX package text handlers text, English converting words to numbers inflections pluralization splitting into chunks stemming words stopwords text-processing languages Text::Autoformat compared with Perl formatting language hyphenation and Template::Plugin::Autoformat wrapping structured text Text::Balanced Text::Sentence Text::Template compared with Template Toolkit loops, arrays, and hashes and overview of security and error checking tricks Text::Wrap TextTiling, Lingua::Segmenter::TextTiling threading, attributes and Time module, overloading time shifting BEGIN CHECK blocks and DESTROY eval overview of Tk tokens bottom-up parsing and HTML::TokeParser top-down parsers tracing, TANGRAM_TRACE tutorials POE Test::Tutorial

UCS (Universal Character Set) overview of UCS-2 UCS-4 unexpected code limitations of overloading non-operator overloading operator overloading overloading overview of Unicode entering Unicode characters handling UTF-8 data overview of PerlIO and reasons for using regular expressions terminology and UCS (Universal Character Set) Unicode Consortium XS authoring and Unicode Technical Committee (UTC) Unicode Technical Report (UTR) Unicode: A Primer (Graham) unit tests overview of Test::Class Test::MockObject UNIVERSAL class can method combine with AUTOLOAD isa method methods require module VERSION method updates Class::DBI Tangram use strict UTC (Unicode Technical Committee) UTF (Unicode Transformation Format) character encoding transformation formats UCS-2 UCS-4 UTF-16BE UTF-16LE UTF-32 UTF-7 UTF-8 UTF-EBCDIC UTF-8 encoding strings handling UTF-8 data from external sources handling UTF-8 data from inside program overview of traversing strings UTR (Unicode Technical Report)
validation, HTML::Template variables accessing HTML::Template VERSION method, UNIVERSAL class virtual machines, Perl

Wall, Larry Wardley, Andy Watkiss, Neil Web applications Class::DBI and Template Toolkit and Maypole overview of Web sites Paris Perl Mongers POE tutorial Test::Tutorial wheels, POE POE::Wheel::Curses POE::Wheel::FollowTail POE::Wheel::ReadWrite POE::Wheel::SocketFactory whitespace Parse::RecDescent Perl tolerance of Winters, Chris word distribution, Zipf's Law words, natural language tools for converting to numbers wrappers around subroutines C libraries Hook::LexWrap Text::Autoformat Text::Wrap

XML RSS based on Template::Plugin::XML::RSS Template::Plugin::XML::Simple transforming to HTML XML::Parser XP (Extreme Programming) XS (extension subroutines) bridging Perl and C encoding strings Perl internal values and references skeleton module traversing strings XSP (Extensible Server Pages)

yacc (Yet Another Compiler Compiler) as traditional parser overview of YAML (YAML Ain't Markup Language) yapp command-line utility Parse::Yapp Yona, Shlomo

Zipf's Law of Word Distribution