Mastering Algorithms with Perl
Jon Orwant, Jarkko Hietaniemi,
and John Macdonald
Mastering Algorithms with Perl
by Jon Orwant, Jarkko Hietaniemi, and John Macdonald
Copyright © 1999 O'Reilly & Associates, Inc. All rights reserved.
Printed in the United States of America.
Cover illustration by Lorrie LeJeune, Copyright © 1999 O'Reilly & Associates, Inc.
Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.
Editors: Andy Oram and Jon Orwant
Production Editor: Melanie Wang
Printing History:
August 1999: First Edition.
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered
trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and
sellers to distinguish their products are claimed as trademarks. Where those designations
appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the
designations have been printed in caps or initial caps. The association between the image of a
wolf and the topic of Perl algorithms is a trademark of O'Reilly & Associates, Inc.
While every precaution has been taken in the preparation of this book, the publisher assumes no
responsibility for errors or omissions, or for damages resulting from the use of the information
contained herein.
ISBN: 1-56592-398-7 [1/00]
[M]
Table of Contents
Preface xi
1. Introduction 1
    What Is an Algorithm? 1
    Efficiency 8
    Recurrent Themes in Algorithms 20
2. Basic Data Structures 24
    Perl's Built-in Data Structures 25
    Build Your Own Data Structure 26
    A Simple Example 27
    Perl Arrays: Many Data Structures in One 37
3. Advanced Data Structures 46
    Linked Lists 47
    Circular Linked Lists 60
    Garbage Collection in Perl 62
    Doubly-Linked Lists 65
    Infinite Lists 71
    The Cost of Traversal 72
    Binary Trees 73
    Heaps 91
    Binary Heaps 92
    Janus Heap 99
    The Heaps Module 99
    Future CPAN Modules 101
4. Sorting 102
    An Introduction to Sorting 102
    All Sorts of Sorts 119
    Sorting Algorithms Summary 151
5. Searching 157
    Hash Search and Other Non-Searches 158
    Lookup Searches 159
    Generative Searches 175
6. Sets 203
    Venn Diagrams 204
    Creating Sets 205
    Set Union and Intersection 209
    Set Differences 217
    Counting Set Elements 222
    Set Relations 223
    The Set Modules of CPAN 227
    Sets of Sets 233
    Multivalued Sets 240
    Sets Summary 242
7. Matrices 244
    Creating Matrices 246
    Manipulating Individual Elements 246
    Finding the Dimensions of a Matrix 247
    Displaying Matrices 247
    Adding or Multiplying Constants 248
    Transposing a Matrix 254
    Multiplying Matrices 256
    Extracting a Submatrix 259
    Combining Matrices 260
    Inverting a Matrix 261
    Computing the Determinant 262
    Gaussian Elimination 263
    Eigenvalues and Eigenvectors 266
    The Matrix Chain Product 269
    Delving Deeper 272
8. Graphs 273
    Vertices and Edges 276
    Derived Graphs 281
    Graph Attributes 286
    Graph Representation in Computers 287
    Graph Traversal 301
    Paths and Bridges 310
    Graph Biology: Trees, Forests, DAGS, Ancestors, and Descendants 312
    Edge and Graph Classes 316
    CPAN Graph Modules 351
9. Strings 353
    Perl Builtins 354
    String-Matching Algorithms 357
    Phonetic Algorithms 388
    Stemming and Inflection 389
    Parsing 394
    Compression 411
10. Geometric Algorithms 425
    Distance 426
    Area, Perimeter, and Volume 429
    Direction 433
    Intersection 435
    Inclusion 443
    Boundaries 449
    Closest Pair of Points 457
    Geometric Algorithms Summary 464
    CPAN Graphics Modules 464
11. Number Systems 469
    Integers and Reals 469
    Strange Systems 480
    Trigonometry 491
    Significant Series 492
12. Number Theory 499
    Basic Number Theory 499
    Prime Numbers 504
    Unsolved Problems 522
13. Cryptography 526
    Legal Issues 527
    Authorizing People with Passwords 528
    Authorization of Data: Checksums and More 533
    Obscuring Data: Encryption 538
    Hiding Data: Steganography 555
    Winnowing and Chaffing 558
    Encrypted Perl Code 562
    Other Issues 564
14. Probability 566
    Random Numbers 567
    Events 569
    Permutations and Combinations 571
    Probability Distributions 574
    Rolling Dice: Uniform Distributions 576
    Loaded Dice and Candy Colors: Nonuniform Discrete Distributions 582
    If the Blue Jays Score Six Runs: Conditional Probability 589
    Flipping Coins over and Over: Infinite Discrete Distributions 590
    How Much Snow? Continuous Distributions 591
    Many More Distributions 592
15. Statistics 599
    Statistical Measures 600
    Significance Tests 608
    Correlation 620
16. Numerical Analysis 626
    Computing Derivatives and Integrals 627
    Solving Equations 634
    Interpolation, Extrapolation, and Curve Fitting 642
A. Further Reading 649
B. ASCII Character Set 652
Index 657
Preface
Perl's popularity has soared in recent years. It owes its appeal first to its technical superiority:
Perl's unparalleled portability, speed, and expressiveness have made it the language of choice
for a million programmers worldwide.
Those programmers have extended Perl in ways unimaginable with languages controlled by
committees or companies. Of all languages, Perl has the largest base of free utilities, thanks to
the Comprehensive Perl Archive Network (abbreviated CPAN; see
http://www.perl.com/CPAN/). The modules and scripts you'll find there have made Perl the
most popular language for web, text, and database programming.
But Perl can do more than that. You can solve complex problems in Perl more quickly, and in
fewer lines, than in any other language.
This ease of use makes Perl an excellent tool for exploring algorithms. Computer science
embraces complexity; the essence of programming is the clean dissection of a seemingly
insurmountable problem into a series of simple, computable steps. Perl is ideal for tackling the
tougher nuggets of computer science because its liberal syntax lets the programmer express his
or her solution in the manner best suited to the task. (After all, Perl's motto is There's More
Than One Way To Do It.) Algorithms are complex enough; we don't need a computer language
making it any tougher.
Most books about computer algorithms don't include working programs. They express their
ideas in quasi-English pseudocode instead, which allows the discussion to focus on concepts
without getting bogged down in implementation details. But sometimes the details are what
matter—the inefficiencies of a bad implementation sometimes cancel the speedup that a good
algorithm provides. The devil is in the details.
And while converting ideas to programs is often a good exercise, it's also just plain
time-consuming. So, in this book we've supplied you with not just explanations, but
implementations as well. If you read this book carefully, you'll learn more about both
algorithms and Perl.
About This Book
This book is written for two kinds of people: those who want cut and paste solutions and those
who want to hone their programming skills. You'll see how we solve some of the classic
problems of computer science and why we solved them the way we did.
Theory or Practice?
Like the wolf featured on the cover, this book is sometimes fierce and sometimes playful. The
fierce part is the computer science: we'll often talk like computer scientists talk and discuss
problems that matter little to the practical Perl programmer. Other times, we'll playfully
explain the problem and simply tell you about ready-made solutions you can find on the Internet
(almost always on CPAN).
Deciding when to be fierce and when to be playful hasn't been easy for us. For instance, every
algorithms textbook has a chapter on all of the different ways to sort a collection of items. So
do we, even though Perl provides its own sort() function that might be all you ever need.
We do this for four reasons. First, we don't want you thinking you've Mastered Algorithms
without understanding the algorithms covered in every college course on the subject. Second,
the concepts, processes, and strategies underlying those algorithms will come in handy for
more than just sorting. Third, it helps to know how Perl's sort() works under the hood, why
its particular algorithm (quicksort) was used, and how to avoid some of the inefficiencies that
even experienced Perl programmers fall prey to. Finally, sort() isn't always the best
solution! Someday, you might need another of the techniques we provide.
When it comes to the inevitable tradeoffs between theory and practice, programmers' tastes
vary. We have chosen a middle course, swiftly pouncing from one to the other with feral
abandon. If your tastes are exclusively theoretical or practical, we hope you'll still appreciate
the balanced diet you'll find here.
Organization of This Book
The chapters in this book can be read in isolation; they typically don't require knowledge from
previous chapters. However, we do recommend that you read at least Chapter 1, Introduction,
and Chapter 2, Basic Data Structures, which provide the basic material necessary for
understanding the rest of the book.
Chapter 1 describes the basics of Perl and algorithms, with an emphasis on speed and general
problem-solving techniques.
Chapter 2 explains how to use Perl to create simple and very general representations, like
queues and lists of lists.
Chapter 3, Advanced Data Structures, shows how to build the classic computer science data
structures.
Chapter 4, Sorting, looks at techniques for ordering data and compares the advantages of each
technique.
Chapter 5, Searching, investigates ways to extract individual pieces of information from a
larger collection.
Chapter 6, Sets, discusses the basics of set theory and Perl implementations of set operations.
Chapter 7, Matrices, examines techniques for manipulating large arrays of data and solving
problems in linear algebra.
Chapter 8, Graphs, describes tools for solving problems that are best represented as a graph:
a collection of nodes connected by edges.
Chapter 9, Strings, explains how to implement algorithms for searching, filtering, and parsing
strings of text.
Chapter 10, Geometric Algorithms, looks at techniques for computing with two- and
three-dimensional constructs.
Chapter 11, Number Systems, investigates methods for generating important constants,
functions, and number series, as well as manipulating numbers in alternate coordinate systems.
Chapter 12, Number Theory, examines algorithms for factoring numbers, modular arithmetic,
and other techniques for computing with integers.
Chapter 13, Cryptography, demonstrates Perl utilities to conceal your data from prying eyes.
Chapter 14, Probability, discusses how to use Perl for problems involving chance.
Chapter 15, Statistics, describes methods for analyzing the accuracy of hypotheses and
characterizing the distribution of data.
Chapter 16, Numerical Analysis, looks at a few of the more common problems in scientific
computing.
Appendix A, Further Reading, contains an annotated bibliography.
Appendix B, ASCII Character Set, lists the seven-bit ASCII character set used by default when
Perl sorts strings.
Conventions Used in This Book
Italic
Used for filenames, directory names, URLs, and occasional emphasis.
Constant width
Used for elements of programming languages, text manipulated by programs, code
examples, and output.
Constant width bold
Used for user input and for emphasis in code.
Constant width italic
Used for replaceable values.
[...]

... encouragement from many other family members, friends, and co-workers (these groups overlap).

Comments and Questions
Please address comments and questions concerning this book to the publisher:
O'Reilly & Associates, Inc.
101 Morris Street
Sebastopol, CA 95472
800-998-9938 (in the U.S. or Canada)
707-829-0515 (international/local)
707-829-0104 (FAX)
You can also send us messages electronically...

[...]

... chapter, we'll discuss how to "think algorithms": how to design and analyze programs that solve problems. We'll start with a gentle introduction to algorithms and a not-so-gentle introduction to Perl, then consider some of the tradeoffs involved in choosing the right implementation for your needs, and finally introduce some themes pervading the field: recursion, divide-and-conquer, and dynamic programming...

[...]

... this to the perlbug mailing list:
    Hi,
    I'd appreciate if this is a known bug and if a patch is available.
    int of (2.4/0.2) returns 11 instead of the expected 12.
It would seem that this poor fellow is correct: perl -e 'print int(2.4/0.2)' indeed prints 11. You might expect it to print 12, because two-point-four divided by oh-point-two is twelve, and the integer part of 12 is 12. Must be a bug in Perl, right?...

[...]

References
The most significant addition to the Perl language in Perl 5 is references; their use is described in the perlref documentation bundled with Perl. A reference is a scalar value (thus, all references begin with a $) whose value is the location (more or less) of another variable. That variable might be another scalar, or an array, a hash, or even a snippet of Perl code. The advantage of references is...
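The References excerpt above says that a reference is a scalar holding the location of another scalar, array, hash, or snippet of code. The following is a minimal sketch of that idea, not code from the book; the variable names and sample data ($count, @primes, %ages, square) are hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, used only for illustration.
my $count  = 3;
my @primes = (2, 3, 5, 7);
my %ages   = (alice => 31, bob => 27);
sub square { my ($n) = @_; return $n * $n }

# The backslash operator takes a reference to each kind of variable.
my $scalar_ref = \$count;
my $array_ref  = \@primes;
my $hash_ref   = \%ages;
my $code_ref   = \&square;

# Dereference to reach the original data.
print "scalar: $$scalar_ref\n";
print "third prime: $array_ref->[2]\n";
print "bob: $hash_ref->{bob}\n";
print "square of 9: ", $code_ref->(9), "\n";

# A reference points at the variable itself, not at a copy,
# so changes made through it are visible in the original.
push @$array_ref, 11;
$$scalar_ref++;
print "primes now: @primes\n";
print "count now: $count\n";
```

Because every reference is itself a scalar, it can be stored in an array element or a hash value, which is what makes nested structures such as lists of lists possible.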
[...]

... lookup table with for each color, with 256 values each. You still have to add the results together, so it takes a little more time than the bigger lookup table. The relative costs of coding for time, coding for space, and this middle-of-the-road approach are shown in Table 1-2. n is the number of computations to be performed; cost(x) is the amount of time needed to perform x.
Table 1-2. Three...

[...]

    ... $array->[$try] lt $word;                         # Raise bottom
        $high = $try-1, next if $array->[$try] gt $word; # Lower top
        return $try;     # We've found the word!
    }
    return;              # The word isn't there
}
Depending on how much Perl you know, this might seem crystal clear or hopelessly opaque. As the preface said, if you don't know Perl, you probably don't want to learn it with this book. Nevertheless, here's a brief description of the Perl...

[...]

... part of 12 is 12. Must be a bug in Perl, right?
Wrong. Floating-point numbers are not real numbers. When you divide 2.4 by 0.2, what you're really doing is dividing Perl's binary floating-point representation of 2.4 by Perl's binary floating-point representation of 0.2. In all computer languages that use IEEE floating-point representations (not just Perl!) the result will be a smidgen less than 12, which is...

[...]

... They'll all be used in algorithms in later chapters.
Many Perl programs never need any data structures other than those provided by the language itself, shown in Table 2-1.
Table 2-1. Basic Perl Datatypes
Type and Designating Symbol    Meaning
$scalar    number       integer or float
           string       arbitrary length sequence of characters
           reference    "pointer" to another Perl data structure
           object       a Perl data structure that...

[...]

    #!/usr/bin/perl
    use Benchmark;

    sub quadratic {    # Compute the larger root of a quadratic polynomial
        my ($a, $b, $c) = @_;
        return (-$b + sqrt($b*$b - 4*$a * $c)) / (2*$a);
    }

    sub bruteforce {   # Search linearly until we find a good-enough choice
        my ($low, $high) = @_;
        my $x;
        for ($x = $low; $x [...] 'quadratic(1, 1, -1)', ...
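The floating-point excerpt above explains that 2.4/0.2 is stored as a value slightly below 12, so int() truncates it to 11. The sketch below makes that visible and shows two common workarounds. It is an illustration under stated assumptions rather than the book's own code: round_nearest is a hypothetical helper name, and the tolerance 1e-9 is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Round to the nearest integer instead of truncating.
# (round_nearest is a hypothetical helper name, not from the book.)
sub round_nearest {
    my ($x) = @_;
    return sprintf("%.0f", $x) + 0;
}

my $quotient = 2.4 / 0.2;

# print shows about 15 significant digits by default, which hides the error;
# asking for 17 digits reveals the stored value.
printf "17 digits: %.17g\n", $quotient;

print "int():           ", int($quotient), "\n";   # truncates toward zero
print "round_nearest(): ", round_nearest($quotient), "\n";

# Compare floats with a tolerance rather than with ==.
my $epsilon = 1e-9;   # assumed tolerance; choose one that suits your data
print "close enough to 12\n" if abs($quotient - 12) < $epsilon;
```

Which workaround fits depends on the task: rounding suits values that should be whole numbers, while a tolerance comparison suits equality tests between computed results.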
[...]

... one. That said, if you don't know Perl, you don't want to start here. We recommend you begin with either of these books published by O'Reilly & Associates: Randal L. Schwartz and Tom Christiansen's Learning Perl if you're new to programming, and Larry Wall, Tom Christiansen, and Randal L. Schwartz's Programming Perl if you're not.
If you want more rigorous explanations of the algorithms discussed in this book, ...