Algorithms in Java: Parts 1-4, Third Edition
By Robert Sedgewick
Publisher: Addison Wesley
Pub Date: July 23, 2002
ISBN: 0-201-36120-5, 768 pages

"Sedgewick has a real gift for explaining concepts in a way that makes them easy to understand. The use of real programs in page-size (or less) chunks that can be easily understood is a real plus. The figures, programs, and tables are a significant contribution to the learning experience of the reader; they make this book distinctive."
  William A. Ward, University of South Alabama

This edition of Robert Sedgewick's popular work provides current and comprehensive coverage of important algorithms for Java programmers. Michael Schidlowsky and Sedgewick have developed new Java implementations that both express the methods in a concise and direct manner and provide programmers with the practical means to test them on real applications. Many new algorithms are presented, and the explanations of each algorithm are much more detailed than in previous editions. A new text design and detailed, innovative figures, with accompanying commentary, greatly enhance the presentation. The third edition retains the successful blend of theory and practice that has made Sedgewick's work an invaluable resource for more than 400,000 programmers!

This particular book, Parts 1-4, represents the essential first half of Sedgewick's complete work. It provides extensive coverage of fundamental data structures and algorithms for sorting, searching, and related applications. Although the substance of the book applies to programming in any language, the implementations by Schidlowsky and Sedgewick also exploit the natural match between Java classes and abstract data type (ADT) implementations.

Highlights
  Java class implementations of more than 100 important practical algorithms
  Emphasis on ADTs, modular programming, and object-oriented programming
  Extensive coverage of arrays, linked lists, trees, and other fundamental data structures
  Thorough treatment of algorithms for sorting, selection, priority queue ADT implementations, and symbol table ADT implementations (search algorithms)
  Complete implementations for binomial queues, multiway radix sorting, randomized BSTs, splay trees, skip lists, multiway tries, B trees, extendible hashing, and many other advanced methods
  Quantitative information about the algorithms that gives you a basis for comparing them
  More than 1,000 exercises and more than 250 detailed figures to help you learn properties of the algorithms

Whether you are learning the algorithms for the first time or wish to have up-to-date reference material that incorporates new programming styles with classic and new algorithms, you will find a wealth of useful information in this book.

Algorithms in Java: Parts 1-4, Third Edition
Copyright
Preface
  Scope
  Use in the Curriculum
  Algorithms of Practical Use
  Programming Language
  Acknowledgments
Java Consultant's Preface
Notes on Exercises
Part I: Fundamentals
  Chapter 1 Introduction
    Section 1.1 Algorithms
    Section 1.2 A Sample Problem: Connectivity
    Section 1.3 Union–Find Algorithms
    Section 1.4 Perspective
    Section 1.5 Summary of Topics
  Chapter 2 Principles of Algorithm Analysis
    Section 2.1 Implementation and Empirical Analysis
    Section 2.2 Analysis of Algorithms
    Section 2.3 Growth of Functions
    Section 2.4 Big-Oh Notation
    Section 2.5 Basic Recurrences
    Section 2.6 Examples of Algorithm Analysis
    Section 2.7 Guarantees, Predictions, and Limitations
  References for Part One
Part II: Data Structures
  Chapter 3 Elementary Data Structures
    Section 3.1 Building Blocks
    Section 3.2 Arrays
    Section 3.3 Linked Lists
    Section 3.4 Elementary List Processing
    Section 3.5 Memory Allocation for Lists
    Section 3.6 Strings
    Section 3.7 Compound Data Structures
  Chapter 4 Abstract Data Types
    Exercises
    Section 4.1 Collections of Items
    Section 4.2 Pushdown Stack ADT
    Section 4.3 Examples of Stack ADT Clients
    Section 4.4 Stack ADT Implementations
    Section 4.5 Generic Implementations
    Section 4.6 Creation of a New ADT
    Section 4.7 FIFO Queues and Generalized Queues
    Section 4.8 Duplicate and Index Items
    Section 4.9 First-Class ADTs
    Section 4.10 Application-Based ADT Example
    Section 4.11 Perspective
  Chapter 5 Recursion and Trees
    Section 5.1 Recursive Algorithms
    Section 5.2 Divide and Conquer
    Section 5.3 Dynamic Programming
    Section 5.4 Trees
    Section 5.5 Mathematical Properties of Binary Trees
    Section 5.6 Tree Traversal
    Section 5.7 Recursive Binary-Tree Algorithms
    Section 5.8 Graph Traversal
    Section 5.9 Perspective
  References for Part Two
Part III: Sorting
  Chapter 6 Elementary Sorting Methods
    Section 6.1 Rules of the Game
    Section 6.2 Generic Sort Implementations
    Section 6.3 Selection Sort
    Section 6.4 Insertion Sort
    Section 6.5 Bubble Sort
    Section 6.6 Performance Characteristics of Elementary Sorts
    Section 6.7 Algorithm Visualization
    Section 6.8 Shellsort
    Section 6.9 Sorting of Linked Lists
    Section 6.10 Key-Indexed Counting
  Chapter 7 Quicksort
    Section 7.1 The Basic Algorithm
    Section 7.2 Performance Characteristics of Quicksort
    Section 7.3 Stack Size
    Section 7.4 Small Subfiles
    Section 7.5 Median-of-Three Partitioning
    Section 7.6 Duplicate Keys
    Section 7.7 Strings and Vectors
    Section 7.8 Selection
  Chapter 8 Merging and Mergesort
    Section 8.1 Two-Way Merging
    Section 8.2 Abstract In-Place Merge
    Section 8.3 Top-Down Mergesort
    Section 8.4 Improvements to the Basic Algorithm
    Section 8.5 Bottom-Up Mergesort
    Section 8.6 Performance Characteristics of Mergesort
    Section 8.7 Linked-List Implementations of Mergesort
    Section 8.8 Recursion Revisited
  Chapter 9 Priority Queues and Heapsort
    Exercises
    Section 9.1 Elementary Implementations
    Section 9.2 Heap Data Structure
    Section 9.3 Algorithms on Heaps
    Section 9.4 Heapsort
    Section 9.5 Priority-Queue ADT
    Section 9.6 Priority Queues for Client Arrays
    Section 9.7 Binomial Queues
  Chapter 10 Radix Sorting
    Section 10.1 Bits, Bytes, and Words
    Section 10.2 Binary Quicksort
    Section 10.3 MSD Radix Sort
    Section 10.4 Three-Way Radix Quicksort
    Section 10.5 LSD Radix Sort
    Section 10.6 Performance Characteristics of Radix Sorts
    Section 10.7 Sublinear-Time Sorts
  Chapter 11 Special-Purpose Sorting Methods
    Section 11.1 Batcher's Odd–Even Mergesort
    Section 11.2 Sorting Networks
    Section 11.3 Sorting In Place
    Section 11.4 External Sorting
    Section 11.5 Sort–Merge Implementations
    Section 11.6 Parallel Sort–Merge
  References for Part Three
Part IV: Searching
  Chapter 12 Symbol Tables and Binary Search Trees
    Section 12.1 Symbol-Table Abstract Data Type
    Section 12.2 Key-Indexed Search
    Section 12.3 Sequential Search
    Section 12.4 Binary Search
    Section 12.5 Index Implementations with Symbol Tables
    Section 12.6 Binary Search Trees
    Section 12.7 Performance Characteristics of BSTs
    Section 12.8 Insertion at the Root in BSTs
    Section 12.9 BST Implementations of Other ADT Operations
  Chapter 13 Balanced Trees
    Exercises
    Section 13.1 Randomized BSTs
    Section 13.2 Splay BSTs
    Section 13.3 Top-Down 2-3-4 Trees
    Section 13.4 Red–Black Trees
    Section 13.5 Skip Lists
    Section 13.6 Performance Characteristics
  Chapter 14 Hashing
    Section 14.1 Hash Functions
    Section 14.2 Separate Chaining
    Section 14.3 Linear Probing
    Section 14.4 Double Hashing
    Section 14.5 Dynamic Hash Tables
    Section 14.6 Perspective
  Chapter 15 Radix Search
    Section 15.1 Digital Search Trees
    Section 15.2 Tries
    Section 15.3 Patricia Tries
    Section 15.4 Multiway Tries and TSTs
    Section 15.5 Text-String–Index Algorithms
  Chapter 16 External Searching
    Section 16.1 Rules of the Game
    Section 16.2 Indexed Sequential Access
    Section 16.3 B Trees
    Section 16.4 Extendible Hashing
    Section 16.5 Perspective
  References for Part Four
Appendix
  Exercises

Empirically determine the average number of probes required for a search after building a table from N items with random keys, for M = 10, 100, and 1000 and N = 10^3, 10^4, 10^5, and 10^6.

16.45 Modify double hashing (Program 14.6) to use pages of size 2M, treating accesses to full pages as "collisions." Empirically determine the average number of probes required for a search after building a table from N items with random keys, for M = 10, 100, and 1000 and N = 10^3, 10^4, 10^5, and 10^6, using an initial table size of 3N/2M.

16.5 Perspective

The most important application of the methods discussed in this chapter is to construct indexes for huge databases that are maintained on external memory—for example, in disk files. Although the underlying algorithms that we have discussed are powerful, developing a file-system implementation based on B trees or on extendible hashing is a complex task. First, we cannot use the Java programs in this section directly—they have to be modified to read and refer to disk files. Second, we have to be sure that the algorithm parameters (page and directory size, for example) are tuned properly to the characteristics of the particular hardware that we are using. Third, we have to pay attention to reliability, error detection, and error correction. For example, we need to be able to check that the data structure is in a consistent state and to consider how we might proceed to correct any of the scores of errors that might crop up. Systems considerations of this kind are critical—and are beyond the scope of this book.

On the other hand, if we have a programming system that supports virtual memory, we can put to direct use the Java implementations that we have considered here in a situation where we have a huge number of symbol-table operations to perform on a huge table. Roughly, each time that we access a page, such a system will put that page in a cache, where references to data on that page are handled efficiently. If we refer to a page that is not in the cache, the system has to read the page from external memory, so we can think of cache misses as roughly equivalent to the probe cost measure that we have been using.

For B trees, every search or insertion references the root, so the root will always be in the cache. Otherwise, for sufficiently large M, typical searches and insertions involve at most two cache misses. For a large cache, there is a good chance that the first page (the child of the root) that is accessed on a search is already in the cache, so the average cost per search is likely to be significantly less than two probes. For extendible hashing, it is unlikely that the whole directory will be in the cache, so we expect that both the directory access and the page access might involve a cache miss (this is the worst case). That is, two probes are required for a search in a huge table—one to access the appropriate part of the directory and one to access the appropriate page.
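To make this cache-cost picture concrete, here is a minimal simulation sketch; it is not from the book. It counts cache misses for random B-tree searches under the model just described, keeping the T most recently accessed pages in an LRU cache (the setting of Exercise 16.50 below). The three-level tree shape that it assumes (one root page, M children of the root, and M*M leaf pages) is chosen only for illustration.

import java.util.LinkedHashSet;
import java.util.Random;

// Sketch only: estimates cache misses per search for a hypothetical
// three-level B tree (root, M children, M*M leaves) when the T most
// recently accessed pages are cached, as in the model described above.
public class CacheModelSketch
{
  public static void main(String[] args)
    {
      int M = 100, T = 50, S = 100000;  // branching factor, cache size, number of searches
      LinkedHashSet<Integer> cache = new LinkedHashSet<Integer>(); // least recently used page first
      Random rnd = new Random();
      long misses = 0;
      for (int s = 0; s < S; s++)
        {
          int child = 1 + rnd.nextInt(M);          // page id of the root's child on the search path
          int leaf  = 1 + M + rnd.nextInt(M * M);  // page id of the leaf on the search path
          for (int page : new int[] { 0, child, leaf })  // root page has id 0
            {
              if (!cache.remove(page)) misses++;   // cache miss: page was not in the cache
              cache.add(page);                     // (re)insert as most recently used
              if (cache.size() > T)                // evict the least recently used page
                cache.remove(cache.iterator().next());
            }
        }
      System.out.println("average cache misses per search: " + (double) misses / S);
    }
}

With the parameter values shown, the root is essentially always cached while the leaf essentially never is, so the reported average should fall between one and two misses per search, in line with the discussion above.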
These algorithms form an appropriate subject on which to close our discussion of searching, because, to use them effectively, we need to understand basic properties of binary search, BSTs, balanced trees, hashing, and tries—the basic searching algorithms that we have studied in Chapters 12 through 15. As a group, these algorithms provide us with solutions to the symbol-table implementation problem in a broad variety of situations: they constitute an outstanding example of the power of algorithmic technology.

Exercises

16.46 Develop a symbol-table implementation using B trees that includes a clone implementation and supports the construct, count, search, insert, remove, and join operations for a symbol-table ADT, with support for client handles (see Exercises 12.6 and 12.7).

16.47 Develop a symbol-table implementation using extendible hashing that includes a clone implementation and supports the construct, count, search, insert, remove, and join operations for a symbol-table ADT, with support for client handles (see Exercises 12.6 and 12.7).

16.48 Modify the B-tree implementation in Section 16.3 (Programs 16.1 through 16.3) to use an ADT for page references.

16.49 Modify the extendible-hashing implementation in Section 16.4 (Programs 16.5 through 16.8) to use an ADT for page references.

16.50 Estimate the average number of probes per search in a B tree for S random searches, in a typical cache system, where the T most-recently-accessed pages are kept in memory (and therefore add 0 to the probe count). Assume that S is much larger than T.

16.51 Estimate the average number of probes per search in an extendible hash table, for the cache model described in Exercise 16.50.

16.52 If your system supports virtual memory, design and conduct experiments to compare the performance of B trees with that of binary search, for random searches in a huge symbol table.

16.53 Implement a priority-queue ADT that supports construct for a huge number of items, followed by a huge number of insert and remove the maximum operations (see Chapter 9).

16.54 Develop an external symbol-table ADT based on a skip-list representation of B trees (see Exercise 13.80).

16.55 If your system supports virtual memory, run experiments to determine the value of M that leads to the fastest search times for a B-tree implementation supporting random search operations in a huge symbol table. (It may be worthwhile for you to learn basic properties of your system before conducting such experiments, which can be costly.)
16.56 Modify the B-tree implementation in Section 16.3 (Programs 16.1 through 16.3) to operate in an environment where the table resides on external storage. If your system allows nonsequential file access, put the whole table on a single (huge) file and use offsets within the file in place of references in the data structure. If your system allows you to access pages on external devices directly, use page addresses in place of references in the data structure. If your system allows both, choose the approach that you determine to be most reasonable for implementing a huge symbol table.

16.57 Modify the extendible-hashing implementation in Section 16.4 (Programs 16.5 through 16.8) to operate in an environment where the table resides on external storage. Explain the reasons for the approach that you choose for allocating the directory and the pages to files (see Exercise 16.56).

References for Part Four

The primary references for this section are the books by Knuth; Baeza-Yates and Gonnet; Mehlhorn; and Cormen, Leiserson, and Rivest. Many of the algorithms covered here are treated in great detail in these books, with mathematical analyses and suggestions for practical applications. Classical methods are covered thoroughly in Knuth; the more recent methods are described in the other books, with further references to the literature. These four sources, and the Sedgewick-Flajolet book, describe nearly all the "beyond the scope of this book" material referred to in this section.

The material in Chapter 13 comes from the 1996 paper by Roura and Martinez, the 1985 paper by Sleator and Tarjan, and the 1978 paper by Guibas and Sedgewick. As suggested by the dates of these papers, balanced trees are the subject of ongoing research. The books cited above have detailed proofs of properties of red–black trees and similar structures and references to more recent work.

The treatment of tries in Chapter 15 is classical (though complete implementations are rarely found in the literature). The material on TSTs comes from the 1997 paper by Bentley and Sedgewick. The 1972 paper by Bayer and McCreight introduced B trees, and the extendible hashing algorithm presented in Chapter 16 comes from the 1979 paper by Fagin, Nievergelt, Pippenger, and Strong. Analytic results on extendible hashing were derived by Flajolet in 1983. These papers are must reading for anyone wishing further information on external searching methods. Practical applications of these methods arise within the context of database systems. An introduction to this field is given, for example, in the book by Date.

R. Baeza-Yates and G. H. Gonnet, Handbook of Algorithms and Data Structures, second edition, Addison-Wesley, Reading, MA, 1984.

R. Bayer and E. M. McCreight, "Organization and maintenance of large ordered indexes," Acta Informatica 1, 1972.

J. L. Bentley and R. Sedgewick, "Sorting and searching strings," Eighth Symposium on Discrete Algorithms, New Orleans, January, 1997.

T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, second edition, MIT Press/McGraw-Hill, Cambridge, MA, 2002.

C. J. Date, An Introduction to Database Systems, seventh edition, Addison-Wesley, Boston, MA, 2002.

R. Fagin, J. Nievergelt, N. Pippenger, and H. R. Strong, "Extendible hashing—a fast access method for dynamic files," ACM Transactions on Database Systems 4, 1979.

P. Flajolet, "On the performance analysis of extendible hashing and trie search," Acta Informatica 20, 1983.
L. Guibas and R. Sedgewick, "A dichromatic framework for balanced trees," in 19th Annual Symposium on Foundations of Computer Science, IEEE, 1978. Also in A Decade of Progress 1970–1980, Xerox PARC, Palo Alto, CA.

D. E. Knuth, The Art of Computer Programming. Volume 3: Sorting and Searching, second edition, Addison-Wesley, Reading, MA, 1997.

K. Mehlhorn, Data Structures and Algorithms 1: Sorting and Searching, Springer-Verlag, Berlin, 1984.

S. Roura and C. Martinez, "Randomization of search trees by subtree size," Fourth European Symposium on Algorithms, Barcelona, September, 1996.

R. Sedgewick and P. Flajolet, An Introduction to the Analysis of Algorithms, Addison-Wesley, Reading, MA, 1996.

D. Sleator and R. E. Tarjan, "Self-adjusting binary search trees," Journal of the ACM 32, 1985.

Appendix

For simplicity and flexibility, we have used input and output sparingly in the programs in this book. Most of our programs are ADT implementations intended for use with diverse clients, but we also exhibit the driver programs that you need to run and test the programs on your own data. Programs 1.1, 6.1, and 12.6 are typical examples. In these drivers:

  We use the command line to get values of parameters.
  We take input data from the standard input stream.
  We print out results on the standard output stream.

The conventions for taking parameter values are standard in Java (and the mechanism is described in Section 3.7); in this Appendix, we present the classes that we use for input and output.

Rather than directly using Java library classes for input and output in our code, we use the adapter classes In and Out. The code for Out is trivial, since the methods that we use are precisely the methods that print a string (perhaps followed by a newline character) from Java's System.out class:

public class Out
{
  public static void print(String s)
    { System.out.print(s); }
  public static void println(String s)
    { System.out.println(s); }
}

To use the programs in this book that use the Out class, either put this code in a file named Out.java or just replace "Out" with "System.out" in the program code. The System.out class overloads print and println to take primitive-type parameters; we do not bother to do so because our client code usually prints strings and can otherwise easily use type conversion.

The code for In is more complicated because we have to arrange for reading different types of data. (Type conversion for output is straightforward because of Java's convention that every type of data have a toString method that converts it to a string.)
The following implementation of In is an adapter class for the Java StreamTokenizer class. It defines methods to initialize itself; read integers, floating-point numbers, and strings; and test whether the input stream is empty:

import java.io.*;
public class In
{
  private static int c;
  private static boolean blank()
    { return Character.isWhitespace((char) c); }
  private static void readC()
    {
      try { c = System.in.read(); }
      catch (IOException e) { c = -1; }
    }
  public static void init()
    { readC(); }
  public static boolean empty()
    { return c == -1; }
  public static String getString()
    {
      if (empty()) return null;
      String s = "";
      do
        { s += (char) c; readC(); }
      while (!(empty() || blank()));
      while (!empty() && blank()) readC();
      return s;
    }
  public static int getInt()
    { return Integer.parseInt(getString()); }
  public static double getDouble()
    { return Double.parseDouble(getString()); }
}

To use the programs in this book that use the In class, put this code in a file named In.java.

Our driver programs are intended for our own use in running and testing algorithms on known test data. Accordingly, we normally know that a program's input data is in the format that it expects (because we construct both the program and data in such a fashion), so we do not include error checking in In. Our programs explicitly initialize the input stream by calling In.init and test whether it is empty by calling In.empty, often just using the construct

  for (In.init(); !In.empty(); )

with calls to one or more of the get methods within the body of the loop. The get methods return 0 or null rather than raising an exception if we attempt to read from an empty input stream, but our drivers do not make use of those return values.

Using these adapter classes gives us the flexibility to change the way that we do input and output without modifying the code in the book at all. While the implementations given here are useful in most Java programming environments, other implementations of In and Out might be called for in various special situations. If you have classes for input and output that you are accustomed to using, it will be a simple matter for you to implement appropriate In and Out adapter classes to replace the ones given here. Or, as you gain experience developing and testing the algorithms in this book, writing the more sophisticated drivers called for in many of the exercises, you may develop more sophisticated implementations of these classes for your own use.

Exercises

A.1 Write a version of In that prints an informative message about what is wrong with the input for each possible exception that might arise.

A.2 Write a class that extends Applet and provide implementations of In and Out so that you can accept input and provide output in an applet window. Your goal is to make it possible for driver code in programs such as Program 1.1 to be used with as few changes as possible.
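As a small, hypothetical example of the driver pattern described above (it is not one of the book's programs), the following class reads whitespace-separated numbers from standard input with In and prints their sum with Out, using the for (In.init(); !In.empty(); ) construct.

// Minimal driver sketch (not one of the book's programs). It assumes that
// the In and Out classes given in this Appendix are compiled alongside it
// and that the input is in the expected format, as our drivers generally do.
public class SumDriver
{
  public static void main(String[] args)
    {
      double sum = 0.0;
      // Initialize the input stream, then read tokens until it is empty.
      for (In.init(); !In.empty(); )
        sum += In.getDouble();
      Out.println("sum: " + sum);   // Out prints strings only, so we concatenate.
    }
}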