Algorithms
FOURTH EDITION
PART I

Robert Sedgewick and Kevin Wayne
Princeton University

Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
Capetown • Sydney • Tokyo • Singapore • Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact our corporate sales department at (800) 382-3419 or corpsales@pearsoned.com.

For government sales inquiries, please contact governmentsales@pearsoned.com.

For questions about sales outside the United States, please contact international@pearsoned.com.

Visit us on the Web: informit.com/aw

Copyright © 2014 Pearson Education, Inc.

All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236-3290.

ISBN-13: 978-0-13-379869-2
ISBN-10: 0-13-379869-0
First digital release, February 2014

To Adam, Andrew, Brett, Robbie and especially Linda

To Jackie and Alex

CONTENTS

Note: This is an online edition of Chapters 1 through 3 of Algorithms, Fourth Edition, which contains the content covered in our online course Algorithms, Part I.

Preface

1 Fundamentals
  1.1 Basic Programming Model
      Primitive data types • Loops and conditionals • Arrays • Static methods • Recursion • APIs • Strings • Input and output • Binary search
  1.2 Data Abstraction
      Objects • Abstract data types • Implementing ADTs • Designing ADTs
  1.3 Bags, Queues, and Stacks
      APIs • Arithmetic expression evaluation • Resizing arrays • Generics • Iterators • Linked lists
  1.4 Analysis of Algorithms
      Running time • Computational experiments • Tilde notation • Order-of-growth classifications • Amortized analysis • Memory usage
  1.5 Case Study: Union-Find
      Dynamic connectivity • Quick find • Quick union • Weighted quick union

2 Sorting
  2.1 Elementary Sorts
      Rules of the game • Selection sort • Insertion sort • Shellsort
  2.2 Mergesort
      Abstract in-place merge • Top-down mergesort • Bottom-up mergesort • N lg N lower bound for sorting
  2.3 Quicksort
      In-place partitioning • Randomized quicksort • 3-way partitioning
  2.4 Priority Queues
      Priority queue API • Elementary implementations • Binary heap • Heapsort
  2.5 Applications
      Comparators • Stability • Median and order statistics

3 Searching
  3.1 Symbol Tables
      Symbol table API • Ordered symbol table API • Dedup • Frequency counter • Sequential search • Binary search
  3.2 Binary Search Trees
      Basic implementation • Order-based methods • Deletion
  3.3 Balanced Search Trees
      2-3 search trees • Red-black BSTs • Deletion
  3.4 Hash Tables
      Hash functions • Separate chaining • Linear probing
  3.5 Applications
      Set data type • Whitelist and blacklist filters • Dictionary lookup • Inverted index • File indexing • Sparse matrix-vector multiplication

Chapters 4 through 6, which correspond to our online course Algorithms, Part II, are available as Algorithms, Fourth Edition, Part II. For more information, see http://algs4.cs.princeton.edu.

PREFACE

This book is intended to survey the most important computer algorithms in use today, and to teach fundamental techniques to the growing number of people in need of knowing them. It is intended for use as a textbook for a second course in computer science, after students have acquired basic programming skills and familiarity with computer systems. The book also may be useful for self-study or as a reference for people engaged in the development of computer systems or applications programs, since it contains implementations of useful algorithms and detailed information on performance characteristics and clients. The broad perspective taken makes the book an appropriate introduction to the field.

The study of algorithms and data structures is fundamental to any computer-science curriculum, but it is not just for programmers and computer-science students. Everyone who uses a computer wants it to run faster or to solve larger problems. The algorithms in this book represent a body of knowledge developed over the last 50 years that has become indispensable. From N-body simulation problems in physics to genetic-sequencing problems in molecular biology, the basic methods described here have become essential in scientific research; from architectural modeling systems to aircraft simulation, they have become essential tools in engineering; and from database systems to internet search engines, they have become essential parts of modern software systems. And these are but a few examples; as the scope of computer applications continues to grow, so grows the impact of the basic methods covered here.

In Chapter 1, we develop our fundamental approach to studying algorithms, including coverage of data types for stacks, queues, and other low-level abstractions that we use throughout the book. In Chapters 2 and 3, we survey fundamental algorithms for sorting and searching; and in Chapters 4 and 5, we cover algorithms for processing graphs and strings. Chapter 6 is an overview placing the rest of the material in the book in a larger context.

Chapter 3 Searching
3.5 Applications

[Figure: Sparse matrix representations. An array of double[] objects compared with an array of SparseVector objects, each of which is an independent symbol-table object.]

Instead of using the code a[i][j] to refer to the element in row i and column j, we use a[i].put(j, val) to set a value in the matrix and a[i].get(j) to retrieve a value. As you can see from the code below, matrix-vector multiplication using this class is even simpler than with the array representation (and it more clearly describes the computation). More important, it only requires time proportional to N plus the number of nonzero elements in the matrix. For small matrices or matrices that are not sparse, the overhead for maintaining symbol tables can be substantial, but it is worth your while to be sure to understand the ramifications of using symbol tables for huge sparse matrices. To fix ideas, consider a huge application (like the one faced by Brin and Page) where N is 10 billion or 100 billion, but the average number of nonzero elements per row is less than 10. For such an application, using symbol tables speeds up matrix-vector multiplication by a factor of a billion or more. The elementary nature of this application should not detract from its importance: programmers who do not take advantage of the potential to save time and space in this way severely limit their potential to solve practical problems, while programmers who take factor-of-a-billion speedups when they are available are likely to be able to address problems that could not otherwise be contemplated.

    SparseVector[] a;
    a = new SparseVector[N];
    double[] x = new double[N];
    double[] b = new double[N];

    // Initialize a[] and x[]

    for (int i = 0; i < N; i++)
        b[i] = a[i].dot(x);

    Sparse matrix-vector multiplication

Building the matrix for the Google application is a graph-processing application (and a symbol-table client!), albeit for a huge sparse matrix. Given the matrix, the PageRank calculation is nothing more than doing a matrix-vector multiplication, replacing the source vector with the result vector, and iterating the process until it converges (as guaranteed by fundamental theorems in probability theory). Thus, the use of a class like SparseVector can improve the time and space usage for this application by a factor of 10 billion or 100 billion or more.

Similar savings are possible in many scientific calculations, so sparse vectors and matrices are widely used and typically incorporated into specialized systems for scientific computing. When working with huge vectors and matrices, it is wise to run simple performance tests to be sure that the kinds of performance gains that we have illustrated here are not being missed. On the other hand, array processing for primitive types of data is built into most programming languages, so using arrays for vectors that are not sparse, as we did in this example, may offer further speedups. Developing a good understanding of the underlying costs and making the appropriate implementation decisions is certainly worthwhile for such applications.

Symbol tables are a primary contribution of algorithmic technology to the development of our modern computational infrastructure because of their ability to deliver
savings on a huge scale in a vast array of practical applications, making the difference between providing solutions to a wide range of problems and not being able to address them at all. Few fields of science or engineering involve studying the effects of an invention that improves costs by factors of 100 billion. Symbol-table applications put us in just that position, as we have just seen in several examples, and these improvements have had profound effects. The data structures and algorithms that we have considered are certainly not the final word: they were all developed in just a few decades, and their properties are not fully understood. Because of their importance, symbol-table implementations continue to be studied intensely by researchers around the world, and we can look forward to new developments on many fronts as the scale and scope of the applications they address continue to expand.

Q&A

Q. Can a SET contain null?
A. No. As with symbol tables, keys are non-null objects.

Q. Can a SET be null?
A. No. A SET can be empty (contain no objects), but not null. As with any Java data type, a variable of type SET can have the value null, but that just indicates that it does not reference any SET. The result of using new to create a SET is always an object that is not null.

Q. If all my data is in memory, there is no real reason to use a filter, right?
A. Right. Filtering really shines in the case when you have no idea how much data to expect. Otherwise, it may be a useful way of thinking, but not a cure-all.

Q. I have data in a spreadsheet. Can I develop something like LookupCSV to search through it?
A. Your spreadsheet application probably has an option to export to a csv file, so you can use LookupCSV directly.

Q. Why would I need FileIndex? Doesn’t my operating system solve this problem?
A. If you are using an OS that meets your needs, continue to do so, by all means. As with many of our programs, FileIndex is intended to show you the basic underlying mechanisms of such applications and to suggest possibilities to you.

Q. Why not have the dot() method in SparseVector take a SparseVector object as argument and return a SparseVector object?
A. That is a fine alternate design and a nice programming exercise that requires code that is a bit more intricate than for our design (see Exercise 3.5.16). For general matrix processing, it might be worthwhile to also add a SparseMatrix type.

EXERCISES

3.5.1 Implement SET and HashSET as “wrapper class” clients of ST and HashST, respectively (provide dummy values and ignore them).

3.5.2 Develop a SET implementation SequentialSearchSET by starting with the code for SequentialSearchST and eliminating all of the code involving values.

3.5.3 Develop a SET implementation BinarySearchSET by starting with the code for BinarySearchST and eliminating all of the code involving values.

3.5.4 Develop classes HashSTint and HashSTdouble for maintaining sets of keys of primitive int and double types, respectively. (Convert generics to primitive types in the code of LinearProbingHashST.)

3.5.5 Develop classes STint and STdouble for maintaining ordered symbol tables where keys are primitive int and double types, respectively. (Convert generics to primitive types in the code of RedBlackBST.) Test your solution with a version of SparseVector as a client.

3.5.6 Develop classes HashSETint and HashSETdouble for maintaining sets of keys of primitive int and double types, respectively. (Eliminate code involving values in your solution to Exercise 3.5.4.)

3.5.7 Develop classes SETint and SETdouble for maintaining ordered sets of keys of primitive int and double types, respectively. (Eliminate code involving values in your solution to Exercise 3.5.5.)
3.5.8 Modify LinearProbingHashST to keep duplicate keys in the table. Return any value associated with the given key for get(), and remove all items in the table that have keys equal to the given key for delete().

3.5.9 Modify BST to keep duplicate keys in the tree. Return any value associated with the given key for get(), and remove all nodes in the tree that have keys equal to the given key for delete().

3.5.10 Modify RedBlackBST to keep duplicate keys in the tree. Return any value associated with the given key for get(), and remove all nodes in the tree that have keys equal to the given key for delete().

3.5.11 Develop a MultiSET class that is like SET, but allows equal keys and thus implements a mathematical multiset.

3.5.12 Modify LookupCSV to associate with each key all values that appear in a key-value pair with that key in the input (not just the most recent, as in the associative-array abstraction).

3.5.13 Modify LookupCSV to make a program RangeLookupCSV that takes two key values from the standard input and prints all key-value pairs in the csv file such that the key falls within the range specified.

3.5.14 Develop and test a static method invert() that takes as argument an ST and produces as return value the inverse of the given symbol table (a symbol table of the same type).

3.5.15 Write a program that takes a string on standard input and an integer k as command-line argument and puts on standard output a sorted list of the k-grams (substrings of length k) found in the string, each followed by its index in the string.

3.5.16 Add a method sum() to SparseVector that takes a SparseVector as argument and returns a SparseVector that is the term-by-term sum of this vector and the argument vector. Note: You need delete() (and special attention to precision) to handle the case where an entry becomes 0.

CREATIVE PROBLEMS

3.5.17 Finite mathematical sets. Your goal is to develop an implementation of the following API for processing finite mathematical sets:

    public class MathSET<Key>

                 MathSET(Key[] universe)          create the empty set (using given universe)
            void add(Key key)                     put key into the set
    MathSET<Key> complement()                     set of keys in the universe that are not in this set
            void union(MathSET<Key> a)            put any keys from a into the set that are not already there
            void intersection(MathSET<Key> a)     remove any keys from this set that are not in a
            void delete(Key key)                  remove key from the set
         boolean contains(Key key)                is key in the set?
         boolean isEmpty()                        is the set empty?
             int size()                           number of keys in the set

    API for a basic finite set data type

3.5.18 Multisets. After referring to Exercises 3.5.2 and 3.5.3 and the previous exercise, develop APIs MultiHashSET and MultiSET for multisets (sets that can have equal keys) and implementations SeparateChainingMultiSET and BinarySearchMultiSET for multisets and ordered multisets, respectively.

3.5.19 Equal keys in symbol tables. Consider the API MultiST (unordered or ordered) to be the same as our symbol-table APIs defined on page 363 and page 366, but with equal keys allowed, so that the semantics of get() is to return any value associated with the given key, and we add a new method

    Iterable<Value> getAll(Key key)

that returns all values associated with the given key. Using our code for SeparateChainingHashST and BinarySearchST as a starting point, develop implementations SeparateChainingMultiST and BinarySearchMultiST for these APIs.

3.5.20 Concordance. Write an ST client Concordance that puts on standard output a concordance of the strings in the standard input stream (see page 498).

3.5.21 Inverted concordance. Write a program InvertedConcordance that takes a concordance on standard input and puts the original string on the standard output stream. Note: This computation is associated with a famous story having to do with the Dead Sea Scrolls. The team that discovered the original tablets enforced a secrecy rule that essentially resulted in their making public only a concordance. After a while, other researchers figured out how to invert the concordance, and the full text was eventually made public.

3.5.22 Fully indexed CSV. Implement an ST client FullLookupCSV that builds an array of ST objects (one for each field), with a test client that allows the user to specify the key and value fields in each query.

3.5.23 Sparse matrices. Develop an API and an implementation for sparse 2D matrices. Support matrix addition and matrix multiplication. Include constructors for row and column vectors.

3.5.24 Non-overlapping interval search. Given a list of non-overlapping intervals of items, write a function that takes an item as argument and determines in which, if any, interval that item lies. For example, if the items are integers and the intervals are 1643-2033, 5532-7643, 8999-10332, 5666653-5669321, then the query point 9122 lies in the third interval and 8122 lies in no interval.

3.5.25 Registrar scheduling. The registrar at a prominent northeastern university recently scheduled an instructor to teach two different classes at the same exact time. Help the registrar prevent future mistakes by describing a method to check for such conflicts. For simplicity, assume all classes run for 50 minutes starting at 9:00, 10:00, 11:00, 1:00, 2:00, or 3:00.

3.5.26 LRU cache. Create a data structure that supports the following operations: access and remove. The access operation inserts the item onto the data structure if it’s not already present. The remove operation deletes and returns the item that was least recently accessed. Hint: Maintain the items in order of access in a doubly linked list, along with pointers to the first and last nodes. Use a symbol table with keys = items, values = location in linked list. When you access an element, delete it from the linked list and reinsert it at the beginning. When you remove an element, delete it from the end and remove it from the symbol table.

3.5.27 List. Develop an implementation of the following API:

    public class List<Item> implements Iterable<Item>

                 List()                       create a list
            void addFront(Item item)          add item to the front
            void addBack(Item item)           add item to the back
            Item deleteFront()                remove from the front
            Item deleteBack()                 remove from the back
            void delete(Item item)            remove item from the list
            void add(int i, Item item)        add item as the ith in the list
            Item delete(int i)                remove the ith item from the list
         boolean contains(Item item)          is item in the list?
         boolean isEmpty()                    is the list empty?
             int size()                       number of items in the list

    API for a list data type

Hint: Use two symbol tables, one to find the ith item in the list efficiently, and the other to efficiently search by item. (Java’s java.util.List interface contains methods like these but does not supply any implementation that efficiently supports all operations.)

3.5.28 UniQueue. Create a data type that is a queue, except that an element may only be inserted into the queue once. Use an existence symbol table to keep track of all elements that have ever been inserted and ignore requests to re-insert such items.

3.5.29 Symbol table with random access. Create a data type that supports inserting a key-value pair, searching for a key and returning the associated value, and deleting and returning a random key. Hint: Combine a symbol table and a randomized queue (see Exercise 1.3.35).

EXPERIMENTS

3.5.30 Duplicates (revisited). Redo Exercise 2.5.31 using the Dedup filter given on page 490. Compare the running times of the two approaches. Then use Dedup to run the experiments for N = 10^7, 10^8, and 10^9, repeat the experiments for random long values, and discuss the results.

3.5.31 Spell checker. With the file dictionary.txt from the booksite as command-line argument, the BlackFilter client described on page 491 prints all misspelled words in a text file taken from standard input. Compare the performance of RedBlackBST, SeparateChainingHashST, and LinearProbingHashST for the file WarAndPeace.txt (available on the booksite) with this client and discuss the results.

3.5.32 Dictionary. Study the performance of a client like LookupCSV in a scenario where performance matters. Specifically, design a query-generation scenario instead of taking commands from standard input, and run performance tests for large inputs and large numbers of queries.

3.5.33 Indexing. Study a client like LookupIndex in a scenario where performance matters. Specifically, design a query-generation scenario instead of taking commands from standard input, and run performance tests for large inputs and large numbers of queries.

3.5.34 Sparse vector. Run experiments to compare the performance of matrix-vector multiplication using SparseVector to the standard implementation using arrays.

3.5.35 Primitive types. Evaluate the utility of using primitive types for Integer and Double values, for LinearProbingHashST and RedBlackBST. How much space and time are saved, for large numbers of searches in large tables?
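As a possible starting point for experiments such as Exercise 3.5.34, the sparse matrix-vector multiplication discussed earlier in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the book's SparseVector class: the class name SparseVectorSketch and the methods nonzeros() and multiply() are hypothetical, and java.util.HashMap stands in for the symbol-table implementations developed in this chapter. Only put(), get(), and dot() mirror the operations named in the text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a sparse vector (assumption: not the book's SparseVector).
class SparseVectorSketch {
    // Symbol table mapping index -> nonzero value; absent indices are treated as 0.0.
    private final Map<Integer, Double> st = new HashMap<>();

    // Set entry i to val. Storing 0.0 removes the entry, so only nonzeros are kept.
    void put(int i, double val) {
        if (val == 0.0) st.remove(i);
        else st.put(i, val);
    }

    // Return entry i, or 0.0 if no nonzero value is stored there.
    double get(int i) {
        return st.getOrDefault(i, 0.0);
    }

    // Number of stored (nonzero) entries.
    int nonzeros() {
        return st.size();
    }

    // Dot product with a dense vector; visits only the stored nonzero entries.
    double dot(double[] that) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : st.entrySet())
            sum += e.getValue() * that[e.getKey()];
        return sum;
    }

    // Compute b = A x, where row i of the matrix A is the sparse vector a[i].
    static double[] multiply(SparseVectorSketch[] a, double[] x) {
        double[] b = new double[a.length];
        for (int i = 0; i < a.length; i++)
            b[i] = a[i].dot(x);
        return b;
    }
}
```

For example, for a 3-by-3 matrix whose only nonzero entries are a[0][1] = 2.0, a[1][0] = 1.0, and a[1][2] = 3.0, multiplying by x = (1.0, 2.0, 3.0) gives b = (4.0, 10.0, 0.0). Each call to dot() touches only the nonzero entries of its row, which is the source of the savings described in the text; the choice of a hash table over an ordered symbol table is an assumption made here for brevity.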