A ProgrAmmer’s ComPAnion to Algorithm AnAlysis © 2007 by Taylor & Francis Group, LLC C6730_C000a.indd 08/14/2006 3:53:08 PM A ProgrAmmer’s ComPAnion to Algorithm AnAlysis ernst l leiss University of Houston, Texas, U.S.A © 2007 by Taylor & Francis Group, LLC C6730_C000a.indd 08/14/2006 3:53:08 PM Chapman & Hall/CRC Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2007 by Taylor & Francis Group, LLC Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S Government works Printed in the United States of America on acid-free paper 10 International Standard Book Number-10: 1-58488-673-0 (Softcover) International Standard Book Number-13: 978-1-58488-673-0 (Softcover) This book contains information obtained from authentic and highly regarded sources Reprinted material is quoted with permission, and sources are indicated A wide variety of references are listed Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers For permission to photocopy or use material electronically from this work, please access www copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400 CCC is a not-for-profit organization that provides licenses and registration for a variety of users For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe Library of Congress Cataloging-in-Publication Data Leiss, Ernst L., 1952A programmer’s companion to algorithm analysis / Ernst L Leiss p cm Includes bibliographical references and index ISBN 1-58488-673-0 (acid-free paper) Programming (Mathematics) Algorithms Data processing I Title QA402.5.L398 2006 005.1 dc22 2006044552 Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com © 2007 by Taylor & Francis Group, LLC T&F_LOC_A_Master.indd C6730_C000a.indd 08/14/2006 6/14/06 8:56:36 3:53:08 AM PM C6730_C000.fm Page v Monday, July 3, 2006 2:30 PM Preface The primary emphasis of this book is the transition from an algorithm to a program Given a problem to solve, the typical first step is the design of an algorithm; this algorithm is then translated into software We will look carefully at the interface between the design and analysis of algorithms on the one hand and the resulting program solving the problem on the other This approach is motivated by the fact that algorithms for standard problems are readily available in textbooks and literature and are frequently used as building blocks for more complex designs Thus, the correctness of the algorithm is much less a concern than its adaptation to a working program Many textbooks, several excellent, are dedicated to algorithms, their design, their analysis, the techniques involved in creating them, and how to determine their time and space complexities They provide the building blocks of the overall design These books are usually considered part of the theoretical side of computing There are also numerous books dedicated to designing software, from those concentrating on programming in the small (designing and debugging individual programs) to programming in the large (looking at large systems in their totality) These books are usually viewed as belonging to software engineering However, there are no books that look systematically at the gap separating the theory of algorithms and software engineering, even though many things can go wrong in taking several algorithms and producing a software product derived from them This book is intended to fill this gap It is not intended to teach algorithms from scratch; indeed, I assume the reader has already been exposed to the ordinary machinery of algorithm design, including the standard algorithms for sorting and searching and techniques for analyzing the correctness and complexity of algorithms (although the most important ones will be reviewed) Nor is this book meant to teach software design; I assume that the reader has already gained experience in designing reasonably complex software systems Ideally, the readers’ interest in this book’s topic was prompted by the uncomfortable realization that the path from algorithm to software was much more arduous than anticipated, and, indeed, results obtained on the theory side of the development process, be they results derived by readers or acquired from textbooks, did not translate satisfactorily to corresponding results, that is, performance, for the developed software Even if the reader has never encountered a situation where the performance predicted by the complexity analysis of a specific algorithm did not correspond to the performance observed by running the resulting software, I argue that such occurrences are increasingly more likely, given © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page vi Monday, July 3, 2006 2:30 PM the overall development of our emerging hardware platforms and software environments In many cases, the problems I will address are rooted in the different way memory is viewed For the designer of an algorithm, memory is inexhaustible, has uniform access properties, and generally behaves nicely (I will be more specific later about the meaning of niceness) Programmers, however, have to deal with memory hierarchies, limits on the availability of each class of memory, and the distinct nonuniformity of access characteristics, all of which imply a definite absence of niceness Additionally, algorithm designers assume to have complete control over their memory, while software designers must deal with several agents that are placed between them and the actual memory — to mention the most important ones, compilers and operating systems, each of which has its own idiosyncrasies All of these conspire against the software designer who has the naïve and often seriously disappointed expectation that properties of algorithms easily translate into properties of programs The book is intended for software developers with some exposure to the design and analysis of algorithms and data structures The emphasis is clearly on practical issues, but the book is naturally dependent on some knowledge of standard algorithms — hence the notion that it is a companion book It can be used either in conjunction with a standard algorithm text, in which case it would most likely be within the context of a course setting, or it can be used for independent study, presumably by practitioners of the software development process who have suffered disappointments in applying the theory of algorithms to the production of efficient software © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page vii Monday, July 3, 2006 2:30 PM Contents Foreword xiii Part The Algorithm Side: Regularity, Predictability, and Asymptotics A Taxonomy of Algorithmic Complexity Introduction The Time and Space Complexities of an Algorithm The Worst-, Average-, and Best-Case Complexities of an Algorithm 1.3.1 Scenario 11 1.3.2 Scenario 12 1.4 Bit versus Word Complexity 12 1.5 Parallel Complexity .15 1.6 I/O Complexity 17 1.6.1 Scenario 18 1.6.2 Scenario 20 1.7 On-Line versus Off-Line Algorithms .22 1.8 Amortized Analysis 24 1.9 Lower Bounds and Their Significance .24 1.10 Conclusion 30 Bibliographical Notes 30 Exercises 31 1.1 1.2 1.3 Fundamental Assumptions Underlying Algorithmic Complexity 37 2.1 Introduction 37 2.2 Assumptions Inherent in the Determination of Statement Counts .38 2.3 All Mathematical Identities Hold .44 2.4 Revisiting the Asymptotic Nature of Complexity Analysis 45 2.5 Conclusion 46 Bibliographical Notes 47 Exercises 47 © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page viii Monday, July 3, 2006 2:30 PM Examples of Complexity Analysis 49 General Techniques for Determining Complexity 49 Selected Examples: Determining the Complexity of Standard Algorithms 53 3.2.1 Multiplying Two m-Bit Numbers 54 3.2.2 Multiplying Two Square Matrices 55 3.2.3 Optimally Sequencing Matrix Multiplications 57 3.2.4 MergeSort 59 3.2.5 QuickSort 60 3.2.6 HeapSort 62 3.2.7 RadixSort 65 3.2.8 Binary Search 67 3.2.9 Finding the Kth Largest Element 68 3.2.10 Search Trees 71 3.2.10.1 Finding an Element in a Search Tree 72 3.2.10.2 Inserting an Element into a Search Tree .73 3.2.10.3 Deleting an Element from a Search Tree 74 3.2.10.4 Traversing a Search Tree 76 3.2.11 AVL Trees 76 3.2.11.1 Finding an Element in an AVL Tree 76 3.2.11.2 Inserting an Element into an AVL Tree .77 3.2.11.3 Deleting an Element from an AVL Tree 83 3.2.12 Hashing 84 3.2.13 Graph Algorithms 87 3.2.13.1 Depth-First Search 88 3.2.13.2 Breadth-First Search 89 3.2.13.3 Dijkstra’s Algorithm 91 3.3 Conclusion 92 Bibliographical Notes 92 Exercises 93 3.1 3.2 Part The Software Side: Disappointments and How to Avoid Them Sources of Disappointments 103 4.1 Incorrect Software 103 4.2 Performance Discrepancies 105 4.3 Unpredictability 109 4.4 Infeasibility and Impossibility 111 4.5 Conclusion 113 Bibliographical Notes 114 Exercises 115 © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page ix Monday, July 3, 2006 2:30 PM Implications of Nonuniform Memory for Software 117 5.1 The Influence of Virtual Memory Management 118 5.2 The Case of Caches 123 5.3 Testing and Profiling 124 5.4 What to Do about It 125 Bibliographical Notes .136 Exercises .137 Implications of Compiler and Systems Issues for Software 141 6.1 Introduction 141 6.2 Recursion and Space Complexity 142 6.3 Dynamic Structures and Garbage Collection 145 6.4 Parameter-Passing Mechanisms 150 6.5 Memory Mappings 155 6.6 The Influence of Language Properties 155 6.6.1 Initialization 155 6.6.2 Packed Data Structures 157 6.6.3 Overspecification of Execution Order .158 6.6.4 Avoiding Range Checks .159 6.7 The Influence of Optimization 160 6.7.1 Interference with Specific Statements 160 6.7.2 Lazy Evaluation 161 6.8 Parallel Processes .162 6.9 What to Do about It 163 Bibliographical Notes .164 Exercises .164 Implicit Assumptions 167 Handling Exceptional Situations 167 7.1.1 Exception Handling 168 7.1.2 Initializing Function Calls 169 7.2 Testing for Fundamental Requirements .171 7.3 What to Do about It 174 Bibliographical Notes .174 Exercises .175 7.1 8.1 8.2 8.3 8.4 8.5 Implications of the Finiteness of the Representation of Numbers 177 Bit and Word Complexity Revisited .177 Testing for Equality 180 Mathematical Properties 183 Convergence .185 What to Do about It 186 © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page x Monday, July 3, 2006 2:30 PM Bibliographical Notes .186 Exercises .187 Asymptotic Complexities and the Selection of Algorithms 189 9.1 Introduction 189 9.2 The Importance of Hidden Constants 190 9.3 Crossover Points 193 9.4 Practical Considerations for Efficient Software: What Matters and What Does Not .196 Bibliographical Notes .197 Exercises .198 10 Infeasibility and Undecidability: Implications for Software Development 199 10.1 Introduction 199 10.2 Undecidability 201 10.3 Infeasibility 203 10.4 NP-Completeness 207 10.5 Practical Considerations 208 Bibliographical Notes .209 Exercises .210 Part Conclusion Appendix I: Algorithms Every Programmer Should Know 217 Bibliographical Notes .223 Appendix II: Overview of Systems Implicated in Program Analysis 225 II.1 Introduction 225 II.2 The Memory Hierarchy 225 II.3 Virtual Memory Management 227 II.4 Optimizing Compilers 228 II.4.1 Basic Optimizations 229 II.4.2 Data Flow Analysis .229 II.4.3 Interprocedural Optimizations 230 II.4.4 Data Dependence Analysis 230 II.4.5 Code Transformations 231 II.4.6 I/O Issues .231 II.5 Garbage Collection 232 Bibliographical Notes .234 © 2007 by Taylor & Francis Group, LLC C6730_C000.fm Page xi Monday, July 3, 2006 2:30 PM Appendix III: NP-Completeness and Higher Complexity Classes .237 III.1 Introduction 237 III.2 NP-Completeness 237 III.3 Higher Complexity Classes 240 Bibliographical Notes .241 Appendix IV: Review of Undecidability 243 IV.1 Introduction 243 IV.2 The Halting Problem for Turing Machines 243 IV.3 Post’s Correspondence Problem 245 Bibliographical Note 246 Bibliography .247 © 2007 by Taylor & Francis Group, LLC ... quality of the algorithm It may be impossible to take an algorithm that works very well on a particular parallel system and apply it effectively to a different parallel architecture Parallel algorithms... 11, 2006 7:35 AM A Programmer’s Companion to Algorithm Analysis complexity of these algorithms While the literature may contain a complexity analysis of an algorithm, it is our contention that... numerical algorithms (where one typically devotes a good deal of attention to error analysis and related topics), occasionally questions related to the validity of mathematical identities and similar