High Performance Python
PRACTICAL PERFORMANT PROGRAMMING FOR HUMANS
Micha Gorelick & Ian Ozsvald

High Performance Python

Your Python code may run correctly, but you need it to run faster. By exploring the fundamental theory behind design choices, this practical guide helps you gain a deeper understanding of Python’s implementation. You’ll learn how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs.

How can you take advantage of multi-core architectures or clusters? Or build a system that can scale up and down without losing reliability? Experienced Python programmers will learn concrete solutions to these and other issues, along with war stories from companies that use high performance Python for social media analytics, productionized machine learning, and other situations.

- Get a better grasp of numpy, Cython, and profilers
- Learn how Python abstracts the underlying computer architecture
- Use profiling to find bottlenecks in CPU time and memory usage
- Write efficient programs by choosing appropriate data structures
- Speed up matrix and vector computations
- Use tools to compile Python down to machine code
- Manage multiple I/O and computational operations concurrently
- Convert multiprocessing code to run on a local or remote cluster
- Solve large problems while using less RAM

“Despite its popularity in academia and industry, Python is often dismissed as too slow for real applications. This book sweeps away that misconception with a thorough introduction to strategies for fast and scalable computation with Python.”
Jake VanderPlas, University of Washington

Micha Gorelick, winner of the Nobel Prize in 2046 for his contributions to time travel, went back to the 2000s to study astrophysics, work on data at bitly, and co-found Fast Forward Labs as resident Mad Scientist, working on issues from machine learning to performant stream algorithms.

Ian Ozsvald is a data scientist and teacher at ModelInsight.io, with over ten years of Python experience. He’s taught high performance Python at the PyCon and PyData conferences and has been consulting on data science and high performance computing for years in the UK.

PYTHON / PERFORMANCE
US $39.99  CAN $41.99
ISBN: 978-1-449-36159-4

High Performance Python
by Micha Gorelick and Ian Ozsvald

Copyright © 2014 Micha Gorelick and Ian Ozsvald. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com/). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Meghan Blanchette and Rachel Roumeliotis
Production Editor: Matthew Hacker
Copyeditor: Rachel Head
Proofreader: Rachel Monaghan
Indexer: Wendy Catalano
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Rebecca Demarest

September 2014: First Edition

Revision History for the First Edition:
2014-08-21: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449361594 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. High Performance Python, the image of a fer-de-lance, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-36159-4

Table of Contents

Preface

1. Understanding Performant Python
   The Fundamental Computer System
   Computing Units
   Memory Units
   Communications Layers
   Putting the Fundamental Elements Together
   Idealized Computing Versus the Python Virtual Machine
   So Why Use Python?

2. Profiling to Find Bottlenecks
   Profiling Efficiently
   Introducing the Julia Set
   Calculating the Full Julia Set
   Simple Approaches to Timing: print and a Decorator
   Simple Timing Using the Unix time Command
   Using the cProfile Module
   Using runsnakerun to Visualize cProfile Output
   Using line_profiler for Line-by-Line Measurements
   Using memory_profiler to Diagnose Memory Usage
   Inspecting Objects on the Heap with heapy
   Using dowser for Live Graphing of Instantiated Variables
   Using the dis Module to Examine CPython Bytecode
   Different Approaches, Different Complexity
   Unit Testing During Optimization to Maintain Correctness
   No-op @profile Decorator
   Strategies to Profile Your Code Successfully
   Wrap-Up

3. Lists and Tuples
   A More Efficient Search
   Lists Versus Tuples
   Lists as Dynamic Arrays
   Tuples As Static Arrays
   Wrap-Up

4. Dictionaries and Sets
   How Do Dictionaries and Sets Work?
   Inserting and Retrieving
   Deletion
   Resizing
   Hash Functions and Entropy
   Dictionaries and Namespaces
   Wrap-Up

5. Iterators and Generators
   Iterators for Infinite Series
   Lazy Generator Evaluation
   Wrap-Up

6. Matrix and Vector Computation
   Introduction to the Problem
   Aren’t Python Lists Good Enough?
   Problems with Allocating Too Much Memory
   Fragmentation
   Understanding perf
   Making Decisions with perf’s Output
   Enter numpy
   Applying numpy to the Diffusion Problem
   Memory Allocations and In-Place Operations
   Selective Optimizations: Finding What Needs to Be Fixed
   numexpr: Making In-Place Operations Faster and Easier
   A Cautionary Tale: Verify “Optimizations” (scipy)
   Wrap-Up

7. Compiling to C
   What Sort of Speed Gains Are Possible?
   JIT Versus AOT Compilers
   Why Does Type Information Help the Code Run Faster?
   Using a C Compiler
   Reviewing the Julia Set Example
   Cython
   Compiling a Pure-Python Version Using Cython
   Cython Annotations to Analyze a Block of Code
   Adding Some Type Annotations
   Shed Skin
   Building an Extension Module
   The Cost of the Memory Copies
   Cython and numpy
   Parallelizing the Solution with OpenMP on One Machine
   Numba
   Pythran
   PyPy
   Garbage Collection Differences
   Running PyPy and Installing Modules
   When to Use Each Technology
   Other Upcoming Projects
   A Note on Graphics Processing Units (GPUs)
   A Wish for a Future Compiler Project
   Foreign Function Interfaces
   ctypes
   cffi
   f2py
   CPython Module
   Wrap-Up

8. Concurrency
   Introduction to Asynchronous Programming
   Serial Crawler
   gevent
   tornado
   AsyncIO
   Database Example
   Wrap-Up

9. The multiprocessing Module
   An Overview of the Multiprocessing Module
   Estimating Pi Using the Monte Carlo Method
   Estimating Pi Using Processes and Threads
   Using Python Objects
   Random Numbers in Parallel Systems
   Using numpy
   Finding Prime Numbers
   Queues of Work
   Verifying Primes Using Interprocess Communication
   Serial Solution
   Naive Pool Solution
   A Less Naive Pool Solution
   Using Manager.Value as a Flag
   Using Redis as a Flag
   Using RawValue as a Flag
   Using mmap as a Flag
   Using mmap as a Flag Redux
   Sharing numpy Data with multiprocessing
   Synchronizing File and Variable Access
   File Locking
   Locking a Value
   Wrap-Up

10. Clusters and Job Queues
   Benefits of Clustering
   Drawbacks of Clustering
   $462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
   Skype’s 24-Hour Global Outage
   Common Cluster Designs
   How to Start a Clustered Solution
   Ways to Avoid Pain When Using Clusters
   Three Clustering Solutions
   Using the Parallel Python Module for Simple Local Clusters
   Using IPython Parallel to Support Research
   NSQ for Robust Production Clustering
   Queues
   Pub/sub
   Distributed Prime Calculation
   Other Clustering Tools to Look At
   Wrap-Up

11. Using Less RAM
   Objects for Primitives Are Expensive
   The Array Module Stores Many Primitive Objects Cheaply
   Understanding the RAM Used in a Collection
   Bytes Versus Unicode
   Efficiently Storing Lots of Text in RAM
   Trying These Approaches on Million Tokens
   Tips for Using Less RAM
   Probabilistic Data Structures
   Very Approximate Counting with a 1-byte Morris Counter
   K-Minimum Values
   Bloom Filters
   LogLog Counter
   Real-World Example

12. Lessons from the Field
   Adaptive Lab’s Social Media Analytics (SoMA)
   Python at Adaptive Lab
   SoMA’s Design
   Our Development Methodology
   Maintaining SoMA
   Advice for Fellow Engineers
   Making Deep Learning Fly with RadimRehurek.com
   The Sweet Spot
   Lessons in Optimizing
   Wrap-Up
   Large-Scale Productionized Machine Learning at Lyst.com
   Python’s Place at Lyst
   Cluster Design
   Code Evolution in a Fast-Moving Start-Up
   Building the Recommendation Engine
   Reporting and Monitoring
   Some Advice
   Large-Scale Social Media Analysis at Smesh
   Python’s Role at Smesh
   The Platform
   High Performance Real-Time String Matching
   Reporting, Monitoring, Debugging, and Deployment
   PyPy for Successful Web and Data Processing Systems
   Prerequisites
   The Database
   The Web Application
   OCR and Translation
   Task Distribution and Workers
   Conclusion
   Task Queues at Lanyrd.com
   Python’s Role at Lanyrd
   Making the Task Queue Performant
   Reporting, Monitoring, Debugging, and Deployment
   Advice to a Fellow Developer

Index