
Learning Concurrency in Python: Speed up your Python code with clean, readable, and advanced concurrency techniques


DOCUMENT INFORMATION

Basic information

Format
Pages: 352
File size: 2.36 MB

Contents

Learning Concurrency in Python

Speed up your Python code with clean, readable, and advanced concurrency techniques

Elliot Forbes

BIRMINGHAM - MUMBAI

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: August 2017
Production reference: 1140817
Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.
ISBN 978-1-78728-537-8
www.packtpub.com

Credits

Author: Elliot Forbes
Reviewer: Nikolaus Gradwohl
Commissioning Editor: Merint Mathew
Acquisition Editor: Chaitanya Nair
Content Development Editor: Rohit Kumar Singh
Technical Editors: Ketan Kamble
Copy Editor: Sonia Mathur
Project Coordinator: Vaidehi Sawant
Proofreader: Safis Editing
Indexer: Francy Puthiry
Graphics: Abhinash Sahu
Production Coordinator: Nilesh Mohite

About the Author

Elliot Forbes has worked as a full-time software engineer at JPMorgan Chase for the last two years. He graduated from the University of Strathclyde in Scotland in the spring of 2015 and worked as a freelancer developing web solutions while studying there. He has worked with numerous technologies such as GoLang, NodeJS, and plain old
Java, and he has spent years working on concurrent enterprise systems. It is with this experience that he was able to write this book. Elliot has also worked at Barclays Investment Bank for a summer internship in London and has maintained a couple of software development websites for the last three years.

About the Reviewer

Nikolaus Gradwohl was born in 1976 in Vienna, Austria, and always wanted to become an inventor like Gyro Gearloose. When he got his first Atari, he figured out that being a computer programmer is the closest he could get to that dream. For a living, he has written programs for nearly anything that can be programmed, ranging from 8-bit microcontrollers to mainframes. In his free time, he likes to master programming languages and operating systems. Nikolaus authored the Processing 2: Creative Coding Hotshot book, and you can see some of his work on his blog at http://www.local-guru.net/.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?
- Fully searchable across every book published by Packt
- Copy and paste, print, and bookmark content
- On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/.

If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Table of Contents

Preface

Chapter 1: Speed It Up!
  History of concurrency
  Threads and multithreading
  What is a thread?
  Types of threads
  What is multithreading?
  Processes
  Properties of processes
  Multiprocessing
  Event-driven programming
  Turtle
  Breaking it down
  Reactive programming
  ReactiveX - RxPy
  Breaking it down
  GPU programming
  PyCUDA
  OpenCL
  Theano
  The limitations of Python
  Jython
  IronPython
  Why should we use Python?
  Concurrent image download
  Sequential download
  Breaking it down
  Concurrent download
  Breaking it down
  Improving number crunching with multiprocessing
  Sequential prime factorization
  Breaking it down
  Concurrent prime factorization
  Breaking it down
  Summary

Chapter 2: Parallelize It
  Understanding concurrency
  Properties of concurrent systems
  I/O bottlenecks
  Understanding parallelism
  CPU-bound bottlenecks
  How they work on a CPU?
  Single-core CPUs
  Clock rate
  Martelli model of scalability
  Time-sharing - the task scheduler
  Multi-core processors
  System architecture styles
  SISD
  SIMD
  MISD
  MIMD
  Computer memory architecture styles
  UMA
  NUMA
  Summary

Chapter 3: Life of a Thread
  Threads in Python
  Thread state
  State flow chart
  Python example of thread state
  Breaking it down
  Different types of threads
  POSIX threads
  Windows threads
  The ways to start a thread
  Starting a thread
  Inheriting from the thread class
  Breaking it down
  Forking
  Example
  Breaking it down
  Daemonizing a thread
  Example
  Breaking it down

Event-Driven Programming

Semaphores and bounded semaphores can be instantiated as follows:

    import asyncio

    # value is the initial counter; 1 is the library default.
    # The loop argument was removed in Python 3.10, so it is omitted here.
    mySemaphore = asyncio.Semaphore(value=1)
    boundedSemaphore = asyncio.BoundedSemaphore(value=1)

Sub-processes

We've seen that single-process programs sometimes cannot meet the demands placed on them. In the previous chapters, we looked at various mechanisms for improving performance using multiple processes, and thankfully, asyncio lets us leverage sub-processes from within our event-driven programs.

I am not a fan of using this mechanism for improving performance, as it can drastically heighten the complexity of your programs. However, this isn't to say that there aren't situations where it would be useful, and as such, I should make you aware of the official documentation, which can be found at https://docs.python.org/3/library/asyncio-subprocess.html.

Debugging asyncio programs

Thankfully, when it comes to debugging asyncio-based applications, we have a couple of options to consider. The writers of the asyncio module have very kindly provided a debug mode, which is quite powerful and can really aid us in our debugging adventures without the overhead of
modifying the system's code base too dramatically.

Debug mode

Turning on debug mode within your asyncio-based programs is relatively simple and requires just a call to this function:

    loop.set_debug(True)

Let's take a look at a fully fledged example of this and how it differs from your standard logging. In this example, we'll create a very simple event loop and submit a simple task to it:

    import asyncio
    import logging
    import time

    logging.basicConfig(level=logging.DEBUG)

    async def myWorker():
        logging.info("My Worker Coroutine Hit")
        # Deliberately blocking call, so that debug mode reports a slow step
        time.sleep(1)

    async def main():
        logging.debug("My Main Function Hit")
        # asyncio.wait() expects tasks/futures, so wrap the coroutine first
        await asyncio.wait([asyncio.ensure_future(myWorker())])

    loop = asyncio.new_event_loop()
    loop.set_debug(True)
    try:
        loop.run_until_complete(main())
    finally:
        loop.close()

Upon execution of this, you should see output similar to the following:

    $ python3.6 -Wdefault 11_debugAsyncio.py
    DEBUG:asyncio:Using selector: KqueueSelector
    DEBUG:root:My Main Function Hit
    INFO:root:My Worker Coroutine Hit
    WARNING:asyncio:Executing <...> took 1.004 seconds
    DEBUG:asyncio:Close <...>

You should notice the extra log statements that wouldn't otherwise have been included within your logs. These extra log statements give us a far more granular idea of what's going on with regard to our event loop. In this simple example, debug mode was able to catch the fact that one of our coroutines took longer than 100 ms to execute, as well as when our event loop finally closed.

You should note that asyncio's debug mode is very useful for doing things such as determining which coroutines are never yielded from and are thus killing your program's performance. Other checks that it performs are as follows:

- The call_soon() and call_at() methods raise an exception if they are called from the wrong thread
- The execution time of the selector is logged
- ResourceWarning warnings are emitted when transports and event loops are not closed explicitly

This is just for starters. Overall, this is a rather powerful
tool to have and gives you an option other than your typical Pdb as a means of looking deeper into your programs. I implore you to check out the official documentation of asyncio's debug mode at https://docs.python.org/3/library/asyncio-dev.html#asyncio-debug-mode, as it provides a few far more in-depth examples that you can peruse at your leisure.

Twisted

Twisted is a very popular and very powerful event-driven networking engine for Python that can be used for a huge range of different projects such as web servers, mail clients, subsystems, and more.

Twisted is an amalgamation of both high- and low-level APIs that can be effectively utilized to create very powerful, elegant programs without masses of boilerplate code. It is both asynchronous and event-based by design and could be compared to the asyncio module as a more fully fleshed out older sibling.

If you are interested in learning far more about the Twisted framework, then I fully recommend Twisted Network Programming Essentials, 2nd Edition, by Abe Fettig and Jessica McKellar.

A simple web server example

Twisted is ideal for doing things such as serving websites. We can set up a simple TCP server that listens on a particular endpoint and serves files from a relative file location to anyone requesting them in a browser.

In this example, we'll set up a very simple web server that serves local content from a tmp directory within the same directory as your program.

We will first import the necessary parts of the twisted.web and twisted.internet modules. We will then define an instance of the File resource that maps to the directory we wish to serve. Below this, we will create an instance of the Site factory using our newly defined resource.

Finally, we will map our Site factory to a TCP port and start listening before calling reactor.run(), which drives our entire program by handling things such as accepting any incoming TCP connections
and passing bytes in and out of the said connections:

    from twisted.web.server import Site
    from twisted.web.static import File
    from twisted.internet import reactor, endpoints

    PORT = 8888  # example port (an assumption); any free port will do

    resource = File('tmp')      # directory to serve, relative to the program
    factory = Site(resource)    # HTTP site factory wrapping the resource
    endpoint = endpoints.TCP4ServerEndpoint(reactor, PORT)
    endpoint.listen(factory)    # start listening for TCP connections
    reactor.run()               # run the event loop until interrupted

We'll serve this very simple .html file, which features an <h1> tag within its body. Nothing special, but it represents a good starting point for any website:

    <!DOCTYPE html>
    <html lang="en">
    <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      <meta http-equiv="X-UA-Compatible" content="ie=edge">
      <title>Twisted Home Page</title>
    </head>
    <body>
      <h1>Twisted Homepage</h1>
    </body>
    </html>

Upon running the preceding code, navigate to http://localhost:8888 (or whichever port you chose) in your local browser, and you should see our newly created website in all its glory being served up.

This represents just a very small fraction of what the Twisted framework is capable of, and unfortunately, for the sake of brevity, I cannot possibly divulge all of its inner workings. The overall power of this framework is incredible, and it has been a joy working with it in the past.

Gevent

The gevent networking library is built purely on top of coroutines. It's very similar in nature to Twisted and provides us with a similar range of functionality that we can leverage to build our own network-based, event-driven Python applications.

Its features include the following:

- A fast event loop
- Lightweight execution units based on greenlets
- An API that reuses concepts from the Python standard library
- Cooperative sockets and SSL modules
- TCP/UDP/HTTP servers
- Thread pools
- Sub-process support

It also includes a number of other features, along with a healthy collection of internal methods and classes that enable us, as developers, to build some pretty powerful and flexible systems. If you are interested in reading up on everything that it provides, then I suggest you check out the official documentation, which can be found at
http://www.gevent.org/contents.html.

Event loops

Much like the asyncio module, gevent utilizes the concept of an event loop. This event loop is efficient by design, as it handles events as and when they are registered within the event loop. It lets the OS handle the delivery of event notifications and focuses on making real progress on the events, as opposed to wasting valuable resources polling for them.

Greenlets

The core of the gevent framework is the greenlet. Greenlets, if you haven't worked with them before, are very lightweight coroutines, written in C, that are cooperatively scheduled. They provide us with a very lightweight thread-like object that allows us to achieve concurrent execution within our Python programs without incurring the cost of spinning up multiple threads.

These lightweight pseudo threads are spawned by creating a greenlet instance and then calling its start method. Each pseudo thread executes a certain amount of code before cooperatively giving up control and allowing another pseudo thread to take over. This cycle of doing work and then yielding is repeated over and over again until the program has accomplished its target and terminated.

We can define greenlets through the use of functions such as spawn, as follows:

    import gevent

    def myGreenlet():
        print("My Greenlet is executing")

    gevent.spawn(myGreenlet)
    # Yield control so that the spawned greenlet gets a chance to run
    gevent.sleep(1)

Alternatively, we can also define them through the use of sub-classing, as follows:

    import gevent
    from gevent import Greenlet

    class MyNoopGreenlet(Greenlet):

        def __init__(self, seconds):
            Greenlet.__init__(self)
            self.seconds = seconds

        def _run(self):
            print("My Greenlet executing")
            gevent.sleep(self.seconds)

        def __str__(self):
            return 'MyNoopGreenlet(%s)' % self.seconds

    g = MyNoopGreenlet(1)  # sleep duration in seconds (example value)
    g.start()
    g.join()
    print(g.dead)

Simple example - hostnames

In this simple example, we'll introduce the use of gevent within your Python applications. We'll write something
that takes in a list of three distinct hostnames and then spawns a list of greenlets in order to retrieve the IP addresses of these hostnames using the socket.gethostbyname() function.

We'll then call joinall() on these greenlets, giving them a timeout, and finally, we'll print the list of IP addresses this returns:

    import gevent
    from gevent import socket

    TIMEOUT = 2  # seconds to wait for the lookups (example value)

    def main():
        urls = ['www.google.com', 'www.example.com', 'www.python.org']
        # One greenlet per hostname, each performing a DNS lookup
        jobs = [gevent.spawn(socket.gethostbyname, url) for url in urls]
        gevent.joinall(jobs, timeout=TIMEOUT)
        print([job.value for job in jobs])

    if __name__ == '__main__':
        main()

Output

When you run the preceding program, you should see output that looks similar to this:

    $ python3.6 19_geventSimple.py
    ['216.58.211.100', '93.184.216.34', '151.101.60.223']

Monkey patching

One concept that was new to me when I was researching the gevent library was monkey patching. Unfortunately, it has absolutely nothing to do with monkeys and is, instead, the dynamic modification of a class or module at runtime.

A great example of how this works is to imagine a Python program that contains a method that retrieves a value from an external dependency such as a database, a REST API, or something else. We can utilize monkey patching as a means of stubbing out this method so that it does not hit our external dependency when we are running our unit tests.

Why does this fall under gevent, though? Well, in gevent, we can utilize monkey patching to carefully replace functions and classes with cooperative counterparts. An example of how powerful this can be is the standard socket module. DNS requests are serialized by default and, thus, are incredibly slow when done in bulk. In order to perform these DNS requests concurrently, we can utilize monkey patching!
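The unit-testing use of monkey patching described above can be sketched with the standard library alone. This is only a minimal illustration, not code from the book: `PriceService`, `fetch_price`, and `fake_fetch_price` are hypothetical names invented here, and `unittest.mock.patch.object` is just one common way to swap an attribute at runtime and restore it afterwards.

```python
from unittest import mock


class PriceService:
    """Hypothetical class whose method would normally hit an external API."""

    def fetch_price(self, symbol):
        raise RuntimeError("would contact an external dependency")

    def describe(self, symbol):
        return "{0} costs {1}".format(symbol, self.fetch_price(symbol))


def fake_fetch_price(self, symbol):
    # Stub used only in tests: returns a canned value, no network access.
    return 42


# patch.object replaces the attribute on the class at runtime and puts the
# original back when the with-block exits.
with mock.patch.object(PriceService, "fetch_price", fake_fetch_price):
    patched_result = PriceService().describe("ACME")

print(patched_result)

# Outside the with-block the real (failing) method is active again.
try:
    PriceService().describe("ACME")
    restored = False
except RuntimeError:
    restored = True

print(restored)
```

The same idea, replacing an attribute on a module or class while the program runs, is what gevent applies to blocking functions in the standard library.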
Using gevent.monkey, we can monkey patch the functions and classes in the socket module with cooperative counterparts, which removes this particular bottleneck:

    from gevent import monkey; monkey.patch_socket()
    import urllib.request  # usable from multiple greenlets now

If you want to learn more about this performance-improving technique, then I recommend you check out the official documentation at http://www.gevent.org/intro.html#monkey-patching.

Summary

In this chapter, we covered the paradigm of event-driven programming before covering how asyncio works and how we can use it for our own event-driven Python systems. We went in depth into the asyncio module and the various things you can do with it, such as constructing your event loops, chaining coroutines within a loop, and setting up event handlers, to name but a few.

In the next chapter, we'll look at how you can create reactive programs using the powerful RxPY module and cover how the reactive programming paradigm differs from your typical event-based programs.

10 Reactive Programming

While event-driven programming might revolve around events, in reactive programming, we deal purely with data. In other words, every time we receive a new piece of data, we consider this to be an event. Due to this definition, you could technically call it a branch of event-driven programming. However, due to its popularity and the differences in the way it does things, I couldn't help but put reactive programming in a chapter of its own.

In this chapter, we will dive deeper into one of the most popular libraries available in Python when it comes to reactive programming, RxPY. We'll cover in depth some of the features of this library and how we can utilize it to create our own asynchronous programs.

We'll come to terms with some of the basics of RxPY necessary to get us started:

- Dealing with observers and observables
- Lambda functions and how we can use them
- The multitude of operators and how we can chain these to
  achieve a desired state
- The differences between hot and cold observables
- Multicasting

We'll also take a brief look at the PyFunctional library, how it differs from RxPY, and how we can leverage it in certain scenarios.

You should note that some of these examples from the official documentation have also been covered in a video course called Reactive Python for Data Science by Thomas Nield. I highly recommend this course, as Thomas covers a lot of material that I've not had a chance to in this chapter. You can find this course at http://shop.oreilly.com/product/….do.

Basic reactive programming

Reactive programming is a paradigm that is totally unlike the more traditional imperative style of programming. Being aware of the strengths and weaknesses of reactive programming could help you turn software disasters into potential successes.

With reactive programming, we can move away from the imperative style and instead focus on representing our data as a stream of events. We can subscribe to these streams and take action upon receiving these events. This helps us simplify a system's flow that could quickly become unwieldy and unmaintainable if we were to follow a more traditional architecture and style.

Reactive libraries take away the complexity of us having to push our events to various functions within our systems and enable us to effectively work with data as queryable, real-time streams. We can essentially fashion programs that will run indefinitely against an infinite stream of events such as constant stock quotes or social media interactions.

This reactive programming paradigm has been taken up by the likes of data scientists, who may have streams of statistical or sensor data coming in that they have to analyze and make decisions on.

Maintaining purity

In a reactive paradigm, it's important that we try to make all our transactions stateless. By enforcing stateless transactions, we essentially
reduce the number of potential side effects that could impact our program's execution.

This pure style of programming is one that functional programmers tend to live by, and it's proving to be an incredibly powerful paradigm when it comes to designing highly resilient distributed systems. Overall, it's something that I would try to follow right from the start as you develop these new systems.

ReactiveX, or RX

Reactive Extensions, or RX, first hit the scene around 2010 and has been heavily adopted by large tech companies such as Netflix, which uses RxJava. It has since grown into something far bigger and more prevalent within the industry.

It comes in many different flavors, one for each of the different programming languages currently out there. The most popular of these are as follows:

- RxJava for Java
- RxJS for JavaScript
- RxPY for Python
- RxSwift for Swift

The full list of Rx flavors can be found at https://github.com/ReactiveX.

Reactive Extensions for Python, or RxPY as it has been condensed to, is a library for composing asynchronous and event-based programs using observable collections and LINQ-style query operators in Python. I first came across a similar version of ReactiveX when I was working with the new Angular framework, and my experience with it was great. It let me turn a web socket stream into an observable, which could subsequently be watched from within my Angular application and displayed, in real time, in the browser.

RxPY is equally useful, and it paves the way for you to write some incredibly interesting applications while handling the underlying complexity of dealing with observers and observables.

One of the best examples I could envisage of this library being used is in, say, a stock trading application. You could, in theory, have an API that constantly checks the price of certain stocks and streams it back to your RxPY-based stock trading application if certain conditions are met. Say, for instance, a stock that
you own falls in value by 20 percent: you could subscribe to this type of event and then react to this situation in whatever way you wish, be it to sell off the stock or to buy more of it.

Installing RxPY

Installing RxPY can be done with ease using pip, as follows:

    pip install rx

You should note that RxPY runs on both Python 2.7+ and 3.4+ as well as PyPy and IronPython.

Observables

Observables are the most important part of our RxPY applications. We define observables that can emit events to any observer that is currently registered to receive events from them. The key thing to note is that observables utilize a push mechanism to notify their subscribers of new events, as opposed to subscribers pulling for them.

Creating observers

In RxPY, nearly anything can be turned into an observable, which makes it immensely powerful as a library, and when it comes to consuming from these observables, we have many options. If we need to, we could utilize a quick and easy lambda function, or we could define a fully fledged class that handles it all for us.

Example

In this example, we'll implement PrintObserver, which will subclass the rx.Observer class. This will implement the three required functions: on_next(), on_completed(), and on_error(). Each of these three functions has an important role to play within our observers:

- on_next(self, value): This is called whenever the observer receives a new event.
- on_completed(self): This is called whenever our observable notifies the observer that it has completed its task.
- on_error(self, error): This is called whenever an error needs to be handled. Thankfully, in our simple example, we shouldn't have to worry about this.

This will be a relatively simple example that just prints out the values received when the on_next() function is called, but it should give you the basic idea of how to define your own observers.

Note: The following code snippet was taken from the official RxPY documentation.
Let's examine this code snippet, which features a very simple RxPY observer and observable:

    # RxPY 1.x API; later major versions changed these imports
    from rx import Observable, Observer

    def push_five_strings(observer):
        # Push five values to the observer, then signal completion
        observer.on_next("Alpha")
        observer.on_next("Beta")
        observer.on_next("Gamma")
        observer.on_next("Delta")
        observer.on_next("Epsilon")
        observer.on_completed()

    class PrintObserver(Observer):

        def on_next(self, value):
            print("Received {0}".format(value))

        def on_completed(self):
            print("Done!")

        def on_error(self, error):
            print("Error Occurred: {0}".format(error))

    source = Observable.create(push_five_strings)
    source.subscribe(PrintObserver())

When we execute the preceding Python program, you should see that the five distinct strings are all printed out one after the other before the on_completed() function is called. This signals to our observer that there will be no further events, and the program then terminates:

    $ python3.6 07_createObservable.py
    Received Alpha
    Received Beta
    Received Gamma
    Received Delta
    Received Epsilon
    Done!
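To make the push mechanism and the lambda-based style of observer a little more concrete, here is a minimal sketch that deliberately avoids RxPY altogether. `MiniObservable` and `CallbackObserver` are hypothetical names invented for this illustration and are not part of RxPY's API; the sketch only shows the control flow, in which the observable calls into the observer rather than the observer polling for values.

```python
class CallbackObserver:
    """Hypothetical observer assembled from plain callables (e.g. lambdas)."""

    def __init__(self, on_next, on_completed=None, on_error=None):
        self.on_next = on_next
        self.on_completed = on_completed or (lambda: None)
        self.on_error = on_error or (lambda error: None)


class MiniObservable:
    """Hypothetical stand-in for an observable; not RxPY's class."""

    def __init__(self, producer):
        self._producer = producer

    def subscribe(self, observer):
        # Push model: the producer drives the observer's callbacks.
        try:
            self._producer(observer)
        except Exception as error:
            observer.on_error(error)


def push_three_strings(observer):
    for value in ("Alpha", "Beta", "Gamma"):
        observer.on_next(value)
    observer.on_completed()


received = []
source = MiniObservable(push_three_strings)
source.subscribe(CallbackObserver(
    on_next=lambda value: received.append("Received {0}".format(value)),
    on_completed=lambda: received.append("Done!"),
))
print(received)
```

Passing lambdas keeps small observers short, while a full class such as the PrintObserver shown earlier is easier to extend once error handling or state is involved.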

Date posted: 04/03/2019, 16:41
