BOOKS FOR PROFESSIONALS BY PROFESSIONALS ®

Foundations of Python Network Programming

This second edition of Foundations of Python Network Programming targets Python 2.5 through Python 2.7, the most popular production versions of the language. Python has made great strides since Apress released the first edition of this book back in the days of Python 2.3. The advances required new chapters to be written from the ground up, and others to be extensively revised.

You will learn fundamentals like IP, TCP, DNS and SSL by using working Python programs; you will also be able to familiarize yourself with infrastructure components like memcached and message queues. You can also delve into network server designs, and compare threaded approaches with asynchronous event-based solutions. But the biggest change is this edition's expanded treatment of the web.

The HTTP protocol is covered in extensive detail, with each feature accompanied by sample Python code. You can use your HTTP protocol expertise by studying an entire chapter on screen scraping and you can then test lxml and BeautifulSoup against a real-world web site. The chapter on web application programming now covers both the WSGI standard for component interoperability and modern web frameworks like Django.

Finally, all of the old favorites from the first edition are back: E-mail protocols like SMTP, POP, and IMAP get full treatment, as does XML-RPC. You can still learn how to code Python network programs using the Telnet and FTP protocols, but you are likely to appreciate the power of more modern alternatives like the paramiko SSH2 library.

If you are a Python programmer who needs to learn the network, this is the book that you want by your side.
Shelve in: Python
User level: Intermediate–Advanced

THE APRESS ROADMAP
    Python Algorithms
    Pro Python
    Foundations of Python Network Programming
    Foundations of Agile Python Development
    Dive into Python 3
    Beginning Python

www.apress.com
SOURCE CODE ONLINE
Companion eBook Available

THE EXPERT’S VOICE ® IN OPEN SOURCE

Foundations of Python Network Programming
The comprehensive guide to building network applications with Python
Second Edition
Brandon Rhodes and John Goerzen

Foundations of Python Network Programming: The comprehensive guide to building network applications with Python

Copyright © 2010 by Brandon Rhodes and John Goerzen

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.

ISBN-13 (pbk): 978-1-4302-3003-8
ISBN-13 (electronic): 978-1-4302-3004-5

Printed and bound in the United States of America (POD)

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

President and Publisher: Paul Manning
Lead Editor: Frank Pohlmann
Development Editor: Matt Wade
Technical Reviewer: Michael Bernstein
Editorial Board: Steve Anglin, Mark Beckner, Ewan Buckingham, Tony Campbell, Gary Cornell, Jonathan Gennick, Michelle Lowman, Matthew Moodie, Jeffrey Pepper, Frank Pohlmann, Ben Renow-Clarke, Dominic Shakeshaft, Matt Wade, Tom Welsh
Coordinating Editor: Laurin Becker
Copy Editors: Mary Ann Fugate and Patrick Meador
Compositor: MacPS, LLC
Indexer: Potomac Indexing, LLC
Cover Designer: Anna Ishchenko

Distributed to the book trade worldwide by Springer Science+Business Media, LLC., 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com.

For information on translations, please e-mail rights@apress.com, or visit www.apress.com.

Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Special Bulk Sales–eBook Licensing web page at www.apress.com/info/bulksales.

The information in this book is distributed on an “as is” basis, without warranty. Although every precaution has been taken in the preparation of this work, neither the author(s) nor Apress shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in this work.

The source code for this book is available to readers at www.apress.com.
To the Python community for creating a programming language, libraries, and packages that are freely written and freely gifted from one programmer to another

To small Persephone-cat for keeping me warm while revising chapters late at night

And, most of all, to my Jackie

Contents at a Glance

■Contents
■About the Authors
■About the Technical Reviewer
■Acknowledgments
■Introduction
■Chapter 1: Introduction to Client/Server Networking
■Chapter 2: UDP
■Chapter 3: TCP
■Chapter 4: Socket Names and DNS
■Chapter 5: Network Data and Network Errors
■Chapter 6: TLS and SSL
■Chapter 7: Server Architecture
■Chapter 8: Caches, Message Queues, and Map-Reduce
■Chapter 9: HTTP
■Chapter 10: Screen Scraping
■Chapter 11: Web Applications
■Chapter 12: E-mail Composition and Decoding
■Chapter 13: SMTP
■Chapter 14: POP
■Chapter 15: IMAP
■Chapter 16: Telnet and SSH
■Chapter 17: FTP
■Chapter 18: RPC
■Index

Contents

■Contents at a Glance
■About the Authors
■About the Technical Reviewer
■Acknowledgments
■Introduction
■Chapter 1: Introduction to Client/Server Networking
    The Building Blocks: Stacks and Libraries
    Application Layers
    Speaking a Protocol
    A Raw Network Conversation
    Turtles All the Way Down
    The Internet Protocol
    IP Addresses
    Routing
    Packet Fragmentation
    Learning More About IP
■Chapter 2: UDP
    Should You Read This Chapter?
    Addresses and Port Numbers
    Port Number Ranges
    Sockets
    Unreliability, Backoff, Blocking, Timeouts
    Connecting UDP Sockets
    Request IDs: A Good Idea
    Binding to Interfaces
    UDP Fragmentation
    Socket Options
    Broadcast
    When to Use UDP
    Summary
■Chapter 3: TCP
    How TCP Works
    When to Use TCP
    What TCP Sockets Mean
    A Simple TCP Client and Server
    One Socket per Conversation
    Address Already in Use
    Binding to Interfaces
    Deadlock
    Closed Connections, Half-Open Connections
    Using TCP Streams like Files
    Summary
■Chapter 4: Socket Names and DNS
    Hostnames and Domain Names
    Socket Names
    Five Socket Coordinates
    IPv6
    Modern Address Resolution
    Asking getaddrinfo() Where to Bind
    Asking getaddrinfo() About Services
    Asking getaddrinfo() for Pretty Hostnames
    Other getaddrinfo() Flags
    Primitive Name Service Routines
    Using getsockaddr() in Your Own Code
    Better Living Through Paranoia
    A Sketch of How DNS Works
    Why Not to Use DNS
    Why to Use DNS
    Resolving Mail Domains
    Zeroconf and Dynamic DNS
    Summary
■Chapter 5: Network Data and Network Errors
    Text and Encodings
    Network Byte Order
    Framing and Quoting
    Pickles and Self-Delimiting Formats
    XML, JSON, Etc.
    Compression
    Network Exceptions
    Handling Exceptions
    Summary
■Chapter 6: TLS and SSL
    Computer Security
    IP Access Rules
    Cleartext on the Network
    TLS Encrypts Your Conversations
    TLS Verifies Identities
    Supporting TLS in Python
    The Standard SSL Module
    Loose Ends
    Summary
■Chapter 7: Server Architecture
    Daemons and Logging
    Our Example: Sir Launcelot
    An Elementary Client
    The Waiting Game
    Running a Benchmark
    Event-Driven Servers
    Poll vs. Select
    The Semantics of Non-blocking
    Event-Driven Servers Are Blocking and Synchronous
    Twisted Python
    Load Balancing and Proxies
    Threading and Multi-processing
    Threading and Multi-processing Frameworks
    Process and Thread Coordination
    Running Inside inetd
    Summary
■Chapter 8: Caches, Message Queues, and Map-Reduce
    Using Memcached
    Memcached and Sharding
    Message Queues
    [...]

... these three versions of Python when writing network code. As of this writing, the Python 2 series is still the workaday version of the language for programmers who use Python in production. In fact, the pinnacle of that line of language development, Python 2.7, was released just a few months ago, and a second bugfix release is now in testing. Interest in the futuristic Python 3 version of the language is...

... what the communication looks like if that layer of software is stripped away.

Turtles All the Way Down

I hope you have enjoyed these initial examples of what Python network programming can look like. Stepping back, we can use this series of examples to make several points about network programming in Python. First, you can perhaps now see more clearly...

... of the network stack that we can still access easily from Python. Take a careful look at search4.py. It makes exactly the same networking request to Google Maps as our previous three programs, but it does so by sending a raw text message across the Internet and receiving a bundle of text in return.

Listing 1–4. Talking to Google Maps Through a Bare Socket

#!/usr/bin/env python
# Foundations of Python Network...
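The listing above is cut off in this excerpt, so the following is only a minimal sketch of the bare-socket idea it describes: write an HTTP request by hand, send it over a TCP socket, and read back the raw text of the reply. It is not the book's search4.py. The sketch assumes Python 3 rather than the Python 2 the book targets, it talks to a throwaway server on 127.0.0.1 instead of Google Maps so that it runs without network access, and the names build_request, fetch_raw, split_reply, serve_once, and demo are hypothetical helpers introduced here for illustration.

```python
import socket
import threading


def build_request(host, path):
    """Return the raw bytes of a minimal, hand-written HTTP/1.0 GET request."""
    lines = [
        "GET %s HTTP/1.0" % path,
        "Host: %s" % host,
        "Connection: close",
        "",   # blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host, port, path):
    """Send the request over a plain TCP socket and return the whole reply."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    finally:
        sock.close()
    return b"".join(chunks)


def split_reply(raw):
    """Split a raw HTTP reply into its status line (text) and body (bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0]
    return status_line.decode("ascii"), body


def serve_once(listener, body):
    """Stand-in server (an assumption of this sketch): answer one request."""
    conn, _ = listener.accept()
    try:
        request = b""
        while b"\r\n\r\n" not in request:  # read until the headers end
            data = conn.recv(4096)
            if not data:
                break
            request += data
        length = str(len(body)).encode("ascii")
        conn.sendall(b"HTTP/1.0 200 OK\r\n"
                     b"Content-Type: text/plain\r\n"
                     b"Content-Length: " + length + b"\r\n\r\n" + body)
    finally:
        conn.close()


def demo():
    """Run the client against the local stand-in server and return the result."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    thread = threading.Thread(
        target=serve_once, args=(listener, b"hello from the stand-in server"))
    thread.start()
    try:
        raw = fetch_raw("127.0.0.1", port, "/hello")
    finally:
        thread.join()
        listener.close()
    return split_reply(raw)


if __name__ == "__main__":
    status, body = demo()
    print(status)
    print(body.decode("ascii"))
```

Everything a higher-level HTTP library would normally do (formatting the request line, terminating the headers, reading until the connection closes, separating headers from body) is spelled out by hand here, which is the point the excerpt makes about stripping away a layer of software.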
• The fact that you will often be using Python libraries of prepared code (whether from the built-in standard library that ships with Python, or from third-party modules that you download and install) that already know how to speak the network protocol you want to use. In many cases, network programming simply involves selecting and using a library that already supports the network...

... avid Python programmer since the 1990s, and a professional Python developer for a decade. He released his PyEphem astronomy library in the same year that Python 1.5 was released, and has maintained it ever since. As a writer and speaker, Brandon enjoys teaching and touting Python, whether as the volunteer organizer of Python Atlanta or on stage at conferences like PyCon. He was editor of the monthly Python...

... solution

Networking

This book teaches network programming by focusing on the Internet protocols: the kind of network in which most programmers are interested these days, and the protocols that are best supported by the Python Standard Library. Their design and operation is a good introduction to networking in general, so you might find this book useful even if you intend to target other networks from Python; ...

... installed googlemaps
Cleaning up

The python binary inside the virtualenv will now have the googlemaps package available:

$ python -c 'import googlemaps'

Now that you have the googlemaps package installed, you should be able to run the simple program named search1.py.

Listing 1–1. Fetching a Longitude and Latitude

#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 1 - search1.py
...
... hundreds of millions of homes worldwide. Many casual computer users spend their entire digital lives speaking exclusively to network services; they are only vaguely aware that their computer is even capable of running local applications. This is also a moment when, after 20 solid years of growth and improvement, interest in Python really seems to be taking off. This is different from the trajectory of other...

... new version of the language. If you are entirely new to programming, then an Amazon search will suggest several highly rated books that use Python itself to teach you the basics. A long list of online resources, some of which are complete e-books, is maintained at this link: wiki.python.org/moin/BeginnersGuide/NonProgrammers

If you do know something about Python and programming but...

... site raise a raw, low-level networking exception in the middle of code that’s just trying to find the coordinates of a street address? We will pay careful attention to the topic of catching network errors as we go forward through this book, especially in the chapters of this first section, with their emphasis on low-level networking. And for our final...
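The question raised in the excerpt about low-level networking exceptions surfacing inside high-level code can be illustrated with a small sketch. It shows one common pattern, which is an assumption of this example and not necessarily the approach the book goes on to recommend: catch the socket-level error at the boundary and re-raise it as an application-level exception. The names LookupFailed, fetch_greeting, and unused_local_port are hypothetical, and the code assumes Python 3, where socket errors are subclasses of OSError.

```python
import socket


class LookupFailed(Exception):
    """Application-level error: the remote service could not be used."""


def fetch_greeting(host, port, timeout=2.0):
    """Return the first bytes a service sends, or raise LookupFailed.

    Callers only ever see LookupFailed, never a raw socket exception.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:  # refused, unreachable, timed out, bad hostname
        raise LookupFailed("cannot reach %s:%d (%s)" % (host, port, exc))
    try:
        return sock.recv(1024)
    except OSError as exc:  # connection reset or read timeout
        raise LookupFailed("no reply from %s:%d (%s)" % (host, port, exc))
    finally:
        sock.close()


def unused_local_port():
    """Find a local port with nothing listening on it (for the demo only)."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
    port = probe.getsockname()[1]
    probe.close()
    return port


if __name__ == "__main__":
    try:
        fetch_greeting("127.0.0.1", unused_local_port())
    except LookupFailed as exc:
        print("lookup failed:", exc)
```

The design choice here is that code asking for, say, the coordinates of a street address deals with one meaningful failure type instead of needing to know which socket errors the layers beneath it might raise.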