The Quick Python Book, Third Edition

Copyright

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2018 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end.
Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
PO Box 761
Shelter Island, NY 11964

Development editor: Christina Taylor
Technical development editor: Scott Steinman
Project manager: Janet Vail
Copyeditor: Kathy Simpson
Proofreader: Elizabeth Martin
Technical proofreader: André Brito
Typesetter and cover design: Marija Tudor

ISBN 9781617294037

Printed in the United States of America

1 2 3 4 5 6 7 8 9 10 – EBM – 23 22 21 20 19 18

Brief Table of Contents

Copyright
Brief Table of Contents
Table of Contents
Praise for the second edition
Foreword
Preface
Acknowledgments
About this book
About the cover illustration

1. Starting out
Chapter 1. About Python
Chapter 2. Getting started
Chapter 3. The Quick Python overview

2. The essentials
Chapter 4. The absolute basics
Chapter 5. Lists, tuples, and sets
Chapter 6. Strings
Chapter 7. Dictionaries
Chapter 8. Control flow
Chapter 9. Functions
Chapter 10. Modules and scoping rules
Chapter 11. Python programs
Chapter 12. Using the filesystem
Chapter 13. Reading and writing files
Chapter 14. Exceptions

3. Advanced language features
Chapter 15. Classes and object-oriented programming
Chapter 16. Regular expressions
Chapter 17. Data types as objects
Chapter 18. Packages
Chapter 19. Using Python libraries

4. Working with data
Chapter 20. Basic file wrangling
Chapter 21. Processing data files
Chapter 22. Data over the network
Chapter 23. Saving data
Chapter 24. Exploring data

Case study
A. A guide to Python’s documentation
B. Exercise answers
Index
List of Figures
List of Tables
List of Listings

Part 1. Starting out

These first three chapters tell you a little bit about Python, its strengths and weaknesses, and why you should consider learning Python 3.
In chapter 2 you see how to install Python on Windows, macOS, and Linux platforms and how to write a simple program. Chapter 3 is a quick, high-level survey of Python’s syntax and features.

If you’re looking for the quickest possible introduction to Python, start with chapter 3.

Chapter 1. About Python

This chapter covers
Why use Python?
What Python does well
What Python doesn’t do as well
Why learn Python 3?

Read this chapter if you want to know how Python compares to other languages and its place in the grand scheme of things. Skip ahead (go straight to chapter 3) if you want to start learning Python right away. The information in this chapter is a valid part of this book, but it’s certainly not necessary for programming with Python.

1.1 WHY SHOULD I USE PYTHON?

Hundreds of programming languages are available today, from mature languages like C and C++, to newer entries like Ruby, C#, and Lua, to enterprise juggernauts like Java. Choosing a language to learn is difficult. Although no one language is the right choice for every possible situation, I think that Python is a good choice for a large number of programming problems, and it’s also a good choice if you’re learning to program. Hundreds of thousands of programmers around the world use Python, and the number grows every year.

Python continues to attract new users for a variety of reasons. It’s a true cross-platform language, running equally well on Windows, Linux/UNIX, and Macintosh platforms, as well as others, ranging from supercomputers to cell phones. It can be used to develop small applications and rapid prototypes, but it scales well to permit development of large programs. It comes with a powerful and easy-to-use graphical user interface (GUI) toolkit, web programming libraries, and more.
And it’s free.

1.2 WHAT PYTHON DOES WELL

Python is a modern programming language developed by Guido van Rossum in the 1990s (and named after a famous comedic troupe). Although Python isn’t perfect for every application, its strengths make it a good choice for many situations.

1.2.1 Python is easy to use

Programmers familiar with traditional languages will find it easy to learn Python. All of the familiar constructs (loops, conditional statements, arrays, and so forth) are included, but many are easier to use in Python. Here are a few of the reasons why:

Types are associated with objects, not variables. A variable can be assigned a value of any type, and a list can contain objects of many types. This also means that type casting usually isn’t necessary and that your code isn’t locked into the straitjacket of predeclared types.

Python typically operates at a much higher level of abstraction. This is partly the result of the way the language is built and partly the result of an extensive standard code library that comes with the Python distribution. A program to download a web page can be written in two or three lines!

Syntax rules are very simple. Although becoming an expert Pythonista takes time and effort, even beginners can absorb enough Python syntax to write useful code quickly.

Python is well suited for rapid application development. It isn’t unusual for coding an application in Python to take one-fifth the time it would in C or Java and to take as little as one-fifth the number of lines of the equivalent C program. This depends on the particular application, of course; for a numerical algorithm performing mostly integer arithmetic in for loops, there would be much less of a productivity gain. For the average application, the productivity gain can be significant.

1.2.2 Python is expressive

Python is a very expressive language. Expressive in this context means that a single line of Python code can do more than a single line of code in most other languages.
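As a small illustration of that definition, the sketch below counts word frequencies and picks the most common words, each in a single line, using only the standard library. The sample sentence and the variable names are made up for illustration.

```python
from collections import Counter

# Made-up sample text, used only for illustration.
text = "the quick brown fox jumps over the lazy dog the end"

# One line: split the text into words and count how often each occurs.
word_counts = Counter(text.split())

# One line: the three most frequent words, paired with their counts.
top_three = word_counts.most_common(3)

print(top_three)
```

In many lower-level languages, the same task would typically need an explicit loop plus manual handling of words that have not been seen yet.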
The advantages of a more expressive language are obvious: The fewer lines of code you have to write, the faster you can complete the project. The fewer lines of code there are, the easier the program will be to maintain and debug.

To get an idea of how Python’s expressiveness can simplify code, consider swapping the values of two variables, var1 and var2. In a language like Java, this requires three lines of code and an extra variable:

    int temp = var1;
    var1 = var2;
    var2 = temp;

The variable temp is needed to save the value of var1 when var2 is put into it, and then that saved value is put into var2. The process isn’t terribly complex, but reading those three lines and understanding that a swap has taken place takes a certain amount of overhead, even for experienced coders.

By contrast, Python lets you make the same swap in one line and in a way that makes it obvious that a swap of values has occurred:

    var2, var1 = var1, var2

Of course, this is a very simple example, but you find the same advantages throughout the language.

1.2.3 Python is readable

Another advantage of Python is that it’s easy to read. You might think that a programming language needs to be read only by a computer, but humans have to read your code as well: whoever debugs your code (quite possibly you), whoever maintains your code (could be you again), and whoever might want to modify your code in the future. In all of those situations, the easier the code is to read and understand, the better it is.

The easier code is to understand, the easier it is to debug, maintain, and modify. Python’s main advantage in this department is its use of indentation. Unlike most languages, Python insists that blocks of code be indented. Although this strikes some people as odd, it has the benefit that your code is always formatted in a very easy-to-read style.

Following are two short programs, one written in Perl and one in Python. Both take two equal-size lists of numbers and return the pairwise sum of those lists.
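As a point of reference for that comparison, a pairwise sum in Python might be sketched as follows. This is a minimal illustration that assumes both arguments are equal-length lists of numbers; it is not necessarily the listing the comparison refers to, and the sample values are made up.

```python
def pairwise_sum(list1, list2):
    """Return a new list holding the sums of the matching items."""
    result = []
    for i in range(len(list1)):
        result.append(list1[i] + list2[i])
    return result


def pairwise_sum_zip(list1, list2):
    """Same idea, written with zip() and a list comprehension."""
    return [a + b for a, b in zip(list1, list2)]


print(pairwise_sum([1, 2, 3], [4, 5, 6]))
```

Note that zip() stops at the shorter input, so the second variant silently truncates if the lists differ in length, whereas the first raises an IndexError when the second list is shorter.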
I think the Python code is more readable than the Perl code; it’s visually cleaner and contains fewer inscrutable symbols:

    # Perl version
    sub pairwise_sum {
        my($arg1, $arg2) = @_;
        my @result;

Quick Check: __getitem__

The example use of __getitem__ above is very limited and won’t work correctly in many situations. What are some cases in which the implementation above will fail or work incorrectly?

This implementation will not work if you try to access an item directly by index; neither can you move backward.

Try this: Implementing list special methods

Try implementing the __len__ and __delitem__ special methods listed earlier, as well as an append method. The added methods follow the "# added methods" comment in the code.

    class TypedList:
        def __init__(self, example_element, initial_list=[]):
            self.type = type(example_element)
            if not isinstance(initial_list, list):
                raise TypeError("Second argument of TypedList must "
                                "be a list.")
            for element in initial_list:
                self.__check(element)
            self.elements = initial_list[:]

        def __check(self, element):
            if type(element) != self.type:
                raise TypeError("Attempted to add an element of "
                                "incorrect type to a typed list.")

        def __setitem__(self, i, element):
            self.__check(element)
            self.elements[i] = element

        def __getitem__(self, i):
            return self.elements[i]

        # added methods
        def __delitem__(self, i):
            del self.elements[i]

        def __len__(self):
            return len(self.elements)

        def append(self, element):
            self.__check(element)
            self.elements.append(element)

    x = TypedList(1, [1, 2, 3])
    print(len(x))
    x.append(1)
    del x[2]

Quick Check: Special method attributes and subclassing existing types

Suppose that you want a dictionary-like type that allows only strings as keys (maybe to make it work like a shelf object, as described in chapter 13). What options would you have for creating such a class? What would be the advantages and disadvantages of each option?

You could use the same approach as you did for TypedList and inherit from the UserDict class.
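The UserDict option can be sketched in a few lines. The class name StringKeyDict and the error message are hypothetical, chosen only for this illustration; the sketch relies on the fact that UserDict routes construction, update(), and setdefault() through __setitem__, so one check covers all of them.

```python
from collections import UserDict


class StringKeyDict(UserDict):
    """Dictionary-like class that accepts only str keys (hypothetical name)."""

    def __setitem__(self, key, value):
        # Reject any key that is not a string before storing it.
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        super().__setitem__(key, value)


d = StringKeyDict({"a": 1})
d["b"] = 2
print(dict(d))

try:
    d[3] = "three"
except TypeError as error:
    print("rejected:", error)
```

A direct dict subclass would need the same check repeated in several methods, because the built-in dict does not call an overridden __setitem__ from its constructor or from update().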
You could also inherit directly from dict, or you could implement all of the dict functionality yourself.

Implementing everything yourself provides the most control but is the most work and most prone to bugs. If the changes you need to make are small (in this case, just checking the type before adding a key), it might make the most sense to inherit directly from dict. On the other hand, inheriting from UserDict is probably safest, because the internal dict object will continue to be a regular dict, which is a highly optimized and mature implementation.

B.15 CHAPTER 18

Quick Check: Packages

Suppose that you’re writing a package that takes a URL, retrieves all images on the page pointed to by that URL, resizes them to a standard size, and stores them. Leaving aside the exact details of how each of these functions will be coded, how would you organize those features into a package?

The package will be performing three types of actions: fetching a page and parsing the HTML for image URLs, fetching the images, and resizing the images. For this reason, you might consider having three modules to keep the actions separate:

    picture_fetch/
        __init__.py
        find.py
        fetch.py
        resize.py

Lab 18: Create a package

In chapter 14, you added error handling to the text cleaning and word frequency counting module you created in chapter 11. Refactor that code into a package containing one module for the cleaning functions, one for the processing functions, and one for the custom exceptions. Then write a simple main function that uses all three modules.

    word_count/
        __init__.py
        exceptions.py
        cleaning.py
        counter.py

B.16 CHAPTER 20

Quick Check: Consider the choices

Take a moment to consider your options for handling the tasks identified above. What modules in the standard library can you think of that will do the job?
If you want to, you can even stop right now, work out the code to do it, and compare your solution with the one you’ll develop in the next section.

From the standard library, use datetime for managing the dates/times of the files, and either os.path and os or pathlib for renaming and archiving the files.

Quick Check: Potential Problems

Because the previous solution is very simple, there are likely to be many situations that it won’t handle well. What are some potential issues or problems that might arise with the script above? How might you remedy these problems?

Multiple files during the same day would be a problem, for one thing. If you have lots of files, navigating the archive directory will become increasingly difficult.

Consider the naming convention used for the files, which is based on the year, month and name, in that order. What advantages do you see in that convention? What might be the disadvantages? Can you make any arguments for putting the date string somewhere else in the filename, such as the beginning or the end?

Using year-month-day date formats makes a text-based sort of the files sort by date as well. Putting the date at the end of the filename but before the extension makes it more difficult to parse the date element visually.

Try this: Implementation of multiple directories

Using the code you developed in the section above as a starting point, how would you modify it to implement archiving each set of files in subdirectories named according to the date received?
Feel free to take the time to implement the code and test it.

    import datetime
    import pathlib

    FILE_PATTERN = "*.txt"
    ARCHIVE = "archive"

    if __name__ == '__main__':
        # Name the subdirectory after today's date, e.g. archive/2018-05-01
        date_string = datetime.date.today().strftime("%Y-%m-%d")
        cur_path = pathlib.Path(".")
        new_path = cur_path.joinpath(ARCHIVE, date_string)
        new_path.mkdir()
        paths = cur_path.glob(FILE_PATTERN)
        for path in paths:
            path.rename(new_path.joinpath(path.name))

Quick Check: Alternate solutions

How might you create a script that does the same thing without using pathlib? What libraries and functions would you use?

You’d use the os.path and os libraries; specifically, os.path.join(), os.mkdir(), and os.rename().

Try this: Archiving to zip files pseudocode

Take a moment to write the pseudocode for a solution that stores data files in zip files as shown above. What modules and functions or methods do you intend to use? Try coding your solution to make sure that it works.

Pseudocode:

    create path for zip file
    create empty zipfile
    for each file
        write into zipfile
        remove original file

(See the next section for sample code that does this.)

Quick Check: Consider different parameters

Take some time to consider different grooming options. How would you modify the code in the previous Try This to keep only one file a month? How would you change the code so that files from the previous month and older are groomed to save one a week? (Note: This is not the same as older than 30 days!)

You could use something similar to the code above but also check the month of the file against the current month.

B.17 CHAPTER 21

Quick Check: Normalization

Look closely at the list of words generated above. Do you see any issues with the normalization so far? What other issues do you think you might encounter with a longer section of text? How do you think you might deal with those issues?
Double hyphens for em dashes, hyphenation for line breaks and otherwise, and any other punctuation marks would all be potential problems.

Enhancing the word cleaning module you created in chapter 18 would be a good way to cover most of the issues.

Try this: Read a file

Write the code to read a text file (assume that it’s the file temp_data_00a.txt as shown in the example above), split each line of the file into a list of values, and add that list to a single list of records.

(no answer)

What issues or problems did you encounter in implementing this solution? How might you go about converting the last three fields to the correct date, real, and int types?

You could use a list comprehension to explicitly convert those fields.

Quick Check: Handling Quoting

Consider how you’d approach the problems of handling quoted fields and embedded delimiter characters if you didn’t have the csv library. Which is easier to handle: the quoting or the embedded delimiters?

Without using the csv module, you’d have to check whether a field began and ended with the quote characters and then strip() them off.

To handle embedded delimiters without using the csv library, you’d have to isolate the quoted fields and treat them differently; then you’d split the rest of the fields by using the delimiter.

Try this: Cleaning Data

How would you handle the fields with 'Missing' as a possible value for math calculations? Can you write a snippet of code that averages one of those columns?

    clean_field = [float(x[13]) for x in data_rows if x[13] != 'Missing']
    average = sum(clean_field)/len(clean_field)

What would you do with the average column at the end so that you could also report the average coverage? In your opinion, would the solution to this problem be at all linked to the way that the 'Missing' entries were handled?
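The averaging snippet above can be tried on a few rows without the real data file. The following is a minimal sketch under stated assumptions: the rows and the column position TEMP_COLUMN are made up for illustration (the snippet above uses column 13 of the real file), and returning None for an all-'Missing' column is one possible design choice, not the only one.

```python
# Made-up sample rows; the real file has many more columns.
data_rows = [
    ["Illinois", "71.2"],
    ["Illinois", "Missing"],
    ["Illinois", "68.8"],
]

TEMP_COLUMN = 1  # hypothetical column index for this sample only

# Keep only the values that are actually numbers, converted to float.
clean_field = [float(row[TEMP_COLUMN]) for row in data_rows
               if row[TEMP_COLUMN] != "Missing"]

# Guard against dividing by zero when every value was 'Missing'.
average = sum(clean_field) / len(clean_field) if clean_field else None

print(len(clean_field), average)
```

Skipping the 'Missing' entries means the average is taken over fewer rows than the file contains, which is worth keeping in mind when the result is reported next to a coverage figure.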
    coverage_values = [float(x[-1].strip("%"))/100 for x in data_rows]

It may not be done at the same time as the 'Missing' values are handled.

Lab: Weather observations

The file of weather observations provided here is by month and then by county for the state of Illinois from 1979 to 2011. Write the code to process this file and extract the data for Chicago (Cook County) into a single CSV or spreadsheet file. This code includes replacing the 'Missing' strings with empty strings and translating the percentage to a decimal. You may also consider what fields are repetitive and can be omitted or stored elsewhere. The proof that you’ve got it right occurs when you load the file into a spreadsheet. You can download a solution with the book’s source code.

B.18 CHAPTER 22

Try this: Retrieving a file

If you’re working with the data file above and want to break each line into separate fields, how might you do that? What other processing would you expect to do? Try writing some code to retrieve this file and calculate the average annual rainfall or, for more of a challenge, the average maximum and minimum temperature for each year.

    import requests

    response = requests.get("http://www.metoffice.gov.uk/pub/data/weather"
                            "/uk/climate/stationdata/heathrowdata.txt")
    data = response.text
    data_rows = []
    rainfall = []
    # Skip the seven header lines, then split each row on spaces
    for row in data.split("\r\n")[7:]:
        fields = [x for x in row.split(" ") if x]
        data_rows.append(fields)
        rainfall.append(float(fields[5]))

    print("Average rainfall = {} mm".format(sum(rainfall)/len(rainfall)))

    Average rainfall = 50.43794749403351 mm

Try this: Accessing an API

Write some code to fetch some data from the city of Chicago site used above.
Look at the fields mentioned in the results, and see whether you can select records based on another field in combination with the date range.

    import requests

    response = requests.get(
        "https://data.cityofchicago.org/resource/6zsd-86xi.json"
        "?$where=date between '2015-01-10T12:00:00' and "
        "'2015-01-10T13:00:00'&arrest=true")
    print(response.text)

Try this: Saving some JSON crime data

Modify the code you wrote to fetch Chicago crime data in section 22.2 to convert the fetched data from a JSON-formatted string to a Python object. See whether you can save the crime events both as a series of separate JSON objects in one file and as one JSON object in another file. Then see what code is needed to load each file.

    import json
    import requests

    response = requests.get(
        "https://data.cityofchicago.org/resource/6zsd-86xi.json"
        "?$where=date between '2015-01-10T12:00:00' and "
        "'2015-01-10T13:00:00'&arrest=true")
    crime_data = json.loads(response.text)

    # Save everything as a single JSON object
    with open("crime_all.json", "w") as outfile:
        json.dump(crime_data, outfile)

    # Save each record as its own JSON object, one per line
    with open("crime_series.json", "w") as outfile:
        for record in crime_data:
            json.dump(record, outfile)
            outfile.write("\n")

    with open("crime_all.json") as infile:
        crime_data_2 = json.load(infile)

    # Rebuild the list by parsing one JSON object per line
    crime_data_3 = []
    with open("crime_series.json") as infile:
        for line in infile:
            crime_data_3.append(json.loads(line))

Try this: Fetching and Parsing XML

Write the code to pull the Chicago XML weather forecast from http://mng.bz/103V. Then use xmltodict to parse the XML into a Python dictionary and extract tomorrow’s forecast maximum temperature.
Hint: To match up time layouts and values, compare the layout-key value of the first time-layout section and the time-layout attribute of the temperature element of the parameters element.

    import requests
    import xmltodict

    response = requests.get(
        "https://graphical.weather.gov/xml/SOAP_server/"
        "ndfdXMLclient.php?whichClient=NDFDgen&lat=41.87&lon=+-87.65&"
        "product=glance")
    parsed_dict = xmltodict.parse(response.text)
    layout_key = parsed_dict['dwml']['data']['time-layout'][0]['layout-key']
    forecast_temp = parsed_dict['dwml']['data']['parameters']['temperature'][0]['value'][0]
    print(layout_key)
    print(forecast_temp)

Try this: Parsing HTML

Given the file forecast.html (which you can find with the code on this book’s website), write a script using Beautiful Soup that extracts the data and saves it as a CSV file.

    import csv
    import bs4

    def read_html(filename):
        with open(filename) as html_file:
            html = html_file.read()
            return html

    def parse_html(html):
        bs = bs4.BeautifulSoup(html, "html.parser")
        labels = [x.text for x in bs.select(".forecast-label")]
        forecasts = [x.text for x in bs.select(".forecast-text")]
        return list(zip(labels, forecasts))

    def write_to_csv(data, outfilename):
        # newline="" lets the csv module control line endings itself
        with open(outfilename, "w", newline="") as outfile:
            csv.writer(outfile).writerows(data)

    if __name__ == '__main__':
        html = read_html("forecast.html")
        values = parse_html(html)
        write_to_csv(values, "forecast.csv")
        print(values)

Lab 22: Track Curiosity’s Weather

Use the application programming interface (API) described in section 22.2 of chapter 22 to gather a weather history of Curiosity’s stay on Mars for a month. Hint: You can specify Martian days (sols) by adding ?sol=sol_number to the end of the archive query like this:

    http://marsweather.ingenology.com/v1/archive/?sol=155

Transform the data so that you can load it into a spreadsheet and graph it.
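The transformation step amounts to turning a list of dictionaries into rows under a header line. Below is a minimal sketch that avoids the network entirely: the records and their field names (sol, min_temp, max_temp) are made up for illustration, and the function name records_to_csv is hypothetical. It writes to an in-memory buffer so that it runs anywhere.

```python
import csv
import io

# Made-up records standing in for parsed API results.
records = [
    {"sol": 1830, "min_temp": -76.0, "max_temp": -12.0},
    {"sol": 1831, "min_temp": -77.0, "max_temp": -11.0},
]


def records_to_csv(records, outfile):
    """Write a list of same-shaped dicts as CSV with one header row."""
    writer = csv.DictWriter(outfile, fieldnames=list(records[0].keys()))
    writer.writeheader()       # column names, written once
    writer.writerows(records)  # one line per record


buffer = io.StringIO()
records_to_csv(records, buffer)
print(buffer.getvalue())
```

To write a real file instead, pass a file opened with open("mars_weather.csv", "w", newline="") in place of the buffer. Writing the header once and all rows afterward keeps the spreadsheet columns labeled, which appending one row per request does not do on its own.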
For a version of this project, see the book’s source code.

    import json
    import csv
    import requests

    for sol in range(1830, 1863):
        response = requests.get(
            "http://marsweather.ingenology.com/v1/archive/"
            "?sol={}&format=json".format(sol))
        result = json.loads(response.text)
        if not result['count']:
            continue
        weather = result['results'][0]
        print(weather)
        csv.DictWriter(open("mars_weather.csv", "a"),
                       list(weather.keys())).writerow(weather)

B.19 CHAPTER 23

Try this: Creating and modifying Tables

Using sqlite3, write the code that creates a database table for the Illinois weather data you loaded from a flat file in section 21.2 of chapter 21. Suppose that you have similar data for more states and want to store more information about the states themselves. How could you modify your database to use a related table to store the state information?

    import sqlite3

    conn = sqlite3.connect("datafile.db")
    cursor = conn.cursor()
    cursor.execute("""create table weather (id integer primary key,
        state text, state_code text,
        year_text text, year_code text,
        avg_max_temp real, max_temp_count integer,
        max_temp_low real, max_temp_high real,
        avg_min_temp real, min_temp_count integer,
        min_temp_low real, min_temp_high real,
        heat_index real, heat_index_count integer,
        heat_index_low real, heat_index_high real,
        heat_index_coverage text)
        """)
    conn.commit()

You could add a state table and store only each state’s ID field in the weather database.

Try this: Using an ORM

Using the database from section 22.3, write a SQLAlchemy class to map to the data table and use it to read the records from the table.

    # Written against the SQLAlchemy 1.x API
    from sqlalchemy import (create_engine, select, MetaData, Table,
                            Column, Integer, String, Float)
    from sqlalchemy.orm import sessionmaker

    dbPath = 'datafile.db'
    engine = create_engine('sqlite:///%s' % dbPath)
    metadata = MetaData(engine)
    weather = Table('weather', metadata,
        Column('id', Integer, primary_key=True),
        Column("state", String),
        Column("state_code", String),
        Column("year_text", String),
        Column("year_code", String),
        Column("avg_max_temp", Float),
        Column("max_temp_count", Integer),
        Column("max_temp_low", Float),
        Column("max_temp_high", Float),
        Column("avg_min_temp", Float),
        Column("min_temp_count", Integer),
        Column("min_temp_low", Float),
        Column("min_temp_high", Float),
        Column("heat_index", Float),
        Column("heat_index_count", Integer),
        Column("heat_index_low", Float),
        Column("heat_index_high", Float),
        Column("heat_index_coverage", String)
        )
    Session = sessionmaker(bind=engine)
    session = Session()
    result = session.execute(select([weather]))
    for row in result:
        print(row)

Try this: Modifying a database with Alembic

Experiment with creating an Alembic upgrade that adds a state table to your database, with columns for ID, state name, and abbreviation. Upgrade and downgrade. What other changes would be necessary if you were going to use the state table along with the existing data table?

(no answer)

Quick Check: Uses of Key:Value stores

What sorts of data and applications would benefit most from a key:value store like Redis?

Quick lookup of data
Caching

Quick Check: Uses of MongoDB

Thinking back over the various data samples you’ve seen so far and other types of data in your experience, can you come up with any data that you think would be well suited to being stored in a database such as MongoDB? Would others clearly not be suited, and if so, why not?

Data that comes in large and/or more loosely organized chunks is suited to MongoDB, such as the contents of a web page or document.

Data with a specific structure is better suited to a relational database. The weather data you’ve seen is a good example.

Lab 23: Create a database

Choose one of the datasets discussed in the past few chapters, and decide which type of database would be best to store that data. Create that database, and write the code to load the data into it.
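If the choice falls on a relational database, the create-and-load step, along with lookups of one and of several records, might look like the following minimal sketch. It uses an in-memory SQLite database so that it leaves no file behind; the two-table layout follows the state-table suggestion above, but the reduced column set and every data value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, nothing on disk
cursor = conn.cursor()

# A related table holds state details; weather rows refer to it by ID.
cursor.execute("""create table state (id integer primary key,
    name text, abbreviation text)""")
cursor.execute("""create table weather (id integer primary key,
    state_id integer references state(id),
    year integer, avg_max_temp real)""")

# Load made-up sample data, using placeholders rather than string formatting.
cursor.execute("insert into state (name, abbreviation) values (?, ?)",
               ("Illinois", "IL"))
state_id = cursor.lastrowid
rows = [(state_id, 1979, 58.1), (state_id, 1980, 59.4), (state_id, 1981, 60.2)]
cursor.executemany(
    "insert into weather (state_id, year, avg_max_temp) values (?, ?, ?)",
    rows)
conn.commit()

# Retrieve a single matching record.
single = cursor.execute(
    "select year, avg_max_temp from weather where year = ?",
    (1980,)).fetchone()

# Retrieve multiple matching records, joined to the state table.
multiple = cursor.execute(
    """select s.abbreviation, w.year, w.avg_max_temp
       from weather w join state s on s.id = w.state_id
       where w.year between ? and ? order by w.year""",
    (1979, 1980)).fetchall()

print(single)
print(multiple)
```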
Then choose the two most common and/or likely types of search criteria, and write the code to retrieve both single and multiple matching records.

(no answer)

B.20 CHAPTER 24

Try this: Using Jupyter Notebook

Enter some code in the notebook, and experiment with running it. Check out the Edit, Cell, and Kernel menus to see what options are there. When you have a little code running, use the Kernel menu to restart the kernel, repeat your steps, and then use the Cell menu to rerun the code in all of the cells.

(no answer)

Try this: Cleaning Data with and without Pandas

Experiment with the operations mentioned above. When the final column has been converted to a fraction, can you think of a way to convert it back to a string with the trailing percentage sign?

By contrast, load the same data into a plain Python list by using the csv module, and apply the same changes by using plain Python.

Quick Check: Merging data sets

How would you go about actually merging two data sets like the above in Python?

If you’re sure that you have exactly the same number of items in each set and that the items are in the right order, you could use the zip() function. Otherwise, you could create a dictionary, with the keys being something common between the two data sets, and then append the data by key from both sets.

Quick Check: Selecting in Python

What Python code structure would you use to select only rows that meet certain conditions?

You’d probably use a list comprehension:

    selected = [x for x in old_list if <condition on x>]

Try this: Grouping and Aggregating

Experiment with pandas and the data above. Can you get the calls and amounts by both team member and month?
    calls_revenue[['Team member', 'Month', 'Calls', 'Amount']].groupby(
        ['Team member', 'Month']).sum()

Try this: Plotting

Plot a line graph of the monthly average amount per call.

    %matplotlib inline
    import pandas as pd
    import numpy as np

    # see text for these
    calls = pd.read_csv("sales_calls.csv")
    revenue = pd.read_csv("sales_revenue.csv")
    calls_revenue = pd.merge(calls, revenue, on=['Territory', 'Month'])
    calls_revenue['Call_Amount'] = calls_revenue.Amount/calls_revenue.Calls

    # plot
    calls_revenue[['Month', 'Call_Amount']].groupby(['Month']).mean().plot()
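For comparison, the dictionary-based merge described in the Quick Check on merging data sets, followed by the same monthly average, can be sketched in plain Python. The rows below are made up for illustration and reuse the Territory, Month, Calls, and Amount column names from the pandas code; this is one possible approach, not a replacement for pandas on larger data.

```python
from collections import defaultdict

# Made-up sample rows standing in for the two CSV files.
calls = [
    {"Territory": 1, "Month": "Jan", "Calls": 10},
    {"Territory": 2, "Month": "Jan", "Calls": 20},
    {"Territory": 1, "Month": "Feb", "Calls": 5},
]
revenue = [
    {"Territory": 1, "Month": "Jan", "Amount": 1000.0},
    {"Territory": 2, "Month": "Jan", "Amount": 3000.0},
    {"Territory": 1, "Month": "Feb", "Amount": 250.0},
]

# Merge: index the first set by the shared key, then add the second set.
merged = {}
for row in calls:
    merged[(row["Territory"], row["Month"])] = dict(row)
for row in revenue:
    key = (row["Territory"], row["Month"])
    if key in merged:  # keep only keys present in both sets
        merged[key].update(row)

# Drop any call rows that never found a matching revenue row.
complete = [r for r in merged.values() if "Amount" in r]

# Group: collect the amount per call for each month, then average.
per_call_by_month = defaultdict(list)
for r in complete:
    per_call_by_month[r["Month"]].append(r["Amount"] / r["Calls"])
monthly_average = {month: sum(values) / len(values)
                   for month, values in per_call_by_month.items()}

print(monthly_average)
```

Keying on the (Territory, Month) pair plays the same role as the on=['Territory', 'Month'] argument to pd.merge, and keeping only keys found in both sets corresponds to an inner join.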
