Introduction to Python for Econometrics, Statistics and Data Analysis

Kevin Sheppard
University of Oxford

Tuesday 5th August, 2014

©2012, 2013, 2014 Kevin Sheppard

Changes since the Second Edition

Version 2.2.1 (August 2014)
• Fixed typos reported by a reader – thanks to Ilya Sorvachev

Version 2.2 (July 2014)
• Code verified against Anaconda 2.0.1
• Added diagnostic tools and a simple method to use external code in the Cython section
• Updated the Numba section to reflect recent changes
• Fixed some typos in the chapter on Performance and Optimization
• Added examples of joblib and IPython’s cluster to the chapter on running code in parallel

Version 2.1 (February 2014)
• New chapter introducing object oriented programming as a method to provide structure and organization to related code
• Added seaborn to the recommended package list, and included it by default in the graphics chapter
• Based on experience teaching Python to economics students, the recommended installation has been simplified by removing the suggestion to use virtual environments. The discussion of virtual environments has been moved to the appendix.
• Rewrote parts of the pandas chapter
• Code verified against Anaconda 1.9.1

Version 2.02 (November 2013)
• Changed the Anaconda install to use both create and install, which shows how to install additional packages
• Fixed some missing packages in the direct install
• Changed the configuration of IPython to reflect best practices
• Added subsection covering IPython profiles

Version 2.01 (October 2013)
• Updated Anaconda to 1.8 and added some additional packages to the installation for Spyder
• Small section about Spyder as a good starting IDE

Notes to the 2nd Edition

This edition includes the following changes from the first edition (March 2012):
• The preferred installation method is now Continuum Analytics’ Anaconda. Anaconda is a complete scientific stack and is available for all major platforms.
• New chapter on pandas. pandas provides a
simple but powerful tool to manage data and perform basic analysis. It also greatly simplifies importing and exporting data.
• New chapter on advanced selection of elements from an array
• Numba provides just-in-time compilation for numeric Python code, which often produces large performance gains when pure NumPy solutions are not available (e.g. looping code)
• Dictionary, set and tuple comprehensions
• Fixed numerous typos
• All code has been verified working against Anaconda 1.7.0

Contents

1 Introduction
  1.1 Background
  1.2 Conventions
  1.3 Important Components of the Python Scientific Stack
  1.4 Setup
  1.5 Using Python
  1.6 Exercises
  1.A Frequently Encountered Problems
  1.B register_python.py
  1.C Advanced Setup

2 Python 2.7 vs. 3 (and the rest)
  2.1 Python 2.7 vs. 3
  2.2 Intel Math Kernel Library and AMD Core Math Library
  2.3 Other Variants
  2.A Relevant Differences between Python 2.7 and 3

3 Built-in Data Types
  3.1 Variable Names
  3.2 Core Native Data Types
  3.3 Python and Memory Management
  3.4 Exercises

4 Arrays and Matrices
  4.1 Array
  4.2 Matrix
  4.3 1-dimensional Arrays
  4.4 2-dimensional Arrays
  4.5 Multidimensional Arrays
  4.6 Concatenation
  4.7 Accessing Elements of an Array
  4.8 Slicing and Memory Management
  4.9 import and Modules
  4.10 Calling Functions
  4.11 Exercises

5 Basic Math
  5.1 Operators
  5.2 Broadcasting
  5.3 Array and Matrix Addition (+) and Subtraction (-)
  5.4 Array Multiplication (*)
  5.5 Matrix Multiplication (*)
  5.6 Array and Matrix Division (/)
  5.7 Array Exponentiation (**)
  5.8 Matrix Exponentiation (**)
  5.9 Parentheses
  5.10 Transpose
  5.11 Operator Precedence
  5.12 Exercises

6 Basic Functions and Numerical Indexing
  6.1 Generating Arrays and Matrices
  6.2 Rounding
  6.3 Mathematics
  6.4 Complex Values
  6.5 Set Functions
  6.6 Sorting and Extreme Values
  6.7 Nan Functions
  6.8 Functions and Methods/Properties
  6.9 Exercises

7 Special Arrays
  7.1 Exercises
8 Array and Matrix Functions
  8.1 Views
  8.2 Shape Information and Transformation
  8.3 Linear Algebra Functions
  8.4 Exercises

9 Importing and Exporting Data
  9.1 Importing Data using pandas
  9.2 Importing Data without pandas
  9.3 Saving or Exporting Data using pandas
  9.4 Saving or Exporting Data without pandas
  9.5 Exercises

10 Inf, NaN and Numeric Limits
  10.1 inf and NaN
  10.2 Floating point precision
  10.3 Exercises

11 Logical Operators and Find
  11.1 >, >=, <, <=, ==, !=

Index

intersect1d
inv
ix_
join
kendalltau
kron
ks_2samp
kstest
kurtosis
laplace
leastsq
less
less_equal
Linear Algebra
  cholesky
  cond
  det
  eig
  eigh
  eigvals
  inv
  kron
  lstsq
  matrix_power
  matrix_rank
  slogdet
  solve
  svd
  trace
linregress
linspace
List comprehensions
ljust
loadtxt
log
log10
Logical
  all
  and
  any
  equal
  greater
  greater_equal
  less
  less_equal
  logical_and
  logical_not
  logical_or
  logical_xor
  not
  not_equal
  or
logical_and
logical_not
logical_or
logical_xor
lognormal
logspace
Looping
  break
  continue
  for
  while
  Whitespace
lower
lstrip
lstsq

>>> x = 1.0
>>> type(x)
float

>>> x = 1j
>>> type(x)
complex

>>> x = 2 + 3j
>>> x
(2+3j)

>>> x = complex(1)
>>> x
(1+0j)

Note that a + bj is the same as complex(a, b), while complex(a...

... bool(0)
>>> x
False

Non-zero, non-empty values generally evaluate to True when evaluated by bool(). Zero or empty values such as bool(0), bool(0.0), bool(0.0j), bool(None), bool('') and bool([]) ...

... converted into a tuple using tuple(). (Similarly, a tuple can be converted to a list using list().)

>>> x = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>>> type(x)
tuple
>>> x[0]
0
>>> x[-10:-5]
(0, 1, 2, 3, 4)
>>> x = list(x)
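
The equivalence between the literal form a + bj and complex(a, b) noted in the excerpt can be checked directly. The following is a minimal sketch written as a script rather than an interactive session; the variable names (same_value, real_part, imag_part) are illustrative and do not come from the text.

```python
# Sketch: two equivalent ways to build the same complex value.
x = 2 + 3j            # literal form a + bj
y = complex(2, 3)     # constructor form complex(a, b)
z = complex(1)        # imaginary part omitted, as in the excerpt

same_value = (x == y)     # both represent the complex number 2+3j
real_part = x.real        # real component, stored as a float
imag_part = x.imag        # imaginary component, stored as a float

print(same_value)         # True
print(z)                  # (1+0j)
```

The real and imag attributes are floats even when the literal was written with integers, which is why the comparison in a test would use 2.0 and 3.0.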
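
The behavior of bool() on zero, empty, and other values described in the excerpt can be illustrated with a short script. This is a sketch only: the first list repeats the values named in the text, while the second list and the names data and message are assumptions chosen for illustration.

```python
# Values the text lists as zero or empty.
zero_or_empty = [0, 0.0, 0.0j, None, '', []]
# Illustrative non-zero, non-empty values (chosen for this sketch).
non_zero_or_non_empty = [1, -0.5, 1j, 'a', [0]]

falsy_results = [bool(v) for v in zero_or_empty]
truthy_results = [bool(v) for v in non_zero_or_non_empty]

print(falsy_results)      # every entry is False
print(truthy_results)     # every entry is True

# An if test applies the same rule, so an empty list takes the else branch.
data = []
message = 'has data' if data else 'empty'
print(message)            # empty
```

Note that [0] counts as non-empty: the list contains one element, even though that element is itself zero.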
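
The tuple example in the excerpt can be extended into a small round trip between tuple and list. This is a hedged sketch rather than the book's own code; the names first, first_five, as_list, back_to_tuple and tuple_is_mutable are hypothetical, and the immutability check is added here as a reason the conversion is useful.

```python
x = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

first = x[0]               # first element of the tuple
first_five = x[-10:-5]     # negative indices count from the end

# Converting to a list gives a mutable copy of the same elements.
as_list = list(x)
as_list[0] = -1            # allowed: lists support item assignment
back_to_tuple = tuple(as_list)

# Tuples do not support item assignment, so the same change fails on x.
try:
    x[0] = -1
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

print(first_five)          # (0, 1, 2, 3, 4)
print(back_to_tuple[0])    # -1
print(tuple_is_mutable)    # False
```

Because list(x) copies the elements, changing as_list leaves the original tuple x untouched.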