Data Science with Python

Document information

Basic information

Format
Pages: 1,815
File size: 33.51 MB

Contents

Python: Real-World Data Science

Table of Contents

Meet Your Course Guide: What's so cool about Data Science?; Course Structure; Course Journey; The Course Roadmap and Timeline

Course Module 1: Python Fundamentals

Introduction and First Steps – Take a Deep Breath: A proper introduction; Enter the Python; About Python; Portability; Coherence; Developer productivity; An extensive library; Software quality; Software integration; Satisfaction and enjoyment; What are the drawbacks?; Who is using Python today?; Setting up the environment; Python 2 versus Python 3 – the great debate; What you need for this course; Installing Python; Installing IPython; Installing additional packages; How you can run a Python program; Running Python scripts; Running the Python interactive shell; Running Python as a service; Running Python as a GUI application; How is Python code organized; How we use modules and packages; Python's execution model; Names and namespaces; Scopes; Guidelines on how to write good code; The Python culture; A note on the IDEs

Object-oriented Design: Introducing object-oriented; Objects and classes; Specifying attributes and behaviors; Data describes objects; Behaviors are actions; Hiding details and creating the public interface; Composition; Inheritance; Inheritance provides abstraction; Multiple inheritance; Case study

Objects in Python: Creating Python classes; Adding attributes; Making it something; Talking to yourself; More arguments; Initializing the object; Explaining yourself; Modules and packages; Organizing the modules; Absolute imports; Relative imports; Organizing module contents; Who can access my data?; Third-party libraries; Case study

When Objects Are Alike: Basic inheritance; Extending built-ins; Overriding and super; Multiple inheritance; The diamond problem; Different sets of arguments; Polymorphism; Abstract base classes; Using an abstract base class; Creating an abstract base class; Demystifying the magic; Case study

Expecting the Unexpected: Raising exceptions; Raising an exception; The effects of an exception; Handling exceptions; The exception hierarchy; Defining our own exceptions; Case study

When to Use Object-oriented Programming: Treat objects as objects; Adding behavior to class data with properties; Properties in detail; Decorators – another way to create properties; Deciding when to use properties; Manager objects; Removing duplicate code; In practice; Case study

Python Data Structures: Empty objects; Tuples and named tuples; Named tuples; Dictionaries; Dictionary use cases; Using defaultdict; Counter; Lists; Sorting lists; Sets; Extending built-ins; Queues; FIFO queues; LIFO queues; Priority queues; Case study

Python Object-oriented Shortcuts: Python built-in functions; The len() function; Reversed; Enumerate; File I/O; Placing it in context; An alternative to method overloading; Default arguments; Variable argument lists; Unpacking arguments; Functions are objects too; Using functions as attributes; Callable objects; Case study

Strings and Serialization: Strings; String manipulation; String formatting; Escaping braces; Keyword arguments; Container lookups; Object lookups; Making it look right; Strings are Unicode; Converting bytes to text; Converting text to bytes; Mutable byte strings; Regular expressions; Matching patterns; Matching a selection of characters; Escaping characters; Matching multiple characters; Grouping patterns together; Getting information from regular expressions; Making repeated regular expressions efficient; Serializing objects; Customizing pickles; Serializing web objects; Case study

10 The Iterator Pattern: Design patterns in brief; Iterators; The iterator protocol; Comprehensions; List comprehensions; Set and dictionary comprehensions; Generator expressions; Generators; Yield items from another iterable; Coroutines; Back to log parsing; Closing coroutines and throwing exceptions; The relationship between coroutines, generators, and functions; Case study

11 Python Design Patterns I: The decorator pattern; A decorator example; Decorators in Python; The observer pattern; An observer example; The strategy pattern; A strategy example; Strategy in Python; The state pattern; A state example; State versus strategy; State transition as coroutines; The singleton pattern; Singleton implementation; The template pattern; A template example

12 Python Design Patterns II: The adapter pattern; The facade pattern; The flyweight pattern; The command pattern; The abstract factory pattern; The composite pattern

13 Testing Object-oriented Programs: Why test?; Test-driven development; Unit testing; Assertion methods; Reducing boilerplate and cleaning up; Organizing and running tests; Ignoring broken tests; Testing with py.test; One way to setup and cleanup; A completely different way to set up variables; Skipping tests with py.test; Imitating expensive objects; How much testing is enough?; Case study; Implementing it

14 Concurrency: Threads; The many problems with threads; Shared memory; The global interpreter lock; Thread overhead; Multiprocessing; Multiprocessing pools; Queues; The problems with multiprocessing; Futures; AsyncIO; AsyncIO in action; Reading an AsyncIO future; AsyncIO for networking; Using executors to wrap blocking code; Streams; Executors; Case study

Course Module 2: Data Analysis

Introducing Data Analysis and Libraries: Data analysis and processing; An overview of the libraries in data analysis; Python libraries in data analysis; NumPy; pandas; Matplotlib; PyMongo; The scikit-learn library

NumPy Arrays and Vectorized Computation: NumPy arrays; Data types; Array creation; Indexing and slicing; Fancy indexing; Numerical operations on arrays; Array functions; Data processing using arrays; Loading and saving data; Saving an array; Loading an array; Linear algebra with NumPy; NumPy random numbers

Data Analysis with pandas: An overview of the pandas package; The pandas data structure; Series; The DataFrame; The essential basic functionality; Reindexing and altering labels; Head and tail; Binary operations; Functional statistics; Function application; Sorting; Indexing and selecting data; Computational tools; Working with missing data; Advanced uses of pandas for data analysis; Hierarchical indexing; The Panel data

Data Visualization: The matplotlib API primer; Line properties; Figures and subplots; Exploring plot types; Scatter plots; Bar plots; Contour plots; Histogram plots; Legends and annotations; Plotting functions with pandas; Additional Python data visualization tools; Bokeh

S S-shaped (sigmoidal) curve about / Logistic regression intuition and conditional probabilities scatter plot about / Scatter plots scatterplot matrix / Visualizing the important characteristics of a dataset scikit-learn about / First steps with scikit-learn perceptron, training via / Training a perceptron via scikit-learn reference link / Kernel principal component analysis in scikit-learn scikit-learn estimator API about /
Understanding the scikit-learn estimator API scikit-learn estimators defining / scikit-learn estimators fit() / scikit-learn estimators predict() / scikit-learn estimators Nearest neighbors / Nearest neighbors distance metrics / Distance metrics dataset, loading / Loading the dataset standard workflow, defining / Moving towards a standard workflow fit() function / Moving towards a standard workflow predict() function / Moving towards a standard workflow algorithm, running / Running the algorithm parameters, setting / Setting parameters scikit-learn library about / The scikit-learn library scikit-learn online documentation URL / Training a perceptron via scikit-learn Scikit-learn tutorials URL / Scikit-learn tutorials scopes about / Scopes local / Scopes enclosing / Scopes global / Scopes built-in / Scopes self-posts about / Reddit as a data source self argument / Talking to yourself sepal length / Loading and preparing the dataset sepal width / Loading and preparing the dataset about / Combining different algorithms for classification with majority vote sepal width feature / Decision tree learning sequence diagram about / Case study Sequential Backward Selection (SBS) about / Sequential feature selection algorithms sequential feature selection algorithms about / Sequential feature selection algorithms Series about / Series service Python, running as / Running Python as a service sets about / Sets sigmoid function about / Logistic regression intuition and conditional probabilities Silhouette Coefficient about / Optimizing criteria computing / Optimizing criteria parameters / Optimizing criteria Similarity graph creating / Creating a similarity graph simple command-line notebook application building / Case study simple linear regression about / Introducing a simple linear regression model simple linear regression model about / Introducing a simple linear regression model simple majority vote classifier implementing / Implementing a simple majority vote classifier 
different algorithms, combining with majority vote / Combining different algorithms for classification with majority vote singleton pattern about / The singleton pattern implementing / Singleton implementation slots about / Empty objects softmax nonlinearity about / An introduction to Lasagne spam filter about / Evaluation using the F1-score Spark about / An overview of the libraries in data analysis reference / An overview of the libraries in data analysis sparse matrix about / Distance metrics sparse matrix format about / Sparse data formats sports outcome prediction about / Sports outcome prediction features / Sports outcome prediction stacking about / Putting it all together / Evaluating and tuning the ensemble classifier StackOverflow question URL / More on pandas stacks / LIFO queues standardization about / Bringing features onto the same scale, Streamlining workflows with pipelines standings loading / Putting it all together standings data obtaining / Putting it all together URL / Putting it all together state pattern about / The state pattern example / A state example versus, strategy pattern / State versus strategy state transition as coroutines / State transition as coroutines statistics functions / Functional statistics Statsmodels about / An overview of the libraries in data analysis Stochastic Gradient Descent (SGD) / Solving regression for regression parameters with gradient descent strategy pattern about / The strategy pattern User code / The strategy pattern Abstraction interface / The strategy pattern example / A strategy example using, in Python / Strategy in Python Stratified K Fold about / Running the algorithm string formatting about / String formatting braces, escaping / Escaping braces keyword arguments / Keyword arguments container lookups / Container lookups object lookups / Object lookups string manipulation about / String manipulation strings about / Strings, Strings are Unicode bytes, converting to text / Converting bytes to text text, 
converting to bytes / Converting text to bytes mutable byte strings / Mutable byte strings strong learner / Combining weak to strong learners via random forests style sheets about / Extracting text from arbitrary websites stylometry about / Attributing documents to authors subgraphs finding / Finding subgraphs connected components / Connected components criteria, optimizing / Optimizing criteria subreddits about / Obtaining news articles, Reddit as a data source Sum of Squared Errors (SSE) / Solving regression for regression parameters with gradient descent sum of the squared errors (SSE) / Sparse solutions with L1 regularization supervised data compression, via linear discriminant analysis about / Supervised data compression via linear discriminant analysis scatter matrices, computing / Computing the scatter matrices linear discriminants, selecting for new feature subspace / Selecting linear discriminants for the new feature subspace samples, projecting onto new feature space / Projecting samples onto the new feature space supervised learning about / Making predictions about the future with supervised learning predictions, making with / Making predictions about the future with supervised learning classification, for predicting class labels / Classification for predicting class labels regression, for predicting continuous outcomes / Regression for predicting continuous outcomes support / Implementing a simple ranking of rules Support Vector Machine (SVM) about / Using kernel principal component analysis for nonlinear mappings, Random forest regression support vector machine (SVM) about / Maximum margin classification with support vector machines / Tuning hyperparameters via grid search support vector machines (SVM) about / scikit-learn estimators support vectors about / Maximum margin classification with support vector machines SVMs about / Support vector machines URL / Support vector machines classifying with / Classifying with SVMs kernels / Kernels SyntaxError / 
Initializing the object system building, for taking image as input / Application scenario and goals T Tcl (Tool Command Language) / Running Python as a GUI application template pattern about / The template pattern example / A template example test need for / Why test? test-driven development about / Test-driven development case study / Case study, Implementing it text about / Disambiguation extracting, from arbitrary websites / Extracting text from arbitrary websites text method about / Legends and annotations text transformers defining / Text transformers word, counting in dataset / Bag-of-words bag-of-words model / Bag-of-words n-grams / N-grams features / Other features tf-idf about / Bag-of-words Theano about / An overview of the libraries in data analysis, An introduction to Theano using / An introduction to Theano URL / Running our code on a GPU third-party libraries about / Third-party libraries threads about / Threads pitfalls / The many problems with threads thread overhead / Thread overhead Timedeltas about / Timedeltas time series reference, Pandas documentation / Working with date and time objects resampling / Resampling time series plotting / Time series plotting time series data downsampling / Downsampling time series data upsampling / Upsampling time series data time series primer about / Time series primer time zone handling about / Upsampling time series data Tkinter / Running Python as a GUI application Torch URL / Keras and Pylearn2 train_feature_value() function about / Implementing the OneR algorithm transformer creating / Creating your own transformer API / The transformer API implementing / Implementation details unit testing / Unit testing transformer classes / Understanding the scikit-learn estimator API transformers and estimators combining, in pipeline / Combining transformers and estimators in a pipeline true positive rate (TPR) / Optimizing the precision and recall of a classification model tuples about / Tuples and named tuples named
tuples / Named tuples tutorial, Google URL / Courses on Hadoop tutorial, Yahoo URL / Courses on Hadoop tweet about / Disambiguation tweets loading / Putting it all together F1-score, used for evaluation / Evaluation using the F1-score features, obtaining from models / Getting useful features from models Twitter follower information, obtaining from / Getting follower information from Twitter Twitter account URL / Downloading data from a social network twitter documentation URL / Downloading data from a social network U UCL Machine Learning data repository URL / Loading the dataset UDP (User Datagram Protocol) / AsyncIO for networking UML sequence diagram about / Case study underfitting about / Tackling overfitting via regularization Unified Modeling Language (UML) about / Objects and classes example / Objects and classes unit tests about / Unit testing assertion methods / Assertion methods boilerplate, reducing / Reducing boilerplate and cleaning up organizing / Organizing and running tests running / Organizing and running tests broken tests, ignoring / Ignoring broken tests univariate feature about / Selecting the best individual features unstructured format about / Disambiguation unsupervised dimensionality reduction, via principal component analysis about / Unsupervised dimensionality reduction via principal component analysis total variance / Total and explained variance explained variance / Total and explained variance feature transformation / Feature transformation unsupervised learning about / Discovering hidden structures with unsupervised learning hidden structures, discovering with / Discovering hidden structures with unsupervised learning subgroups, finding with clustering / Finding subgroups with clustering dimensionality reduction, for data compression / Dimensionality reduction for data compression use cases, computer vision about / Use cases V V's, big data volume / Big data velocity / Big data variety / Big data veracity / Big data validation curves 
about / Debugging algorithms with learning and validation curves overfitting and underfitting, addressing with / Addressing overfitting and underfitting with validation curves validation dataset about / Sequential feature selection algorithms value / Dictionaries values about / Behaviors are actions variance about / How ensembles work? vectorization about / Implementing a perceptron learning algorithm in Python virtualenv URL / Setting up the environment visualization toolkit (VTK) / MayaVi vocabulary about / Counting function words Vowpal Wabbit about / An overview of the libraries in data analysis, Vowpal Wabbit reference / An overview of the libraries in data analysis URL / Vowpal Wabbit W weak learners / Combining weak to strong learners via random forests leveraging, via adaptive boosting / Leveraging weak learners via adaptive boosting about / Leveraging weak learners via adaptive boosting web-based API, considerations authorization methods / Using a Web API to get data rate limiting / Using a Web API to get data API Endpoints / Using a Web API to get data web objects serializing / Serializing web objects weight about / An introduction to neural networks weighted edge about / Creating a similarity graph Weka about / An overview of the libraries in data analysis reference / An overview of the libraries in data analysis Wine dataset about / Partitioning a dataset in training and test sets, Bagging – building an ensemble of classifiers from bootstrap samples URL / Partitioning a dataset in training and test sets features / Partitioning a dataset in training and test sets Hue class / Bagging – building an ensemble of classifiers from bootstrap samples Alcohol class / Bagging – building an ensemble of classifiers from bootstrap samples workflows streamlining, with pipelines / Streamlining workflows with pipelines X 5x2 cross-validation / Algorithm selection with nested cross-validation xmlstarlet tool / Data munging Z 7-zip URL / Accessing the Enron dataset ... 
with data in MongoDB Interacting with data in Redis The simple value List Set Ordered set Data Analysis Application Examples Data munging Cleaning data Filtering Merging data Reshaping data Data...
Variables with Regression Analysis B Bibliography Index Python: Real-World Data Science A course in four modules Unleash the power of Python and its robust data science...
series data Timedeltas Time series plotting Interacting with Databases Interacting with data in text format Reading data from text format Writing data to text format Interacting with data in

Date posted: 02/03/2019, 10:19
