Gergely Daróczi: Mastering Data Analysis with R, Packt Publishing (2015)



Document information

Mastering Data Analysis with R

Gain clear insights into your data and solve real-world data science problems with R – from data munging to modeling and visualization

Gergely Daróczi

BIRMINGHAM - MUMBAI

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: September 2015

Production reference: 1280915

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK

ISBN 978-1-78398-202-8

www.packtpub.com

Credits

Author: Gergely Daróczi
Reviewers: Krishna Gawade, Alexey Grigorev, Mykola Kolisnyk, Mzabalazo Z. Ngwenya, Mohammad Rafi
Commissioning Editor: Akram Hussain
Acquisition Editor: Meeta Rajani
Content Development Editor: Nikhil Potdukhe
Technical Editor: Mohita Vyas
Copy Editors: Stephen Copestake, Angad Singh
Project Coordinator: Sanchita Mandal
Proofreader: Safis Editing
Indexer: Tejal Soni
Graphics: Jason Monteiro
Production Coordinator: Manu Joseph
Cover Work: Manu Joseph

About the Author

Gergely Daróczi is a former assistant professor of statistics and an enthusiastic R user and package developer. He is the founder and CTO of an R-based reporting web application at http://rapporter.net and a PhD candidate in sociology. He is currently working as the lead R developer/research data scientist at https://www.card.com/ in Los Angeles.

Besides maintaining around half a dozen R packages, mainly dealing with reporting, Gergely has coauthored the books Introduction to R for Quantitative Finance and Mastering R for Quantitative Finance (both by Packt Publishing) by providing and reviewing the R source code. He has contributed to a number of scientific journal articles, mainly in social sciences but in medical sciences as well.

I am very grateful to my family, including my wife, son, and daughter, for their continuous support and understanding, and for missing me while I was working on this book, a lot more than originally planned. I am also very thankful to Renata Nemeth and Gergely Toth for taking over the modeling chapters. Their professional and valuable help is highly appreciated. David Gyurko also contributed some interesting topics and preliminary suggestions to this book. And last but not least, I received some very useful feedback from the official reviewers and from Zoltan Varju, Michael Puhle, and Lajos Balint on a few chapters that are highly related to their field of expertise. Thank you all!
About the Reviewers

Krishna Gawade is a data analyst and senior software developer with Saint-Gobain S.A.'s IT development center. Krishna discovered his passion for computer science and data analysis while at Mumbai University, where he earned a bachelor's degree in computer science. He has received multiple awards from Saint-Gobain for his contributions to various data-driven projects. He has been a technical reviewer on R Data Analysis Cookbook (ISBN: 9781783989065). His current interests are data analysis, statistics, machine learning, and artificial intelligence. He can be reached at gawadesk@gmail.com, or you can follow him on Twitter at @gawadesk.

Alexey Grigorev is an experienced software developer and data scientist with five years of professional experience. In his day-to-day job, he actively uses R and Python for data cleaning, data analysis, and modeling.

Mykola Kolisnyk has been involved in test automation since 2004 through various activities, including creating test automation solutions from scratch, leading test automation teams, and consulting on test automation processes. In his career, he has worked with different test automation tools, such as Mercury WinRunner, MicroFocus SilkTest, SmartBear TestComplete, Selenium-RC, WebDriver, Appium, SoapUI, BDD frameworks, and many other engines and solutions. Mykola has experience with multiple programming technologies based on Java, C#, Ruby, and more. He has worked in different domain areas, such as healthcare, mobile, telecommunications, social networking, business process modeling, performance and talent management, multimedia, e-commerce, and investment banking. He has worked as a permanent employee at ISD, GlobalLogic, Luxoft, and Trainline.com. He also has freelancing experience and was invited as an independent consultant to introduce test automation approaches and practices to external companies. Currently, he works as a mobile QA developer at Trainline.com.

Mykola is one of the authors (together with Gennadiy Alpaev) of the online SilkTest Manual (http://silktutorial.ru/) and participated in the creation of the TestComplete tutorial at http://tctutorial.ru/, which is one of the largest collections of related documentation available at RU.net. Besides this, he participated as a reviewer on TestComplete Cookbook (ISBN: 9781849693585) and Spring Batch Essentials, Packt Publishing (ISBN: 9781783553372).

Mzabalazo Z. Ngwenya holds a postgraduate degree in mathematical statistics from the University of Cape Town. He has worked extensively in the field of statistical consulting and currently works as a biometrician at a research and development entity in South Africa. His areas of interest are primarily centered around statistical computing, and he has over 10 years of experience with the use of R for data analysis and statistical research. Previously, he was involved in reviewing Learning RStudio for R Statistical Computing, Mark P.J. van der Loo and Edwin de Jonge; R Statistical Application Development by Example Beginner's Guide, Prabhanjan Narayanachar Tattar; R Graph Essentials, David Alexandra Lillis; R Object-oriented Programming, Kelly Black; and Mastering Scientific Computing with R, Paul Gerrard and Radia Johnson. All of these were published by Packt Publishing.

Mohammad Rafi is a software engineer who loves data analytics, programming, and tinkering with anything he can get his hands on. He has worked on technologies such as R, Python, Hadoop, and JavaScript. He is an engineer by day and a hardcore gamer by night. He was one of the reviewers on R for Data Science. Mohammad has more than years of highly diversified professional experience, which includes app development, data processing, search expertise, and web data analytics. He started with a web marketing company. Since then, he has worked with companies such as Hindustan Times, Google, and InMobi.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view entirely free books. Simply use your login credentials for immediate access.

Table of Contents

  • Preface
  • Chapter 1: Hello, Data!
    • Loading text files of a reasonable size
    • Data files larger than the physical memory
    • Benchmarking text file parsers
    • Loading a subset of text files
    • Filtering flat files before loading to R
    • Loading data from databases
    • Setting up the test environment
    • MySQL and MariaDB
    • PostgreSQL
    • Oracle database
    • ODBC database access
    • Using a graphical user interface to connect to databases
    • Other database backends
    • Importing data from other statistical systems
    • Loading Excel spreadsheets
    • Summary
  • Chapter 2: Getting Data from the Web
    • Loading datasets from the Internet
    • Other popular online data formats
    • Reading data from HTML tables
    • Reading tabular data from static Web pages
    • Scraping data from other online sources
    • R packages to interact with data source APIs
    • Socrata Open Data API
    • Finance APIs
    • Fetching time series with Quandl
  ...
    • ... Summarizing Data
  • Chapter 4: Restructuring Data
    • Drop needless data
    • Drop needless data in an efficient way
    • Drop needless data in another efficient way
    • Aggregation
    • Quicker aggregation with base

Date posted: 13/04/2019, 01:30


Table of contents

  • Cover

  • Copyright

  • Credits

  • About the Author

  • About the Reviewers

  • www.PacktPub.com

  • Table of Contents

  • Preface

  • Chapter 1: Hello, Data!

    • Loading text files of a reasonable size

      • Data files larger than the physical memory

      • Benchmarking text file parsers

      • Loading a subset of text files

        • Filtering flat files before loading to R

        • Loading data from databases

          • Setting up the test environment

          • MySQL and MariaDB

          • PostgreSQL

          • Oracle database

          • ODBC database access

          • Using a graphical user interface to connect to databases

          • Other database backends

          • Importing data from other statistical systems

          • Loading Excel spreadsheets
