
R in Action




DOCUMENT INFORMATION

Basic information

Format
Pages: 474
File size: 12.4 MB

Contents

R in Action
Data analysis and graphics with R
Robert I. Kabacoff
Manning, Shelter Island
www.it-ebooks.info

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact:

Special Sales Department
Manning Publications Co.
20 Baldwin Road, PO Box 261
Shelter Island, NY 11964
Email: orders@manning.com

©2011 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Development editor: Sebastian Stirling
Copyeditor: Liz Welch
Typesetter: Composure Graphics
Cover designer: Marija Tudor

ISBN: 9781935182399
Printed in the United States of America 10 MAL 16 15 14 13 12 11

brief contents

Part I Getting started
  1 Introduction to R
  2 Creating a dataset
  3 Getting started with graphs
  4 Basic data management
  5 Advanced data management

Part II Basic methods
  6 Basic graphs
  7 Basic statistics

Part III Intermediate methods
  8 Regression
  9 Analysis of variance
  10 Power analysis
  11 Intermediate graphs
  12 Resampling statistics and bootstrapping

Part IV Advanced methods
  13 Generalized linear models
  14 Principal components and factor analysis
  15 Advanced methods for missing data
  16 Advanced graphics

contents

preface
acknowledgments
about this book
about the cover illustration

Part I Getting started

1 Introduction to R
  1.1 Why use R?
  1.2 Obtaining and installing R
  1.3 Working with R
      Getting started
      Getting help
      The workspace
      Input and output
  1.4 Packages
      What are packages?
      Installing a package
      Loading a package
      Learning about a package
  1.5 Batch processing
  1.6 Using output as input: reusing results
  1.7 Working with large datasets
  1.8 Working through an example
  1.9 Summary

2 Creating a dataset
  2.1 Understanding datasets
  2.2 Data structures
      Vectors
      Matrices
      Arrays
      Data frames
      Factors
      Lists
  2.3 Data input
      Entering data from the keyboard
      Importing data from a delimited text file
      Importing data from Excel
      Importing data from XML
      Webscraping
      Importing data from SPSS
      Importing data from SAS
      Importing data from Stata
      Importing data from netCDF
      Importing data from HDF5
      Accessing database management systems (DBMSs)
      Importing data via Stat/Transfer
  2.4 Annotating datasets
      Variable labels
      Value labels
  2.5 Useful functions for working with data objects
  2.6 Summary

3 Getting started with graphs
  3.1 Working with graphs
  3.2 A simple example
  3.3 Graphical parameters
      Symbols and lines
      Colors
      Text characteristics
      Graph and margin dimensions
  3.4 Adding text, customized axes, and legends
      Titles
      Axes
      Reference lines
      Legend
      Text annotations
  3.5 Combining graphs
      Creating a figure arrangement with fine control
  3.6 Summary

4 Basic data management
  4.1 A working example
  4.2 Creating new variables
  4.3 Recoding variables
  4.4 Renaming variables
  4.5 Missing values
      Recoding values to missing
      Excluding missing values from analyses
  4.6 Date values
      Converting dates to character variables
      Going further
  4.7 Type conversions
  4.8 Sorting data
  4.9 Merging datasets
      Adding columns
      Adding rows
  4.10 Subsetting datasets
      Selecting (keeping) variables
      Excluding (dropping) variables
      Selecting observations
      The subset() function
      Random samples
  4.11 Using SQL statements to manipulate data frames
  4.12 Summary

5 Advanced data management
  5.1 A data management challenge
  5.2 Numerical and character functions
      Mathematical functions
      Statistical functions
      Probability functions
      Character functions
      Other useful functions
      Applying functions to matrices and data frames
  5.3 A solution for our data management challenge
  5.4 Control flow
      Repetition and looping
      Conditional execution
  5.5 User-written functions
  5.6 Aggregation and restructuring
      Transpose
      Aggregating data
      The reshape package
  5.7 Summary

Part II Basic methods

6 Basic graphs
  6.1 Bar plots
      Simple bar plots
      Stacked and grouped bar plots
      Mean bar plots
      Tweaking bar plots
      Spinograms
  6.2 Pie charts
  6.3 Histograms

index

Symbol
! operator 77
!= operator 77
# symbol
%a symbol 81
%A symbol 81
%B symbol 82
%b symbol 82
%d symbol 81
%m symbol 81
%Y symbol 82
%y symbol 82
* operator 75, 178
** operator 75
option 58, 61
symbol 178
/ operator 75
: symbol 178
? function 11
?? function 11
^ operator 75, 178, 181
~ symbol 178
+ operator 75, 178
< operator 77
[...]

... When there are more than two ...
      Comparing more than two groups
  Visualizing group differences
  Summary

8 Regression
  8.1 ...
      Scenarios for using OLS regression
      What you need to know
  8.2 OLS regression
      Fitting regression models with lm()
      Simple linear regression
      Polynomial regression
      Multiple linear regression
      Multiple linear regression with interactions
  8.3 Regression...

... I remain solely responsible for any errors or distortions inadvertently included in this book. I really should have started this book by thanking my wife and partner, Carol Lynn. Although she has no intrinsic interest in statistics or programming, she read each chapter multiple times and made countless corrections and suggestions. No greater love has any person than to read multivariate statistics for...

... blue-collar jobs involve lower education, income, and prestige, whereas professional jobs involve higher education, income, and prestige. White-collar jobs fall in between.

[Figure 1.2: scatterplot matrix with panels income, education, and prestige; groups bc, prof, and wc; labeled points RR.engineer and minister]
Figure 1.2 Relationships between income,...

... useful in preparing data for further analyses. After having completed part 1, you will be thoroughly familiar with programming in the R environment. You will have the skills needed to enter and access data, clean it up, and prepare it for further analyses. You will also have experience creating, customizing, and saving a variety of graphs.

1 Introduction to R

This chapter covers
■ Installing...

... education, and prestige for blue-collar (bc), white-collar (wc), and professional jobs (prof). Source: car package (scatterplotMatrix function) written by John Fox. Graphs like this are difficult to create in other statistical programming languages but can be created with a line or two of code in R.

■ There are some interesting exceptions. Railroad Engineers have high income...

... methods of creating graphs, modifying them, and saving them in a variety of formats. Chapter 4 covers basic data management, including sorting, merging, and subsetting datasets, and transforming, recoding, and deleting variables. Building on the material in chapter 4, chapter 5 covers the use of functions (mathematical, statistical, character) and control structures (looping, conditional execution) for data...

... and a review of methods for interacting with graphs in real time. The afterword points you to many of the best internet sites for learning more about R, joining the R community, getting questions answered, and staying current with this rapidly changing product. Last, but not least, the eight appendices (A through H) extend the text’s coverage to include such useful topics as R graphic user interfaces,...

... Management Research Group, an international organizational development and consulting firm. He has more than 20 years of experience providing research and statistical consultation to organizations in health care, financial services, manufacturing, behavioral sciences, government, and academia. Prior to joining MRG, Dr. Kabacoff was a professor of psychology at Nova Southeastern University in Florida, where he...

... new skills. Once you’re familiar with the R interface, the next challenge is to get your data into the program. In today’s information-rich world, data can come from many sources and in many formats. Chapter 2 covers the wide variety of methods available for importing data into R. The first half of the chapter introduces the data structures R uses to hold data and describes how to input data manually. The...

... discussion of generalized linear models and then focuses on cases where you’re trying to predict an outcome variable that is either categorical (logistic regression) or a count (Poisson regression). One of the challenges of multivariate data problems is simplification. Chapter 14 describes methods of transforming a large number of correlated variables into a smaller set of uncorrelated variables (principal component ...

... installing the program.

1.2 Obtaining and installing R

R is freely available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org. Precompiled binaries are available for Linux, Mac...

... the current working directory
setwd("mydirectory")    Change the current working directory to mydirectory
ls()                    List the objects in the current workspace
rm(objectlist)          Remove (delete) one or more objects...

... tasks in R, including sorting, merging, and subsetting datasets, and transforming, recoding, and deleting variables. Chapter 5 builds on the material in chapter 4. It covers the use of numeric (arithmetic,...
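The workspace functions that appear in the excerpt above (setwd(), ls(), rm()) can be tried out with a short sketch like the one below. It is only an illustration under stated assumptions: getwd() is base R's function for reporting the working directory but is not named in the surviving fragment, tempdir() stands in for "mydirectory", and the objects ages and label are hypothetical.

```r
# Minimal sketch of the workspace functions listed in the excerpt.
# Assumptions: getwd() is not named in the surviving text; tempdir() is a
# stand-in for "mydirectory"; ages and label are made-up objects.

old_dir <- getwd()        # remember the starting working directory
setwd(tempdir())          # change the working directory to a scratch location
scratch_dir <- getwd()    # now reports the scratch directory

ages  <- c(25, 34, 51)    # hypothetical objects, for illustration only
label <- "demo"

before <- ls()            # names of the objects in the current workspace
rm(ages)                  # remove (delete) one object
after <- ls()             # names of the objects that remain

setwd(old_dir)            # restore the original working directory

# The only name listed before rm() but not after it is the removed object.
print(setdiff(before, after))
```

Restoring the original directory at the end keeps the sketch from leaving the session in a different place than it started, which matters if later code reads or writes files by relative path.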

Date posted: 04/12/2015, 01:15
