Learning R
by Richard Cotton

Copyright © 2013 Richard Cotton. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Meghan Blanchette
Production Editor: Kristen Brown
Copyeditor: Rachel Head
Proofreader: Jilly Gagnon
Indexer: WordCo Indexing Services
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Rebecca Demarest

September 2013: First Edition

Revision History for the First Edition:
2013-09-06: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449357108 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Learning R, the image of a roe deer, and related trade dress are
trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-35710-8
[LSI]

Table of Contents

Preface

Part I. The R Language

1. Introduction
    Chapter Goals
    What Is R?
    Installing R
    Choosing an IDE
    Emacs + ESS
    Eclipse/Architect
    RStudio
    Revolution-R
    Live-R
    Other IDEs and Editors
    Your First Program
    How to Get Help in R
    Installing Extra Related Software
    Summary
    Test Your Knowledge: Quiz
    Test Your Knowledge: Exercises

2. A Scientific Calculator
    Chapter Goals
    Mathematical Operations and Vectors
    Assigning Variables
    Special Numbers
    Logical Vectors
    Summary
    Test Your Knowledge: Quiz
    Test Your Knowledge: Exercises

3. Inspecting Variables and Your Workspace
    Chapter Goals
    Classes
    Different Types of Numbers
    Other Common Classes
    Checking and Changing Classes
    Examining Variables
    The Workspace
    Summary
    Test Your Knowledge: Quiz
    Test Your Knowledge: Exercises

4. Vectors, Matrices, and Arrays
    Chapter Goals
    Vectors
    Sequences
    Lengths
    Names
    Indexing Vectors
    Vector Recycling and Repetition
    Matrices and Arrays
    Creating Arrays and Matrices
    Rows, Columns, and Dimensions
    Row, Column, and Dimension Names
    Indexing Arrays
    Combining Matrices
    Array Arithmetic
    Summary
    Test Your Knowledge: Quiz
    Test Your Knowledge: Exercises

5. Lists and Data Frames
    Chapter Goals
    Lists
    Creating Lists
    Atomic and Recursive Variables
    List Dimensions and Arithmetic
    Indexing Lists
    Converting Between Vectors and Lists
    Combining Lists
    NULL
    Pairlists
    Data Frames
    Creating Data Frames
    Indexing Data Frames
    Basic Data Frame Manipulation
    Summary
    Test Your Knowledge: Quiz
    Test Your Knowledge: Exercises

6. Environments and Functions
    Chapter Goals
    Environments
    Functions
    Creating and Calling Functions
    Passing Functions to and from Other Functions
    Variable Scope
    Summary
    Test Your Knowledge: Quiz
    Test Your Knowledge: Exercises

7. Strings and Factors
    Chapter Goals
    Strings
    Constructing and Printing Strings
    Formatting Numbers
    Special Characters
    Changing Case
    Extracting Substrings
    Splitting Strings
    File Paths
    Factors
    Creating Factors
    Changing Factor Levels
    Dropping Factor Levels
    Ordered Factors
    Converting Continuous Variables to Categorical
    Converting Categorical Variables to Continuous
    Generating Factor Levels
    Combining Factors
    Summary
    Test Your Knowledge: Quiz
    Test Your Knowledge: Exercises

8. Flow Control and Loops
    Chapter Goals
    Flow Control
    if and else
    Vectorized if
    Multiple Selection
    Loops
    repeat Loops
    while Loops
    for Loops
    Summary
    Test Your Knowledge: Quiz
    Test Your Knowledge: Exercises

9. Advanced Looping
    Chapter Goals
    Replication
    Looping Over Lists
    Looping Over Arrays
    Multiple-Input Apply
    Instant Vectorization
    Split-Apply-Combine
    The plyr Package
    Summary
    Test Your Knowledge: Quiz
    Test Your Knowledge: Exercises

10. Packages
    Chapter Goals
    Loading Packages
    The Search Path
    Libraries and Installed Packages
    Installing Packages
    Maintaining Packages
    Summary
    Test Your Knowledge: Quiz
    Test Your Knowledge: Exercises

11. Dates and Times

Appendix D: Solutions to Exercises

    #' @author Richie Cotton
    #' @docType package
    #' @name squares
    #' @aliases squares squares-package
    #' @keywords package
    NULL

The data documentation goes in the same file, or in R/squares-data.R:

    #' Sum of squares dataset
    #'
    #' The sum of squares of natural numbers
    #' \itemize{
    #'   \item{x}{Natural numbers.}
    #'   \item{y}{The sum of squares from 1 to \code{x}.}
    #' }
    #'
    #' @docType data
    #' @keywords datasets
    #' @name squares_data
    #' @usage data(squares_data)
    #' @format A data frame with 10 rows and 2 variables
    NULL

Exercise 17-3

The code is easy; the hard part is fixing any problems:

    library(devtools)  # assumed source of check() and build()
    check("squares")
    build("squares")
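The squares_data documentation in the Appendix D solutions describes a data frame of natural numbers x and the sum of squares from 1 to x, but the text shown here does not include the code that creates it. The following is a minimal sketch of how such a dataset might be built in base R. It assumes the 10 rows are the natural numbers 1 to 10 and that y is a running total; neither detail is confirmed by the source.

```r
# Assumption: the 10 documented rows are the natural numbers 1 to 10.
x <- 1:10

# y holds the sum of squares from 1 to each x, computed as a running total.
squares_data <- data.frame(x = x, y = cumsum(x^2))

print(squares_data)
```

In a package layout, an object like this would normally be saved under the package's data directory so that data(squares_data) can find it. The exact file name and location used by the author are not given in this text.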