Advanced R: Data Programming and the Cloud
Matt Wiley and Joshua F. Wiley

Matt Wiley
Elkhart Group Ltd & Victoria College
Columbia City, Indiana, USA

Joshua F. Wiley
Elkhart Group Ltd & Victoria College
Columbia City, Indiana, USA

ISBN-13 (pbk): 978-1-4842-2076-4
ISBN-13 (electronic): 978-1-4842-2077-1
DOI 10.1007/978-1-4842-2077-1
Library of Congress Control Number: 2016959581
Copyright © 2016 by Matt Wiley and Joshua F. Wiley

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image, we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Managing Director: Welmoed Spahr
Lead Editor: Steve Anglin
Technical Reviewer: Andrew Moskowitz
Editorial Board: Steve Anglin, Pramila Balan, Laura Berendson, Aaron Black, Louise Corrigan, Jonathan Gennick, Robert Hutchinson, Celestin Suresh John, Nikhil Karkal, James Markham, Susan McDermott, Matthew Moodie, Natalie Pao, Gwenan Spearing
Coordinating Editor: Mark Powers
Copy Editor: Sharon Wilkey
Compositor: SPi Global
Indexer: SPi Global
Artist: SPi Global

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

For information on translations, please e-mail rights@apress.com, or visit www.apress.com.

Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Special Bulk Sales–eBook Licensing web page at www.apress.com/bulk-sales.

Any source code or other supplementary materials referenced by the author in this text are available to readers at www.apress.com. For detailed information about how to locate your book’s source code, go to www.apress.com/source-code/. Readers can also access source code at SpringerLink in the Supplementary Material section for each chapter.

Printed on acid-free paper

To Family

Contents at a Glance

About the Authors
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Programming Basics
Chapter 2: Programming Utilities
Chapter 3: Programming Automation
Chapter 4: Writing Functions
Chapter 5: Writing Classes and Methods
Chapter 6: Writing a Package
Chapter 7: Introduction to Data Management Using data.table
Chapter 8: Data Munging with data.table
Chapter 9: Other Tools for Data Management
Chapter 10: Reading Big Data(bases)
Chapter 11: Getting a Cloud
Chapter 12: Cloud Ubuntu for Windows Users
Chapter 13: Every Cloud has a Shiny Lining
Chapter 14: Shiny Dashboard Sampler
Chapter 15: Dynamic Reports and the Cloud
References
Index

Contents

About the Authors
About the Technical Reviewer
Acknowledgments
Introduction

Chapter 1: Programming Basics
    Advanced R Software Choices
    Reproducing Results
    Types of Objects
    Base Operators and Functions
    Mathematical Operators and Functions
    References

Chapter 2: Programming Utilities
    Help and Documentation
    System and Files
    Input
    Output
    References

Chapter 3: Programming Automation
    Loops
    Flow Control
    *apply Family of Functions
    Final Thoughts

Chapter 4: Writing Functions
    Components of a Function
    Scoping
    Functions for Functions
    Debugging
    Summary

Chapter 5: Writing Classes and Methods
    S3 System
        S3 Classes
        S3 Methods
    S4 System
        S4 Classes
        S4 Class Inheritance
        S4 Methods
    Summary

Chapter 6: Writing a Package
    Before You Get Started
    Version Control
    R Package Basics
    Starting a Package by Using DevTools
    Adding R Code
    Tests
    Documentation Using roxygen2
    Functions
    Data
    Classes
    Methods
    Building, Installing, and Distributing an R Package
    Summary

Chapter 7: Introduction to Data Management Using data.table
    Introduction to data.table
    Selecting and Subsetting Data
        Using the First Formal
        Using the Second Formal
        Using the Second and Third Formals
    Variable Renaming and Ordering
    Computing on Data and Creating Variables
    Merging and Reshaping Data
        Merging Data
        Reshaping Data
    Summary

Chapter 8: Data Munging with data.table
    Data Munging / Cleaning
    Recoding Data
    Recoding Numeric Values
    Creating New Variables
    Fuzzy Matching
    Summary
Chapter 9: Other Tools for Data Management
    Sorting
    Selecting and Subsetting
    Variable Renaming and Ordering
    Computing on Data and Creating Variables
    Merging and Reshaping Data
    Summary

Chapter 10: Reading Big Data(bases)
    SQLite
        Installing SQLite on Windows
        SQLite and R
    PostgreSQL
        Installing PostgreSQL on Windows
        PostgreSQL and R
    MongoDB
        Installing MongoDB on Windows
        MongoDB and R
    Summary

Chapter 11: Getting a Cloud
    Disclaimers
    Starting Amazon Web Services
    Accessing Your Instance’s Command Line
    Uploading Files to Your Instance
    Final Thoughts

Chapter 12: Cloud Ubuntu for Windows Users
    Common Commands
    Superuser and Security
    Installing and Using R
    Installing and Using RStudio Server
    Installing Microsoft R
    Installing Java
    Installing Shiny on Your Cloud
    Final Thoughts

Chapter 13: Every Cloud has a Shiny Lining
    The Basics of Shiny
    Shiny in Motion
    Uploading a User File into Shiny
    Hosting Shiny in the Cloud
    Final Thoughts

Chapter 14: Shiny Dashboard Sampler
    A Dashboard’s Bones
    Dashboard Header
    Dashboard Sidebar
    Dashboard Body
    Dashboard in the Cloud
    Complete Sampler Code
    References

Chapter 15: Dynamic Reports and the Cloud
    Needed Software
        Local Machine
        Cloud Instance
    Dynamic Documents
    Dynamic Documents and Shiny
        server.R
        ui.R
        report.Rmd
    Uploading to the Cloud
    Summary

References
Index

Chapter 15: Dynamic Reports and the Cloud

Figure 15-2. Shiny user interface, live on a locally hosted website

report.Rmd

This report is about student scores per question for a quiz or other test. Our goal is to take the input shown in Figure 15-2 and convert it to actionable summary information about question and test validity. Before reading past this first full listing of the code, take a moment to compare and contrast the code
to a stand-alone markdown file. There are no major differences. In fact, the only way to know that this file was rendered from a Shiny application is a single line of code that reads scores