
O’Reilly: R Packages


R Packages
Organize, Test, Document, and Share Your Code
Hadley Wickham

Turn your R code into packages that others can easily download and use. This practical book shows you how to bundle reusable R functions, sample data, and documentation together by applying author Hadley Wickham’s package development philosophy. In the process, you’ll work with devtools, roxygen, and testthat, a set of R packages that automates common development tasks. Devtools encapsulates best practices that Hadley has learned from years of working with this programming language.

Ideal for developers, data scientists, and programmers with various backgrounds, this book starts with the basics and shows you how to improve your package writing over time. You’ll learn to focus on what you want your package to do, rather than think about package structure.

■ Learn about the most useful components of an R package, including vignettes and unit tests
■ Take advantage of devtools to automate anything you can
■ Get tips on good style, such as organizing functions into files
■ Streamline your development process with devtools
■ Discover the best way to submit your package to the Comprehensive R Archive Network (CRAN)
■ Learn from a well-respected member of the R community who created 30 R packages, including ggplot2, dplyr, and tidyr

“This book is a practical, hands-on guide for building high-quality software in R. Any R programmer looking to ‘reach the next level’ would do well to give this a read.”
- Wes McKinney, creator of pandas

Hadley Wickham is Chief Scientist at RStudio. He’s a well-respected member of the R community who has written and contributed to over 30 R packages. Hadley won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualization.

Data/Data Science
US $39.99, CAN $45.99
ISBN: 978-1-491-91059-7
Twitter: @oreillymedia
facebook.com/oreilly

R Packages
by Hadley Wickham

Copyright © 2015 Hadley Wickham. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Ann Spencer and Marie Beaugureau
Production Editor: Kara Ebrahim
Copyeditor: Jasmine Kwityn
Proofreader: Kim Cofer
Indexer: Wendy Catalano
Interior Designer: David Futato
Cover Designer: Ellie Volckhausen
Illustrator: Rebecca Demarest

April 2015: First Edition

Revision History for the First Edition
2015-03-20: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491910597 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. R Packages, the cover image of a kaka, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-91059-7
[LSI]

Table of Contents

Preface

Part I. Getting Started

1. Introduction
    Philosophy
    Getting Started
    Conventions
    Colophon

2. Package Structure
    Naming Your Package
    Requirements for a Name
    Strategies for Creating a Name
    Creating a Package
    RStudio Projects
    What Is an RStudio Project File?
    What Is a Package?
    Source Packages
    Bundled Packages
    Binary Packages
    Installed Packages
    In-Memory Packages
    What Is a Library?

Part II. Package Components

3. R Code
    R Code Workflow
    Organizing Your Functions
    Code Style
    Object Names
    Spacing
    Curly Braces
    Line Length
    Indentation
    Assignment
    Commenting Guidelines
    Top-Level Code
    Loading Code
    The R Landscape
    When You Do Need Side Effects
    S4 Classes, Generics, and Methods
    CRAN Notes

4. Package Metadata
    Dependencies: What Does Your Package Need?
    Versioning
    Other Dependencies
    Title and Description: What Does Your Package Do?
    Author: Who Are You?
    On CRAN
    License: Who Can Use Your Package?
    On CRAN
    Version
    Other Components

5. Object Documentation
    The Documentation Workflow
    Alternative Documentation Workflow
    Roxygen Comments
    Documenting Functions
    Documenting Datasets
    Documenting Packages
    Documenting Classes, Generics, and Methods
    S3
    S4
    RC
    Special Characters
    Do Repeat Yourself
    Inheriting Parameters from Other Functions
    Documenting Multiple Functions in the Same File
    Text Formatting Reference Sheet
    Character Formatting
    Links
    Lists
    Mathematics
    Tables

6. Vignettes: Long-Form Documentation
    Vignette Workflow
    Metadata
    Markdown
    Sections
    Lists
    Inline Formatting
    Tables
    Code
    Knitr
    Options
    Development Cycle
    Advice for Writing Vignettes
    Organization
    CRAN Notes
    Where to Go Next

7. Testing
    Test Workflow
    Test Structure
    Expectations
    Writing Tests
    What to Test
    Skipping a Test
    Building Your Own Testing Tools
    Test Files
    CRAN Notes

8. Namespace
    Motivation
    Search Path
    The NAMESPACE
    Workflow
    Exports
    S3
    S4
    RC
    Data
    Imports
    R Functions
    S3
    S4
    Compiled Functions

9. External Data
    Exported Data
    Documenting Datasets
    Internal Data
    Raw Data
    Other Data
    CRAN Notes

10. Compiled Code
    C++
    Workflow
    Documentation
    Exporting C++ Code
    Importing C++ Code
    Best Practices
    C
    Getting Started with .Call()
    Getting Started with .C()
    Workflow
    Exporting C Code
    Importing C Code
    Best Practices
    Debugging Compiled Code
    Makefiles
    Other Languages
    Licensing
    Development Workflow
    CRAN Issues

11. Installed Files
    Package Citation
    Other Languages

12. Other Components
    Demos

Part III. Best Practices

13. Git and GitHub
    RStudio, Git, and GitHub
    Initial Setup
    Creating a Local Git Repository
    Seeing What’s Changed
    Recording Changes
    Best Practices for Commits
    Ignoring Files
    Undoing Mistakes
    Synchronizing with GitHub
    Benefits of Using GitHub
    Working with Others
    Issues
    Branches
    Making a Pull Request
    Submitting a Pull Request to Another Repo
    Reviewing and Accepting Pull Requests
    Learning More

14. Automated Checking
    Workflow
    Checks
    Check Metadata
    Package Structure
    Description
    Namespace
    R Code
    Data
    Documentation
    Demos
    Compiled Code
    Tests
    Vignettes
    Checking After Every Commit with Travis
    Basic Config
    Other Uses

15. Releasing a Package
    Version Number
    Backward Compatibility
    The Submission Process
    Test Environments
    Check Results
    Reverse Dependencies
    CRAN Policies
    Important Files
    README.md
    README.Rmd
    NEWS.md
    Release
    On Failure
    Binary Builds
    Prepare for Next Version
    Publicizing Your Package
    Congratulations!
Index


Check Results

You’ve already learned how to use R CMD check and why it’s important in Chapter 14. Compared to running R CMD check locally, there are a few important differences when running it for a CRAN submission:

• You must fix all ERRORs and WARNINGs. A package that contains any errors or warnings will not be accepted by CRAN.
• Eliminate as many NOTEs as possible. Each note requires human oversight, which is a precious commodity. If there are notes that you do not believe are important, it is almost always easier to fix them (even if the fix is a bit of a hack) than to persuade CRAN that they’re OK. See “Checks” on page 148 for details on how to fix individual problems. If you have no NOTEs, it is less likely that your package will be flagged for additional human checks. These are time consuming for both you and CRAN, so are best avoided if possible.
• If you can’t eliminate a NOTE, document it in cran-comments.md, describing why you think it is spurious. Your comments should be easy to scan, and easy to match up with R CMD check. Provide the CRAN maintainers with everything they need in one place, even if it means repeating yourself.

There will always be one NOTE when you first submit your package. This reminds CRAN that this is a new submission and that they’ll need to do some extra checks. You can’t eliminate this, so just mention in cran-comments.md that this is your first submission.

Reverse Dependencies

Finally, if you’re releasing a new version of an existing package, it’s your responsibility to check that downstream dependencies (i.e., all packages that list your package in the Depends, Imports, Suggests, or LinkingTo fields) continue to work. To help you do this, devtools provides devtools::revdep_check(). This:

1. Sets up a temporary library so it doesn’t clobber any existing packages you have installed.
2. Installs all of the dependencies of the downstream dependencies.
3. Runs R CMD check on each package.
4. Summarizes the results in a single file.

Run use_revdep() to set up your package with a useful template.

If any packages fail R CMD check, you should give package authors at least two weeks to fix the problem before you submit your package to CRAN (you can easily get all maintainer email addresses with revdep_maintainers()). After the two weeks is up, rerun the checks, and list any remaining failures in cran-comments.md. Each package should be accompanied by a brief explanation that either tells CRAN that it’s a false positive in R CMD check (e.g., you couldn’t install a dependency locally) or that it’s a legitimate change in the API (which the maintainer hasn’t fixed yet).

Inform CRAN of your release process: “I advised all downstream package maintainers of these problems two weeks ago.” Here’s an example from a recent release of dplyr:

    Important reverse dependency check notes (full details at
    https://github.com/wch/checkresults/tree/master/dplyr/r-release);

    * COPASutils, freqweights, qdap, simPH: fail for various reasons. All package
      authors were informed of the upcoming release and shown R CMD check issues
      over two weeks ago.

    * ggvis: You'll be receiving a submission that fixes these issues very shortly
      from Winston.

    * repra, rPref: uses a deprecated function.

CRAN Policies

As well as the automated checks provided by R CMD check, there are a number of CRAN policies that must be checked manually. The CRAN maintainers will typically look at this very closely on a package’s first submission. I’ve summarized the most common problems in the following list:

• It’s vital that the maintainer’s email address is stable, because this is the only contact information CRAN has for you, and if there are problems and they can’t get in touch with you, they will remove your package from CRAN. So make sure it’s something that’s likely to be around for a while, and that it’s not heavily filtered.
• You must have clearly identified the copyright holders in DESCRIPTION: if you have included external source code, you must ensure that the license is compatible. See “License: Who Can Use Your Package?” on page 40 and “Licensing” on page 110 for more details.
• You must “make all reasonable efforts” to get your package working across multiple platforms. Packages that don’t work on at least two will not normally be considered.
• Do not make external changes without explicit user permission. Don’t write to the filesystem, change options, install packages, quit R, send information over the Internet, open external software, or make other similar changes.
• Do not submit updates too frequently. The policy suggests a new version once every 1–2 months at most.

I recommend following the CRAN Policy Watch Twitter account, which tweets whenever there’s a policy change. You can also look at the GitHub repository that powers it.

Important Files

You now have a package that’s ready to submit to CRAN. But before you do, there are two important files that you should update: README.md, which describes what the package does, and NEWS.md, which describes what’s changed since the previous version. I recommend using Markdown (specifically GitHub-flavored Markdown) for these files, because it’s useful for them to be readable as both plain text (e.g., in emails) and HTML (e.g., on GitHub or in blog posts).

README.md

The goal of the README.md is to answer the following questions about your package:

• Why should I use it?
• How do I use it?
• How do I get it?
On GitHub, the README.md will be rendered as HTML and displayed on the repository home page.

I normally structure my README as follows:

1. A paragraph that describes the high-level purpose of the package.
2. An example that shows how to use the package to solve a simple problem.
3. Installation instructions, giving code that can be copied and pasted into R.
4. An overview that describes the main components of the package. For more complex packages, this will point to vignettes for more details.

README.Rmd

If you include an example in your README (a good idea!) you may want to generate it with R Markdown. The easiest way to get started is to use devtools::use_readme_rmd(). This creates a template README.Rmd and adds it to .Rbuildignore. The template looks like:

    ---
    output:
      md_document:
        variant: markdown_github
    ---

    <!-- README.md is generated from README.Rmd. Please edit that file -->

    ```{r, echo = FALSE}
    knitr::opts_chunk$set(
      collapse = TRUE,
      comment = "#>",
      fig.path = "README-"
    )
    ```

This:

• Outputs GitHub-flavored Markdown.
• Includes a comment in README.md to remind you to edit README.Rmd, not README.md.
• Sets up my recommended knitr options, including saving an image to README-chunkname.png (which is automatically .Rbuildignored).
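Because README.md is generated from README.Rmd, the two can drift apart when the source is edited without re-knitting. Here is a minimal base R sketch of detecting that drift by comparing modification times. The helper name readme_is_stale() is hypothetical and is not part of devtools; the demonstration works on throwaway files in a temporary directory.

```r
# Hypothetical helper (not a devtools function): TRUE when the source file was
# modified more recently than the generated file, or when the generated file
# does not exist yet.
readme_is_stale <- function(rmd = "README.Rmd", md = "README.md") {
  if (!file.exists(rmd)) {
    stop("Cannot find ", rmd, call. = FALSE)
  }
  if (!file.exists(md)) {
    return(TRUE)
  }
  file.mtime(rmd) > file.mtime(md)
}

# Demonstration in a temporary directory so no project files are touched.
demo_dir <- tempfile("readme-demo-")
dir.create(demo_dir)
rmd <- file.path(demo_dir, "README.Rmd")
md <- file.path(demo_dir, "README.md")
writeLines("Source", rmd)
writeLines("Generated", md)

# Pretend README.Rmd was edited an hour after README.md was generated.
now <- Sys.time()
Sys.setFileTime(md, now - 3600)
Sys.setFileTime(rmd, now)
stale_after_edit <- readme_is_stale(rmd, md)

# Pretend README.md was regenerated a minute after that edit.
Sys.setFileTime(md, now + 60)
stale_after_knit <- readme_is_stale(rmd, md)
```

In a real package you would call the helper from the package root and re-knit whenever it returns TRUE, for example with rmarkdown::render("README.Rmd"), assuming the rmarkdown package is installed.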
You’ll need to remember to re-knit README.Rmd each time you modify it. If you use Git, use_readme_rmd() automatically adds the following “pre-commit” hook:

    #!/bin/bash
    if [[ README.Rmd -nt README.md ]]; then
      echo "README.md is out of date; please re-knit README.Rmd"
      exit 1
    fi

This prevents git commit from succeeding unless README.md is more recent than README.Rmd. If you get a false positive, you can ignore the check with git commit --no-verify. Note that Git commit hooks are not stored in the repository, so every time you clone the repo, you’ll need to run devtools::use_readme_rmd() to set it up again.

NEWS.md

The README.md is aimed at new users. The NEWS.md is aimed at existing users: it should list all the API changes in each release. There are a number of formats you can use for package news, but I recommend NEWS.md. It’s not supported by CRAN (so you’ll need to run devtools::use_build_ignore("NEWS.md")), but it’s well supported by GitHub and is easy to repurpose for other formats.

Organize your NEWS.md as follows:

• Use a top-level heading for each version (e.g., # mypackage 1.0). The most recent version should go at the top.
• Each change should be included in a bulleted list. If you have a lot of changes, you might want to break them up using subheadings (e.g., ## Major changes, ## Bug fixes, etc.).
  I usually stick with a simple list until just before releasing the package, when I’ll reorganize into sections, if needed. It’s hard to know in advance exactly what sections you’ll need.
• If an item is related to an issue in GitHub, include the issue number in parentheses (e.g., (#10)). If an item is related to a pull request, include the pull request number and the author (e.g., (#101, @hadley)). Doing this makes it easy to navigate to the relevant issues on GitHub.

The main challenge with NEWS.md is getting into the habit of noting a change as you make a change.

Release

You’re now ready to submit your package to CRAN. The easiest way to do this is to run devtools::release(). This:

• Builds the package and runs R CMD check one last time.
• Asks you a number of yes/no questions to verify that you followed the most common best practices.
• Allows you to add your own questions to the check process by including an unexported release_questions() function in your package. This should return a character vector of questions to ask. For example, httr has:

    release_questions
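The contract described above (an unexported function that returns a character vector of questions) can be sketched as follows. This is a hypothetical illustration with invented questions; it is not httr’s definition.

```r
# Hypothetical sketch of an unexported release_questions() helper.
# The question strings are invented for illustration only.
release_questions <- function() {
  c(
    "Have you re-knit README.Rmd so that README.md is up to date?",
    "Have you added a NEWS.md entry for every user-facing change?"
  )
}

# The only requirement stated in the text: a character vector of questions.
questions <- release_questions()
stopifnot(is.character(questions), length(questions) > 0)
```

Keeping the function unexported (that is, leaving it out of the package’s exports) means it stays an internal release aid and does not become part of the package’s public API.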

Date posted: 18/04/2017, 10:31