You may need to install some software for this sequence of 3 tutorials.
install.packages(c('knitr','shiny','rmarkdown'), dependencies=TRUE)
These are sprinkled throughout. Only read them if everything is going really well…
For anyone using this document outside of a lecture, please note that this lesson is designed to distill the key features of a package for publishing reproducible research. These are my opinions on the key features of package building and which details are worth the time they take. This approach cuts details that can take beginners a lot of time to wade through, keeping them away from the science that motivates their efforts. This approach is not designed for generic packages that will be broadly used. There are many better resources for that.
Hadley says: The first practical advantage to using a package is that it’s easy to re-load your code. You can either run devtools::load_all(), or in RStudio press Ctrl/Cmd + Shift + L, which also saves all open files, saving you a keystroke.
We’ll quickly generate a working example, and work backwards to understand the components.
Let’s start by making your first package. Start a new project in RStudio (File…New Project). A box pops up, where you can choose ‘New Directory’, followed by ‘R Package’ to get to this screen:
Think of a name for your package. I’m choosing Merow2017Nature because that might be the name of a paper I’m sharing code for. Note that you should check the box ‘Create a git repository’. We’ll come back to that later. Selecting ‘Create project’ will build a template with all the essential files for a package. Done. You made a package.
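Packages have four required components: the DESCRIPTION file, the NAMESPACE file, the R/ folder (your code), and the man/ folder (the documentation).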
Then there are some optional files too. The .Rproj one stores info for developing with RStudio as with any other projects you have, while .gitignore and .Rbuildignore will come up later, along with some others we haven’t made yet for sharing data, vignettes, etc.
Now we need to add some functions. We’ll get into a bunch of details on writing functions later, but here are some easy ones so we can focus on the package making process first. Put them in a file (or their own files) in the R/ folder of your package. You can organize them there and name files however you like. I like to use many files with veryShortandConciseNamesWrittenInCamelCase.
# thanks to software carpentry for the nice demo functions!
fahrenheit_to_celsius <- function(temp_F) {
  # Converts Fahrenheit to Celsius
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}
celsius_to_kelvin <- function(temp_C) {
  # Converts Celsius to Kelvin
  temp_K <- temp_C + 273.15
  return(temp_K)
}
fahrenheit_to_kelvin <- function(temp_F) {
  # Converts Fahrenheit to Kelvin using fahrenheit_to_celsius() and celsius_to_kelvin()
  temp_C <- fahrenheit_to_celsius(temp_F)
  temp_K <- celsius_to_kelvin(temp_C)
  return(temp_K)
}
Documentation is a required element of an R package. roxygen2 is where it’s at; I haven’t noticed anyone use anything else in years. It’s convenient because your code and the documentation live together in the same file, and the NAMESPACE file (describing your exported functions and other people’s imported functions) is automatically generated.
roxygen2 reads lines that begin with #' as comments to create the documentation for your package. Descriptive tags are preceded with the @ symbol. For example, @param has information about the input parameters for the function. Here’s a minimal example:
#' @title Converts Fahrenheit to Celsius
#'
#' @description This function converts input temperatures in Fahrenheit to Celsius.
#' @param temp_F The temperature in Fahrenheit.
#' @return The temperature in Celsius.
#' @export
#' @examples
#' fahrenheit_to_celsius(32)
fahrenheit_to_celsius <- function(temp_F) {
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}
Ideally, you should write the type of documentation you’d like to read. But if you’ve ever used stackoverflow, it’s clear that many people don’t really read documentation. So I think the most critical part to do really well is choosing informative, commented examples.
Now that you’ve written the documentation, you need to ‘build’ the documentation files (the .Rd files that live in the man/ folder). You might need to configure your RStudio session to tell roxygen2 to generate them by selecting these settings:
Now you can build the man/ files:
devtools::document()
Go have a look at one; you can preview how it would appear to a user with the Preview button:
A list of all available tags is below. Don’t be skerd; you’re likely to need 6-7 more of them, like @note, @references, @author. Maybe @details, @family, @seealso.
names(roxygen2:::default_tags())
## [1] "evalNamespace" "export" "exportClass"
## [4] "exportMethod" "exportPattern" "import"
## [7] "importClassesFrom" "importFrom" "importMethodsFrom"
## [10] "rawNamespace" "S3method" "useDynLib"
## [13] "aliases" "author" "backref"
## [16] "concept" "describeIn" "description"
## [19] "details" "docType" "encoding"
## [22] "evalRd" "example" "examples"
## [25] "family" "field" "format"
## [28] "inherit" "inheritParams" "inheritDotParams"
## [31] "inheritSection" "keywords" "method"
## [34] "name" "md" "noMd"
## [37] "noRd" "note" "param"
## [40] "rdname" "rawRd" "references"
## [43] "return" "section" "seealso"
## [46] "slot" "source" "template"
## [49] "templateVar" "title" "usage"
The NAMESPACE file is one of the 4 essential files in a package, and is automatically generated when you use document(). It lists all the functions that are exported, which means that other users have easy access to them when they load your package, as well as functions you imported from other packages using @importFrom in your roxygen2 documentation. Here’s an example:
Go have a look at the one generated by your demo package.
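As a rough sketch of how the tags map onto NAMESPACE, here is a made-up function for the demo package (median_celsius is hypothetical, not part of the lesson) that exports itself and imports one function from the stats package. The comment at the bottom shows the kind of lines I’d expect document() to write; treat it as an illustration rather than exact output.

```r
#' @title Median temperature in Celsius
#' @description Hypothetical helper: converts Fahrenheit readings to Celsius
#'   and returns their median.
#' @param temp_F Numeric vector of temperatures in Fahrenheit.
#' @return A single number, the median temperature in Celsius.
#' @importFrom stats median
#' @export
median_celsius <- function(temp_F) {
  median((temp_F - 32) * 5 / 9)
}

median_celsius(c(32, 50, 212))

# After devtools::document(), NAMESPACE should contain lines along these lines:
#   export(median_celsius)
#   importFrom(stats,median)
```

Importing only median, rather than all of stats, keeps the list of things your package depends on short and explicit.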
The DESCRIPTION file contains all the metadata needed for your package. Seems pretty straightforward. Lots of obvious info to provide. But you can spend a lot of time debugging cryptic errors if you don’t get the syntax perfect.
Here’s an example of one from our maskRangeR package.
Here are some very interesting details that are important to save you time:
Use as.person() for the Author field. Previously there were more flexible ways to specify this, but it’s a standard now.
You probably shouldn’t have as many dependencies as I do here. It can make it tough to maintain your package if others change. But it works often. A better solution is just to import the specific function you need from another package, if you only need 1 or 2 functions from that package.
I read somewhere that the Description field should only have 80 characters per line. I’ve succeeded in getting packages on CRAN without that formatting a while ago, but is it worth the risk?
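To make the author detail concrete, here is a minimal sketch of building author entries in R. The name, email and roles are invented for illustration. I’m also assuming the usual current practice of putting a person() call in the Authors@R field of DESCRIPTION; the original example may have looked different.

```r
# Hypothetical author details, for illustration only
author <- person(
  given = "Jane", family = "Doe",
  email = "jane.doe@example.com",
  role = c("aut", "cre")  # aut = author, cre = maintainer
)

# as.person() builds the same kind of object from a single string
author2 <- as.person("Jane Doe <jane.doe@example.com>")

# See how R will print the entry
format(author)
```

If either call errors in the console, it will also error when R reads your DESCRIPTION, so this is a cheap way to test the syntax before running check().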
These three steps are lumped together here because you’re often jumping back and forth between them as you’re homing in on a working package.
Note that you can use the point and click interface on RStudio’s Build menu or check(), build(), and install() for these steps if you’re averse to clicks.
To ensure you’ve followed the right protocols when designing your package, R offers checking tools. There are options, but it’s safest just to check with CRAN’s standards as they’re rigorous. To check for problems, either click the Check button shown above or type check() in the console. A very long litany of obscure details will likely follow. You’re looking for NOTES, WARNINGS and ERRORS. Don’t worry about everything else. The goal is to address these issues such that none remain when you run check() the final time. Run check() on the package we’ve been building to try it for yourself. We’ll do some more elaborate checks in a demo below.
Once you’ve passed all the checks, you can click Install and Restart or type install() in the console. Now your package is loaded in R and ready to use.
If you’d like to share a zipped up version of the package, e.g., as you’d submit to CRAN, choose Build Source Package from the drop down menu.
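If you prefer the console to the Build menu, the whole cycle can be sketched as one helper. The function name check_build_install is hypothetical, and the sketch assumes devtools is installed and that pkg points at the root folder of your package.

```r
# Hypothetical helper wrapping the document/check/install/build cycle.
# Assumes devtools is installed and `pkg` is the package root directory.
check_build_install <- function(pkg = ".") {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("Please install devtools first: install.packages('devtools')")
  }
  devtools::document(pkg)  # rebuild the man/ files and NAMESPACE
  devtools::check(pkg)     # look for NOTES, WARNINGS and ERRORS
  devtools::install(pkg)   # install into your R library
  devtools::build(pkg)     # create the source package (.tar.gz)
}

# Usage, from inside the package project:
# check_build_install()
```

In practice you’ll rarely run all four in one go; more often you repeat document() and check() until the output is clean, then install and build once at the end.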
# Demo
Here’s the plan. I want you to see what a fully functional R package looks like, but I don’t want it to include a bunch of stats that obscure the challenge of understanding the package structure. So we’ll fix a version of my rangeModelMetadata package that I’ve intentionally introduced errors into. This package just works with text to create a list or CSV of text that represents metadata for species distribution models.
- Start with intentionallyBrokenRangeModelMetada
- Run devtools::document() just to be sure all the manuals are built.
- Run check() either from the command line or through RStudio.
- Fix the problems it reports, running check() periodically to see if you’ve succeeded. Hint: I’ve only introduced 1 error per function.

Vignettes are super important. If someone is going to read one thing you write in your package, it’s likely the vignette. It’s easiest to write vignettes with R markdown. R markdown is also a way to share project reports and make websites like this, so learning it enables more options. Here’s a full lesson I made; we’ll just skim it here. Let’s make a vignette for the demo package you were building (not the one you just debugged, so you’ll have to switch projects in RStudio).
Here are some super useful guides; really they contain almost all you need to know, and I think you can just experiment after skimming them.
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
The reference guide is a little more comprehensive
In RStudio: File -> New File -> RMarkdown -> Document -> HTML
This will open a document that looks like this:
You’ll need to make a folder called /vignettes/ in your root directory, and save it there. This is a .Rmd file.
All R code to be run must be in a code chunk like this:
## #```{r,eval=F}
## # CODE HERE
## #```
Add a new code chunk at the bottom of this template file.
celsius_to_kelvin(0)
## [1] 273.15
fahrenheit_to_kelvin(32)
## [1] 273.15
To see how this will look when built, press CMD+Shift+K (to ‘knit’ the document using the knitr package).
Let’s add a plot, which will show up in the doc.
far=seq(0,220,length=100)
kel=fahrenheit_to_kelvin(far)
plot(far,kel)
To create the document in html, just hit CMD+Shift+K to knit.
Next, try exploring some more features of Rmds, outlined in the cheatsheets above. I find that some of the most useful additional tools are:
Supplying arguments to a chunk of code (section 5 of the cheatsheet), to avoid evaluation or hide results (eval=FALSE or results='hide'). E.g., this can be a good way to load past results for code that’s slow.
Adding images, e.g., to show figures from a paper.
Adding latex equations by surrounding code with $$. E.g., $$\alpha$$ gives α
Changing options in the YAML header to make the doc fancier. Try replacing
output: html_document
with
output:
  rmarkdown::html_vignette:
    toc: true
    number_sections: true
    toc_depth: 3
Although you can build your vignettes with the Knit button in RStudio to test them out, you need to formally build them for the package with devtools::build_vignettes(). Try it out.
Note that the doc/ folder is created, and this is where your vignettes are stored when the package is built. This is kind of confusing because it seems redundant with the vignettes folder. Don’t edit the files in doc/; always be sure to edit the files in the /vignettes/ folder.
Particularly when sharing code for a single paper, it can be helpful to include a variety of auxiliary files used for this or that. Maybe they’re scripts you use with the package, or data you’re too lazy to write documentation for, or notes to future you. For these, create the inst/ folder in your root directory, and the extdata/ folder in that. Put anything you like there.
It can be helpful to access those files when the package is installed with something like:
## ddFile=system.file("extdata/dataDictionary.csv",package='rangeModelMetadata')
## system(paste0('open ', ddFile, ' -a "Microsoft Excel"'))
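Here is a small runnable sketch of how system.file() behaves. The built-in stats package stands in for your own package name, and noSuchFile.csv is a made-up file name. Note that anything you put in inst/extdata/ ends up under extdata/ once the package is installed.

```r
# Look up a file that ships with an installed package.
# "stats" stands in here for your own package name.
desc_path <- system.file("DESCRIPTION", package = "stats")
file.exists(desc_path)

# A file that does not exist returns an empty string rather than an error,
# so it is worth checking before trying to read it.
missing_path <- system.file("extdata", "noSuchFile.csv", package = "stats")
if (!nzchar(missing_path)) {
  message("File not found in the installed package")
}
```

Because the lookup goes through the installed package, the same line works on anyone’s machine, regardless of where R keeps its library.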
I’m just going to demonstrate the super simple clicky version of using github with RStudio. If you want more features or to try the command line, try this easy tutorial. There are many others too; no need to reproduce here. (There’s a reproducibility joke here somewhere that I’m missing ….)
Follow these instructions to create a new github repo and add the package you’ve been working with to it. Then follow these steps to save your changes on the github website.
That’s it; your code is tracked and on github. Of course there are many more powerful ways to use git to collaborate with multiple code authors, to explore ideas and revert back to older ones if they fail, etc. But these few steps are all you really need to share packages online. More detailed instructions are here.
Now, others can install your package using install_github().
It is useful to avoid sending all your files to github; some may be used for testing, are too large, are temp files or helper files that no one cares about.
The .gitignore file in your root directory stores rules for what to ignore. Here’s what I always include in my .gitignore, borrowed from various smarter people. Try putting this in your package and see if it breaks.
## # Meta
## # doc
## # .Rproj.user
## # .Rhistory
## # .RData
## # .Ruserdata
## #
## # # History files
## # .Rapp.history
## # # Session Data files
## # # Example code in package build process
## # *-Ex.R
## # # Output files from R CMD build
## # /*.tar.gz
## # # Output files from R CMD check
## # /*.Rcheck/
## # # RStudio files
## # .Rproj.user/
## # # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
## # .httr-oauth
## # # knitr and R markdown default cache directories
## # /*_cache/
## # /cache/
## # # Temporary files created by R markdown
## # *.utf8.md
## # *.knit.md
## # .DS_Store
## # .Rbuildignore
##
Importantly, * is a wildcard symbol, so something like *-Ex.R means ignore all the files that end in -Ex.R. Or you can list every single file manually if you’re into that sort of thing.
Note that you might want to stop ‘tracking’ a file with git. Here are instructions.
git is super powerful for teams of people working on code concurrently to avoid breaking one another’s work. But it takes some learning. git is awesome for tracking your edits and sharing on github with the simple approach shown here. I don’t do much fancy branching, merging or pull requesting, because I spend more time fixing mistakes I thought I understood. This is clearly because I’m just a gitiot and not git’s fault. But I’d recommend saving your time learning git more fully until you really really need it.
Hidden functions don’t need documentation. If you need helper functions that users won’t need access to, you can make them hidden by beginning the function name with a period. Hidden functions don’t require documentation, so this can also be a useful way to avoid check WARNINGS when code is in development. Try adding this function to your package and run check() to be sure you don’t get any errors:
.fahrenheit_to_celsius2 <- function(temp_F) {
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}
Passing arguments. Sometimes other functions are embedded in your function, and you’d like to pass arguments to them. For example, below, I add an option to make a plot, and the plot function has a wide range of options that you might want to customize. You can add ... to your function’s arguments, and ... to the function you’re passing arguments to, to achieve this:
fahrenheit_to_celsius3 <- function(temp_F, doPlot=FALSE, ...) {
  temp_C <- (temp_F - 32) * 5 / 9
  # anything supplied via ... is handed straight to plot()
  if(doPlot) plot(temp_F, temp_C, ...)
  return(temp_C)
}
far=seq(0,100,by=1)
fahrenheit_to_celsius3(far,doPlot=TRUE,col='red',pch=19,cex=.7)
browser() and debug() provide complementary ways to explore errors and see exactly what’s going on inside your function, in the environment it’s working with. Each allows you to step through the function line by line to explore problems. browser() is inserted inside the function on the line you want to begin exploring at. Try running this:
fahrenheit_to_kelvin <- function(temp_F) {
  # Converts Fahrenheit to Kelvin using fahrenheit_to_celsius() and celsius_to_kelvin()
  temp_C <- fahrenheit_to_celsius(temp_F)
  browser()
  temp_K <- celsius_to_kelvin(temp_C)
  return(temp_K)
}
fahrenheit_to_kelvin(17)
## Called from: fahrenheit_to_kelvin(17)
## debug at <text>#5: temp_K <- celsius_to_kelvin(temp_C)
## debug at <text>#6: return(temp_K)
## [1] 264.8167
There are three key commands to advance through the lines of code when in browser() or debug():
- n evaluate this line and move to next
- c continue running lines till the function ends or breaks
- Q get out of debug mode.

Within debug mode, you can type any of the usual commands you’d use in R to see what objects in the environment look like, or to run other tests.
Alternatively to browser(), you can call debug(yourFunctionName) to step through every line, as though you had put your browser() command on line 1 of your function. undebug() gets you out of debug mode.
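A minimal sketch of switching the debug flag on and off, using one of the demo functions. It only sets and clears the flag; to actually step through, call the function while the flag is set in an interactive session.

```r
# Small function to experiment with
celsius_to_kelvin <- function(temp_C) {
  temp_K <- temp_C + 273.15
  return(temp_K)
}

debug(celsius_to_kelvin)       # flag the function: the next call opens the browser
isdebugged(celsius_to_kelvin)  # TRUE while the flag is set
undebug(celsius_to_kelvin)     # remove the flag so calls run normally again
isdebugged(celsius_to_kelvin)  # FALSE
celsius_to_kelvin(0)           # runs without stopping
```

Base R also has debugonce(), which sets the flag for a single call and then clears it, handy if you tend to forget undebug().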
The best packages include a variety of checks at the beginning of functions to determine whether you might’ve input something incorrectly, and give useful advice if you have. No one polices this; the level of detail you provide is something that you have to be able to sleep with at night. Usually, these tests involve printing something to the console. You might be used to using print or cat to send text to the console. For packages you need to use message or warning for the same tasks so that users can have control over the output, e.g., with suppressMessages.
It’s also useful to include stop to force the function to error out. You can consider wrapping breakable expressions in try to give some more options for handling errors. Try calling a function in your demo package with a known error (e.g., make the argument a string) and write a try statement to catch it and then write an error message.
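One way this could look, as a sketch: the function name fahrenheit_to_celsius_checked and the particular checks and messages are my own illustrations, not something from the demo package.

```r
# Sketch of input checks; the specific checks and messages are illustrative.
fahrenheit_to_celsius_checked <- function(temp_F) {
  if (!is.numeric(temp_F)) {
    stop("temp_F must be numeric, e.g. fahrenheit_to_celsius_checked(32)")
  }
  if (any(temp_F < -459.67, na.rm = TRUE)) {
    warning("Some temperatures are below absolute zero; check your units.")
  }
  message("Converting ", length(temp_F), " value(s) to Celsius")
  (temp_F - 32) * 5 / 9
}

# try() lets a script continue after an error
result <- try(fahrenheit_to_celsius_checked("hot"), silent = TRUE)
if (inherits(result, "try-error")) {
  message("Conversion failed: ", conditionMessage(attr(result, "condition")))
}

# Users can silence the chatter if they want
celsius <- suppressMessages(fahrenheit_to_celsius_checked(212))
```

Because the progress note goes through message() rather than print() or cat(), the last line can switch it off without touching the function.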
Here’s an example from one of my packages. Put this function in your R directory.
## .onAttach <- function(libname,pkgname) {
## packageStartupMessage('Type, vignette("rmm_directory") for an overview of functions')
## }
Almost always, the answer you need is briefly and well explained in Hadley Wickham’s online book here.
I just found this, so don’t know it well, but ROpenSci does a ton of smart stuff.
Obviously there’s the CRAN book on R extensions here. It tends to be long and winding and hard to search. Don’t bother with it unless you’re having trouble sleeping.
plot and print work with objects from your code.
system.file().