R Setup Log
My notes on my personal R setup. I started my R journey in September 2020 after SIRACon 2020. Important: this historical setup is outdated and differs from my current approach, which is documented in rdev.
Installing R
I prefer installing R and RStudio using Homebrew. I use Homebrew Bundle to install all software on my systems, but installing R and RStudio is simple enough using the command line:
brew install --formula r && brew install --cask rstudio
R Variants
TL;DR: just use Homebrew.
I don’t recommend installing the cask variant of r because it creates problems with brew doctor:
brew info --cask r
==> r: 4.4.0
https://www.r-project.org/
Not installed
From: https://github.com/Homebrew/homebrew-cask/blob/HEAD/Casks/r/r.rb
==> Name
R
==> Description
Environment for statistical computing and graphics
==> Artifacts
R-4.4.0-arm64.pkg (Pkg)
==> Analytics
install: 675 (30 days), 2,024 (90 days), 8,454 (365 days)
I tried “installing all the things” with homebrew-r-srf but didn’t like how the uninstall doesn’t clean up, and frankly, I haven’t found a need for more than the capabilities included with homebrew R:
capabilities()
       jpeg         png        tiff       tcltk         X11        aqua
       TRUE        TRUE        TRUE        TRUE        TRUE        TRUE
   http/ftp     sockets      libxml        fifo      cledit       iconv
       TRUE        TRUE       FALSE        TRUE       FALSE        TRUE
        NLS       Rprof     profmem       cairo         ICU long.double
       TRUE        TRUE        TRUE        TRUE        TRUE       FALSE
    libcurl
       TRUE
Currently, this is everything except the X11 dependencies, which aren’t really needed:
?capabilities
Note to macOS users
Capabilities "jpeg", "png" and "tiff" refer to the X11-based versions of these devices. If capabilities("aqua") is true, then these devices with type = "quartz" will be available, and out-of-the-box will be the default type. Thus for example the tiff device will be available if capabilities("aqua") || capabilities("tiff") if the defaults are unchanged.
As an active contributor to Homebrew, I strongly recommend using it for any/all software; their quality control is excellent.
R Workspace
Once R and RStudio have been installed, you’ll need a development environment and some packages.
Development Environment
Well, obviously, I use RStudio. There are other IDEs available, and it’s also possible to develop using the command line, but RStudio provides a pleasant, integrated experience for R development, and actively supports the R community. I don’t typically use the built-in Git client, instead using the GitHub Desktop client and git command-line tool.
brew install github
I also occasionally use vim (mainly for shell scripts) and atom (mainly for vanilla .md files using markdown-preview-enhanced).
Packages
Managing packages and environments is a challenge for most modern languages. Thankfully, R doesn’t have the same level of challenge as Python, or even Ruby, but managing the packages available within a project is still a best practice. I use renv for this purpose. (I originally discovered packrat but quickly learned that RStudio is replacing it with renv.)
Here is my package maintenance approach:
Base R Packages
With no projects open, I periodically check for updates to the base R packages using the RStudio Packages “Update” function. These are the only packages I install to the base directory /usr/local/Cellar/r/4.0.3_2/lib/R/library. Note: upgrading or reinstalling r through homebrew may ‘downgrade’ the base packages.
Development Packages
The only other tools I install outside of projects are for supporting development. Originally this was just renv, but now it includes a number of development tools needed to create projects. I use the R site library used by homebrew, currently /usr/local/lib/R/4.0/site-library. I’ve developed a shell script to install development packages that you can find in the tools directory of this repository.
#!/bin/sh
# install development packages to the site library
# thanks to https://blog.sellorm.com/2017/10/21/quick-script-to-install-an-r-package-from-the-command-line/
# and https://github.com/Homebrew/homebrew-core/blob/master/Formula/r/r.rb
# designed to work with homebrew: `brew install --formula r && brew install --cask rstudio`
set -ex # halt script on error, echo on
PREFIX=$(brew --prefix)
# major.minor version of the homebrew R, which names the site library directory
RVERSION=$("${PREFIX}/bin/Rscript" -e 'cat(as.character(getRversion()[1,1:2]))')
SITELIB="${PREFIX}/lib/R/${RVERSION}/site-library"
DEVPKG='c("renv", "styler", "lintr", "miniUI", "devtools", "available")'
if [ ! -d "${SITELIB}" ]
then
  echo "fatal error: ${SITELIB} does not exist - not using \`brew install --formula r\`?"
  exit 1
fi
brew install libgit2 # required by devtools
echo "install.packages(${DEVPKG}, repos=\"https://cran.rstudio.com\", lib=\"${SITELIB}\")" | R --no-save
Project Packages
Any packages not needed to create projects I install within a project using renv - here are some good intros:
- A presentation at rstudio::conf 2020
- The RStudio Blog post
- An Introduction to renv
- renv on GitHub
renv::init() # to set up renv in the project
renv::install("rtraining") # to install a package
renv::status() # checks if renv.lock is in sync
renv::snapshot() # save libraries in renv.lock
renv::restore() # restore libraries from renv.lock - use when first using an existing project
renv::clean() # remove extra libraries
renv::update() # check for updated versions of installed libraries
If you’re installing (development) versions of packages from GitHub, you’ll be prompted to set up a personal access token using create_github_token() and add it to your .Renviron as GITHUB_PAT= using usethis::edit_r_environ(). I had to install the development versions of both styler and lintr due to bugs not yet fixed in the released (CRAN) versions. I did also look at using pak but found it too buggy to use.
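To confirm the token ended up where R will look for it, a quick check like the following can help. This is a minimal sketch: `has_github_pat` is a hypothetical helper (not part of usethis or renv), and the demo runs against throwaway files with a placeholder value rather than a real ~/.Renviron.

```shell
#!/bin/sh
# Sketch only: has_github_pat is a hypothetical helper, not part of usethis or renv.
# It reports whether an .Renviron-style file defines a non-empty GITHUB_PAT.
has_github_pat() {
  renviron="$1"
  # match a line that starts with GITHUB_PAT= and has at least one character after it
  [ -f "$renviron" ] && grep -q '^GITHUB_PAT=.' "$renviron"
}

# Demo against throwaway files rather than the real ~/.Renviron
demo=$(mktemp -d)
printf '%s\n' 'GITHUB_PAT=' > "$demo/empty.Renviron"
printf '%s\n' 'GITHUB_PAT=placeholder-token' > "$demo/set.Renviron"

for f in "$demo/empty.Renviron" "$demo/set.Renviron"; do
  if has_github_pat "$f"; then
    echo "$(basename "$f"): GITHUB_PAT is set"
  else
    echo "$(basename "$f"): GITHUB_PAT is missing or empty"
  fi
done
```

Pointing the function at "$HOME/.Renviron" would run the same check on a real setup; R only reads that file at startup, so a restart is needed after editing it.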
Note: I came across an odd quirk where renv will prompt you to upgrade base R packages, even if they’re not used. I resolved this by upgrading the base packages in RStudio with no project open.
I’ve included some comments on useful packages in my R Training Log notebook.
Using R
So - what have I learned about using R? I won’t cover the actual Data Science part here, but have some recommended reading in my R Training Log.
Use git!
This wasn’t really something I learned with R, but use of version control for any kind of code/script is crucial. These days, I keep all scripts and configuration files in some flavor of public or private version control. Mostly that means GitHub, but also private git servers (via ssh) are easy to set up for work that you want to keep off of GitHub.
I don’t want to get too much into managing code via version control, but I favor trunk-based development with short-lived branches, small commits of working code, and rebase and merge for linear commit history. There’s a lot of good research on this topic from Google’s DevOps Research and Assessment (DORA) team.
Using RStudio with GitHub
RStudio has good integration with GitHub. I’ve adopted the convention of “one RStudio Project (.Rproj) per repository” and storing the Rproj file in the repository. That seems to be the norm.
R Notebooks
R Notebooks are my preferred file format for data analysis, as they allow an easy mixing of text (using pandoc markdown) and R code chunks. This allows me to document what I’m doing as I go, both for reproducibility as well as recording my observations, thoughts and conclusions. It especially lends itself to iterative development:
- Write code
- Run code
- See results
- Update code
This method of development is not always appropriate, but fits well with exploratory analysis. Once I get to writing functions, I’m starting to adopt the formal structure of packages, testthat, and roxygen2.
Specifically, I use html_notebook, which was recommended to me. RStudio handles this differently than html_document:
Behavior | html_notebook | html_document |
---|---|---|
Quick View | Preview Button | No Preview Button |
Saving Files | Automatically knits file on save | Manually knit file |
Rendering Files | Renders R output as it exists in the IDE on save | Renders R output by running all R code |
Default Options | Includes options for readability, hiding and downloading code, and paged tables | Higher-quality rendering of PNG graphics |
Essentially, R Notebooks are generally faster and more convenient when doing analysis, and html_document R Markdown files offer higher-quality output. This site uses a custom framework to convert html_notebooks to html_documents for publishing (see the next section for details). R Notebooks are also a convenient way to share analysis with peers - just email the .nb.html file, which will include all of the output, as well as embedding the .Rmd source code for easy editing. This also allows people who don’t have R/RStudio to see the results of an analysis.
There are some drawbacks to using R Notebooks:
- Because R Notebooks are render-on-save, you can inadvertently end up with missing or outdated R output from your notebook when saving, if you’ve made updates and haven’t re-run the entire document. My habit is to do the following at the end of a writing session, before committing to git, which ensures a “clean” notebook:
  - Clear the Global Environment
  - “Restart R and Clear Output”
  - “Run All”
  - Save
- Debug breakpoints don’t work in R Markdown documents. To fix this, convert R Markdown documents to R Scripts using purl for debugging.
- Not really a drawback, but… Rcpp and rprojroot are erroneously listed in RStudio as required to create R Markdown, which can also cause problems with renv. This is a bug in RStudio, which will be fixed in version 1.4.944.
Publishing R Notebooks
Since R Notebooks are saved as html files, it’s possible to publish them on GitHub using GitHub Pages. GitHub published a tutorial in 2018 on getting RStudio integrated with GitHub, and I started working on that. I quickly discovered that while the tutorial was helpful, it wasn’t quite the setup I wanted; it published R Markdown through GitHub Pages, but wouldn’t directly support the automatically generated html of R Notebooks. After more searching, I got Notebooks working on GitHub using the method described in rstudio/rmarkdown #1020: checking the .nb.html file into git and using GitHub Pages so that you can view the rendered HTML instead of just the HTML code.
Publishing with rmarkdown
After using README.md and GitHub Pages to publish notebooks, I found I wanted an easier way to publish and navigate across collections of notebooks. Publishing R Notebooks on GitHub Pages works fine, but doesn’t offer an easy navigation structure like the one pkgdown provides. I tried using pkgdown to display notebooks, but pkgdown only supports building vignettes, which have a distinctly different look and feel than R Notebooks. The rmarkdown site generator, render_site, allows more flexible building of websites from R Markdown files with a simple navigation bar at the top, but doesn’t support html_notebook files out of the box.
To work around this, I created a simple framework for converting html_notebook files to html_document and building a _site.yml from a list of notebooks stored in the non-standard notebooks/ directory, initially using a shell script, build-site (stored at the root of this repo). It does the following:
- Creates a working directory, .build-site
- Builds a rmarkdown::render_site() _site.yml file that includes a menu with all notebooks in the notebooks/ directory
- Copies all .Rmd files to .build-site, changing their type from html_notebook to html_document
- Includes some configuration to make html_documents work more like html_notebook
- Calls rmarkdown::render_site() to render the site in the docs/ directory
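The copy-and-convert step in that list can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the actual build-site script: `convert_notebooks` is a hypothetical helper, the demo notebook is made up, and the blunt `sed` substitution assumes the string html_notebook only appears where the output type is declared.

```shell
#!/bin/sh
# Sketch only: convert_notebooks is a hypothetical helper, not the author's build-site script.
set -e

convert_notebooks() {
  src_dir="$1"    # directory holding the html_notebook .Rmd files
  build_dir="$2"  # working directory that receives the converted copies
  mkdir -p "$build_dir"
  for rmd in "$src_dir"/*.Rmd; do
    [ -e "$rmd" ] || continue  # the glob matched nothing, so there is nothing to copy
    # blunt replacement: every occurrence of html_notebook is rewritten, body text included
    sed 's/html_notebook/html_document/g' "$rmd" > "$build_dir/$(basename "$rmd")"
  done
}

# Demo against a throwaway directory with one made-up notebook
demo=$(mktemp -d)
mkdir "$demo/notebooks"
printf '%s\n' '---' 'title: "Example"' 'output: html_notebook' '---' > "$demo/notebooks/example.Rmd"
convert_notebooks "$demo/notebooks" "$demo/.build-site"
cat "$demo/.build-site/example.Rmd"
```

Writing the converted copies to a separate directory leaves the originals as html_notebook, so they keep their preview-on-save behavior in RStudio.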
I typically rebuild a site with the following command, run from the top-level directory of the project:
rm -rf .build-site && sh build-site && open docs/index.html
This approach leverages the docs functionality of GitHub pages, like pkgdown.
Update on Publishing
build-site has been replaced with build_analysis_site()! At its core, it is functionally the same: builds a pkgdown site, adding an “Analysis” menu with all notebooks in the (renamed) analysis/ directory, then converts and builds the notebooks using rmarkdown, and moves them into the docs/ directory. Since it is now an R Script, it’s more portable and can be more easily bundled with packages. I will be migrating its functionality to rdev shortly, so that it’s usable across multiple analysis projects.
R Package Layout
Here is my package layout - the table shows the path, whether it’s part of the formal R package definition, and my notes on its use.
Path | R | Notes |
---|---|---|
.Rbuildignore | x | Exclude files from package |
.Rprofile | x | Used by renv and to attach development tools |
.github | | GitHub templates and workflows |
.gitignore | x | I use R and macOS exclusions, and always exclude generated files outside of docs/ |
.lintr | | lintr default linters with 100 columns: linters: with_defaults(line_length_linter(100)) |
DESCRIPTION | x | Use “Suggests” for development tools, per renv |
LICENSE | x | Generated with use_mit_license() |
LICENSE.md | | See above, used by pkgdown |
NAMESPACE | x | Generated with roxygen2 |
NEWS.md | | Release notes, used by pkgdown |
README.Rmd | x | Generated with use_readme_rmd() |
README.md | x | Generated with build_readme() |
R | x | All project functions go here, with roxygen2 comments |
TODO.md | | To-do list, inspired by renv’s historical TODO.md |
analysis | | Exploratory data analysis in R Notebooks and R Presentations |
analysis/data | | When appropriate, analysis data lives here |
analysis/assets | | External assets (images, other files) included in R Notebooks |
analysis/rendered | | Manually rendered html versions of analysis/ files to be included in docs/, i.e. .Rpres files, not stored in git |
 | x | I don’t use demos, as recommended by R Packages |
docs | | Used by pkgdown::build_site() and R Notebooks rendered as html_document using rmarkdown::render_site() via build_analysis_site() |
exec | x | In theory this is where command line executable scripts reside |
inst/templates/rmarkdown | x | Planned location for R Markdown Templates |
man | x | Generated with roxygen2 |
package.Rproj | | I use the same name for the package, .Rproj, directory, and GitHub repo |
pkgdown | | I store all pkgdown files here |
po | x | Used for Internationalization |
renv | | Used by renv |
renv.lock | | The renv lockfile |
tests | x | Tests using testthat |
tools | x | I use tools/ for shell scripts that support development, like setup-r |
vignettes | x | More typical package articles, used by pkgdown |
R Workflow
Here is the typical workflow I’m settling into (or at least trying to…I still don’t have TDD down just yet), once a project is created. Projects are either vanilla packages and don’t contain analysis/, like rdev, or “analysis packages” and bundle analysis notebooks as a project, like rtraining.
- Check for updated packages when starting to work (I created check_renv() for this), and check for errors using local CI checks (ci()).
- When creating a new function, write the documentation first, using Roxygen - this helps encourage up-front design and clarifies goals/requirements.
- Write tests next - both happy path and negative test cases whenever possible. 100% test coverage is overkill, but I try to write a ‘data validity checker’ which also helps define the expected format of the data.
- TDD: run tests, write code that fails tests, fix code, repeat. (Don’t forget to refactor)
- I use trunk-based development, which I learned from Homebrew. I try to keep commits small, related, and implementing changes that don’t break code, merge back to main frequently - before the end of the day - and require linear commit history (ie rebase and merge).
- Before pushing commits, I run style_all() and ci() to fix any problems locally (just “undo” in GitHub Desktop, fix, and re-commit).
One thing that annoys me is that by default, devtools just writes new lines to the end of .Rbuildignore, so I wrote sort_rbuildignore().
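The idea behind keeping .Rbuildignore sorted can be illustrated with a shell analogue. This is only a sketch: `sort_ignore_file` is a hypothetical helper, the sample entries are made up, and it is not the implementation of sort_rbuildignore(), which is an R function not shown here.

```shell
#!/bin/sh
# Sketch only: a shell analogue of the idea, not the author's sort_rbuildignore().
set -e

sort_ignore_file() {
  file="$1"
  # LC_ALL=C sorts byte-wise, so the order does not depend on the user's locale;
  # -u drops duplicate lines; -o may name the input file as the output
  LC_ALL=C sort -u -o "$file" "$file"
}

# Demo against a throwaway file with made-up entries, one of them duplicated
demo=$(mktemp -d)
printf '%s\n' '^renv$' '^\.github$' '^LICENSE\.md$' '^renv$' > "$demo/.Rbuildignore"
sort_ignore_file "$demo/.Rbuildignore"
cat "$demo/.Rbuildignore"
```

A stable, locale-independent order keeps diffs small when tools append new entries to the end of the file.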
Next Steps
I’ll keep adding to this document as I go, and will likely eventually migrate this notebook to a vignette and switch to pkgdown.