R Setup Log

notes
Author

John Benninghoff

Published

December 30, 2020

Modified

February 24, 2024

My notes on my personal R setup. I started my R journey in September 2020 after SIRACon 2020. Important: this historical setup is outdated and differs from my current approach, which is documented in rdev.

# no libraries

Installing R

I prefer installing R and RStudio using Homebrew. I use Homebrew Bundle to install all software on my systems, but installing R and RStudio is simple enough using the command line:

brew install --formula r && brew install --cask rstudio

R Variants

TL;DR: just use homebrew.

I don’t recommend installing the cask variant of r because it creates problems with brew doctor:

brew info --cask r
==> r: 4.4.0
https://www.r-project.org/
Not installed
From: https://github.com/Homebrew/homebrew-cask/blob/HEAD/Casks/r/r.rb
==> Name
R
==> Description
Environment for statistical computing and graphics
==> Artifacts
R-4.4.0-arm64.pkg (Pkg)
==> Analytics
install: 630 (30 days), 2,111 (90 days), 7,994 (365 days)

I tried “installing all the things” with homebrew-r-srf but didn’t like how the uninstall doesn’t clean up, and frankly, I haven’t found a need for more than the capabilities included with homebrew R:

capabilities()
       jpeg         png        tiff       tcltk         X11        aqua 
       TRUE        TRUE        TRUE        TRUE        TRUE        TRUE 
   http/ftp     sockets      libxml        fifo      cledit       iconv 
       TRUE        TRUE       FALSE        TRUE       FALSE        TRUE 
        NLS       Rprof     profmem       cairo         ICU long.double 
       TRUE        TRUE        TRUE        TRUE        TRUE       FALSE 
    libcurl 
       TRUE 

Currently, this is everything except the X11 dependencies, which aren’t really needed:

?capabilities

Note to macOS users

Capabilities “jpeg”, “png” and “tiff” refer to the X11-based versions of these devices. If capabilities(“aqua”) is true, then these devices with type = “quartz” will be available, and out-of-the-box will be the default type. Thus for example the tiff device will be available if capabilities(“aqua”) || capabilities(“tiff”) if the defaults are unchanged.

As an active contributor to Homebrew, I strongly recommend using it for any/all software; their quality control is excellent.

R Workspace

Once R and RStudio have been installed, you’ll need a development environment and some packages.

Development Environment

Well, obviously, I use RStudio. There are other IDEs available, and it’s also possible to develop using the command line, but RStudio provides a pleasant, integrated experience for R development, and actively supports the R community. I don’t typically use the built-in Git client, instead using the GitHub Desktop client and git command-line tool.

brew install github

I also occasionally use vim (mainly for shell scripts) and atom (mainly for vanilla .md files using markdown-preview-enhanced).

Packages

Managing packages and environments are a challenge for most modern languages. Thankfully R doesn’t have the same level of challenge as python, or even ruby, managing packages available within a project is a best practice. I use renv for this purpose. (I originally discovered packrat but quickly discovered RStudio is replacing it with renv)

Here is my package maintenance approach:

Base R Packages

With no projects open, I periodically check for updates to the base R packages using the RStudio Packages “Update” function. These are the only packages I install to the base directory /usr/local/Cellar/r/4.0.3_2/lib/R/library. Note: upgrading or reinstalling r through homebrew may ‘downgrade’ the base packages.

Development Packages

The only other tools I install outside of projects are for supporting development. Originally this was just renv, but now it includes a number of development tools needed to create projects. I use the R site library used by homebrew, currently /usr/local/lib/R/4.0/site-library. I’ve developed a shell script to install development packages that you can find in the tools directory of this repository.

#!/bin/sh
# install development packages to site repository
# thanks to https://blog.sellorm.com/2017/10/21/quick-script-to-install-an-r-package-from-the-command-line/
# and https://github.com/Homebrew/homebrew-core/blob/master/Formula/r/r.rb
# designed to work with homebrew: `brew install --formula r && brew install --cask rstudio`
set -ex # halt script on error, echo on

PREFIX=`brew --prefix`
RVERSION=`${PREFIX}/bin/Rscript -e 'cat(as.character(getRversion()[1,1:2]))'`
SITELIB="${PREFIX}/lib/R/${RVERSION}/site-library"
DEVPKG='c("renv", "styler", "lintr", "miniUI", "devtools", "available")'

if [ ! -d "${SITELIB}" ]
then
    echo "fatal error: ${SITELIB} does not exist - not using \`brew install --formula r\`?"
    exit 1
fi

brew install libgit2 # required by devtools

echo "install.packages(${DEVPKG}, repos=\"https://cran.rstudio.com\", lib=\"${SITELIB}\")" | R --no-save

Project Packages

Any packages not needed to create projects I install within a project using renv - here are some good intros:

renv::init() # to set up renv in the project
renv::install("rtraining") # to install a package
renv::status() # checks if renv.lock is in sync
renv::snapshot() # save libraries in renv.lock
renv::restore() # restore libraries from renv.lock - use when first using an existing project
renv::clean() # remove extra libraries
renv::update() # check for unpdated versions of installed libraries

If you’re installing (development) versions of packages from GitHub, you’ll be prompted to set up a personal access token using create_github_token() and adding it to your .Renviron as GITHUB_PAT= using usethis::edit_r_environ(). I had to install the development versions of both styler and lintr due to bugs not yet fixed in the released (CRAN) versions. I did also look at using pak but found it too buggy to use.

Note: I came across an odd quirk where renv will prompt you to upgrade base R projects, even if they’re not used. I resolved these by just upgrading the base packages with RStudio with no project open.

I’ve included some comments on useful packages in my R Training Log notebook.

Using R

So - what have I learned about using R? I won’t cover the actual Data Science part here, but have some recommended reading in my R Training Log.

Use git!

This wasn’t really something I learned with R, but use of version control for any kind of code/script is crucial. These days, I keep all scripts and configuration files in some flavor of public or private version control. Mostly that means GitHub, but also private git servers (via ssh) are easy to set up for work that you want to keep off of GitHub.

I don’t want to get too much into managing code via version control, but I favor trunk-based development with short-lived branches, small commits of working code, and rebase and merge for linear commit history. There’s a lot of good research on this topic from Google’s Dev Ops Research and Assessment team.

Using RStudio with GitHub

RStudio has good integration with GitHub. I’ve adopted the convention of “one RStudio Project (.Rproj) per repository” and storing the Rproj file in the repository. That seems to be the norm.

R Notebooks

R Notebooks are my preferred file format for data analysis, as they allow an easy mixing of text (using pandoc markdown) and R code chunks. This allows me to document what I’m doing as I go, both for reproducibility as well as recording my observations, thoughts and conclusions. It especially lends itself to iterative development:

  1. Write code
  2. Run code
  3. See results
  4. Update code

This method of development is not always appropriate, but fits well with exploratory analysis. Once I get to writing functions, I’m starting to adopt the formal structure of packages, testthat, and roxygen2.

Specifically, I use html_notebook, which was recommended to me. RStudio handles this differently than html_document:

Behavior html_notebook html_document
Quick View Preview Button No Preview Button
Saving Files Automatically knits file on save Manually knit file
Rendering Files Renders R output as it exists in the IDE on save Renders R output by running all R code
Default Options Includes options for readability, hiding and downloading code, and paged tables Higher-quality rendering of PNG graphics

Essentially, R Notebooks are generally faster and more convenient when doing analysis, and html_document R Markdown files offer higher-quality output. This site uses a custom framework to convert html_notebooks to html_documents for publishing (see the next section for details). They are also a convenient way to share analysis with peers - just email the .nb.html file, which will include all of the output, as well as embedding the .Rmd source code for easy editing. This also allows people who don’t have R/RStudio to see the results of an analysis.

There are some drawbacks to using R Notebooks:

  • Because R Notebooks are render-on-save, you can inadvertently end up with missing or outdated R output from your notebook when saving, if you’ve made updates and haven’t re-run the entire document. My habit is to do the following at the end of a writing session, before committing to git, which ensures a “clean” notebook:
    1. Clear the Global Environment
    2. “Restart R and Clear Output”
    3. “Run All”
    4. Save
  • Debug breakpoints don’t work in R Markdown documents. To fix this, convert R Markdown documents to R Scripts using purl for debugging.
  • Not really a drawback, but…Rcpp and rprojroot are erroneously listed in RStudio as required to create R Markdown, which can also cause problems with renv. This is a bug in RStudio, which will be fixed in version 1.4.944.

Publishing R Notebooks

Since R Notebooks are saved as html files, it’s possible to publish them on GitHub using GitHub pages. GitHub published a tutorial in 2018 on getting RStudio integrated with GitHub, and I started working on that. Quickly I discovered that while the tutorial was helpful, it wasn’t quite the setup I wanted; it published R Markdown through GitHub pages, but wouldn’t directly support the automatically generated html of R Notebooks. After more searching, I was able to get Notebooks working on GitHub, but I used the method described in rstudio/rmarkdown #1020 - checking in the .nb.html into git, and using GitHub Pages so that you can view the rendered HTML instead of just the HTML code.

Publishing with rmarkdown

After using README.md and GitHub pages to publish notebooks, I found I wanted an easier way to publish and navigate across collections of notebooks. Publishing R Notebooks on GitHub pages works fine, but doesn’t offer an easy navigation structure, like pkgdown. I tried using pkgdown to display notebooks, but pkgdown only supports building vignettes, which have a distinctly different look and feel than R Notebooks. The rmarkdown site generator, render_site, allows more flexible building of websites from R Markdown files with a simple navigation bar at the top, but doesn’t support html_notebook files OOB.

To work around this, I created a simple framework for converting html_notebook files to html_document and building a _site.yml from a list of notebooks stored in the non-standard notebooks/ directory, initially using a shell script, build_site (stored at the root of this repo). It does the following:

  1. Creates a working directory, .build-site
  2. Builds a rmarkdown::render_site() _site.yml file that includes a menu with all notebooks in the notebooks/ directory
  3. Copies all .Rmd files to .build-site changing their type from html_notebook to html_document
  4. Includes some configuration to make html_documents work more like html_notebook
  5. Calls rmarkdown::render_site() to render the site in the docs/ directory

I typically rebuild a site with the following command, run from the top-level directory of the project:

rm -rf .build-site && sh build-site && open docs/index.html

This approach leverages the docs functionality of GitHub pages, like pkgdown.

Update on Publishing

build-site has been replaced with build_analysis_site()! At its core, it is functionally the same: builds a pkgdown site, adding an “Analysis” menu with all notebooks in the (renamed) analysis/ directory, then converts and builds the notebooks using rmarkdown, and moves them into the docs/ directory. Since it is now an R Script, it’s more portable and can be more easily bundled with packages. I will be migrating its functionality to rdev shortly, so that it’s usable across multiple analysis projects.

R Package Layout

Here is my package layout - the table shows the path, whether it’s part of the formal R package definition, and my notes on its use.

Path R Notes
.Rbuildignore x Exclude files from package
.Rprofile x Used by renv and to attach development tools
.github GitHub templates and workflows
.gitignore x I use R and macOS exclusions, and always exclude generated files outside of docs/
.lintr lintr default linters with 100 columns: linters: with_defaults(line_length_linter(100))
DESCRIPTION x use “Suggests” for development tools, per renv
LICENSE x Generated with use_mit_license()
LICENSE.md See above, used by pkgdown
NAMESPACE x Generated with roxygen2
NEWS.md Release notes, used by pkgdown
README.Rmd x Generated with use_readme_md()
README.md x Generated with build_readme()
R x All project functions go here, with roxygen2 comments
TODO.md To-do list, inspired by renv’s historical TODO.md
analysis Exploratory data analysis in R Notebooks and R Presentations
analysis/data When appropriate, analysis data lives here
analysis/assets External assets (images, other files) included in R Notebooks
analysis/rendered Manually rendered html versions of analysis/ files to be included in docs/, ie .Rpres files, not stored in git
demos x I don’t use demos, as recommended by R Packages
docs Used by pkgdown::build_site() and R Notebooks rendered as html_document using rmarkdown::render_site() via build_analysis_site()
exec x In theory this is where command line executable scrips reside
inst/templates/rmarkdown x Planned location for R Markdown Templates
man x Generated with roxygen2
package.Rproj I use the same name for the package, .Rproj, directory, and GitHub repo
pkgdown I store all pkgdown files here
po x Used for Internationalization
renv Used by renv
renv.lock The renv lockfile
tests x Tests using testthat
tools x I use tools/ for shell scripts that support development, like setup-r
vignettes x More typical package articles, used by pkgdown

R Workflow

Here is the typical workflow I’m settling into (or at least trying to…I still don’t have TDD down just yet), once a project is created. Projects are either vanilla packages and don’t contain analysis/, like rdev, or “analysis packages” and bundle analysis notebooks as a project, like rtraining.

  1. Check for updated packages when starting to work, (I created check_renv() for this) and check for errors using local CI checks (ci()).
  2. When creating a new function, write the documentation first, using Roxygen - this helps encourage up-front design and clarifies goals/requirements.
  3. Write tests next - both happy path and negative test cases whenever possible. 100% test coverage is overkill, but I try to write a ‘data validity checker’ which also helps define the expected format of the data.
  4. TDD: run tests, write code that fails tests, fix code, repeat. (Don’t forget to refactor)
  5. I use trunk-based development, which I learned from Homebrew. I try to keep commits small, related, and implementing changes that don’t break code, merge back to main frequently - before the end of the day - and require linear commit history (ie rebase and merge).
  6. Before pushing commits, I run style_all() and ci() to fix any problems locally (just “undo” in GitHub Desktop, fix, and re-commit).

One thing that annoys me is that by default, devtools just writes new lines to the end of .Rbuildignore, so I wrote sort_rbuildignore().

Next Steps

I’ll keep adding to this document as I go, and will likely eventually migrate this notebook to a vignette and switch to pkgdown.