Using renv in R

One good thing about learning R is that you do not need to think a lot about dependencies when installing R and different R packages. You simply install the R packages you need and start using them. If you are ever concerned about not using the most recent packages, you run update.packages(), and that is all there is to using up-to-date packages in R. It is that simple.

However, there is no such thing as a free lunch. If you use R long enough you will eventually encounter a script that no longer works as intended after you reinstalled R or updated a package. Or you might find a great R script in a repository only to find out that it is no longer working (even if you try to run it twice). If you, for example, try to run all scripts in my book introducing R, you will not get the exact same output (unless you install a specific version of R and the specific packages that were up-to-date at the time the book was published). It is a problem.

As with any problem, there are multiple ways to address this one. First, we can do a few things with the code itself. That is, we can make the code itself robust. Try to rely only on R packages with stable releases, and limit both the number of packages and their dependencies. If you use a new package that is not available on CRAN yet, the likelihood that your code will remain reproducible over time is much lower. If you only need to make a quick figure with {ggplot2}, load only {ggplot2} and not the whole {tidyverse}. If you do not rely on any packages at all, the likelihood that your analysis will reproduce successfully in the future is very high.
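As a small illustration of the point about loading only what you need, here is a minimal sketch of a quick figure. It assumes {ggplot2} is installed and uses the built-in mtcars data purely as an example; the object name p is my own choice.

```r
# Load only the package the script actually uses,
# instead of attaching everything with library(tidyverse).
library(ggplot2)

# A quick scatter plot of the built-in mtcars data.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

print(p)
```

The fewer packages a script attaches, the fewer things can change underneath it later.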

Second, we can work with the context of the code. That is, we can make the environment in which we run the code robust. This can be done with Docker containers if we want a very robust approach. For some projects this might be overkill, and a good first step towards a reproducible R environment is to try out the R package {renv}. I mentioned the package in a post last year with great R functions (a post that was mentioned recently in the great What’s New in R newsletter). There are noteworthy limitations to the package, but it is, again, a good start.

When you begin a new project, consider what you are starting from. For example, make sure that you are using the most recent version of R, so you are not setting up a project that is outdated from day one. If you have used R for years but never updated your version of R, this might be a good opportunity to update your workflow. Next, make sure {renv} is installed and up-to-date. When that is done, you can load the package using library("renv").
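A minimal sketch of that check could look like the following. It assumes you have a CRAN mirror configured and an internet connection; nothing here is specific to any one project.

```r
# Install {renv} only if it is not already available.
if (!requireNamespace("renv", quietly = TRUE)) {
  install.packages("renv")
}

# See what you are starting from.
R.version.string        # the version of R you are running
packageVersion("renv")  # the installed version of {renv}

library("renv")
```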

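Next, you can use renv::init() to initialise a new project with {renv}. This will create a few things in your working directory. First, a file called renv.lock, which is the lockfile. Second, a folder called renv, which holds the project's own package library. You should not edit either of these by hand. It also adds an .Rprofile file that activates {renv} whenever R is started in the project.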
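In practice the initialisation is a single call. The sketch below assumes your working directory is the root of the project; the checks afterwards are only there to show what was created. In an interactive session {renv} may restart R after init(), in which case you can run the checks once the session is back.

```r
# Run once, from the root directory of the project.
renv::init()

# What init() has added to the project.
file.exists("renv.lock")  # the lockfile
dir.exists("renv")        # the folder with the project library
file.exists(".Rprofile")  # activates {renv} when R starts in this project
```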

Now you can do your usual work. However, you are now working in a library that is local to your project and separate from the R packages you have installed elsewhere on your Mac or PC. This means you will need to (re)install the packages you need within this environment. For example, you can run install.packages("dplyr") and install.packages("ggplot2") if you rely on {dplyr} and {ggplot2} for your project.
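Here is a short sketch of that step, assuming you are inside a project where renv::init() has already been run. The two packages are just the examples from the paragraph above.

```r
# The first entry should now point to a library inside the project's renv folder.
.libPaths()

# Install what the project needs into the project library.
install.packages("dplyr")
install.packages("ggplot2")

# {renv} also has its own installer, which does the same job:
# renv::install(c("dplyr", "ggplot2"))
```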

When you have installed the packages you need, you can run renv::status() to check whether the packages in the project library, the packages your code uses, and the lockfile agree with each other. You can now use the packages as you like. The idea here is to create a lockfile with the details on the packages we are using in our project (together with any dependencies required for those packages). In other words, the lockfile only needs to cover what this project uses, not every package in your regular installation of R.
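A minimal sketch of checking the state of the project, assuming it contains at least one R script that loads a package. The object name deps is my own; renv::dependencies() is the function {renv} uses to scan your files for the packages they call.

```r
# Report any differences between the project library, the code, and the lockfile.
renv::status()

# The packages {renv} finds in the project's scripts.
deps <- renv::dependencies()
unique(deps$Package)
```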

To take a snapshot of the versions of the packages we rely on, we can use renv::snapshot(). Whenever you change your setup, i.e., install, update or remove packages you rely on, it is a good idea to run renv::snapshot() again to update the lockfile. We can then share the project, including renv.lock, with a team member, and they will be able to create the same setup and, hopefully, get the exact same output/results.
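The snapshot itself is again one call. The sketch below assumes an initialised project and simply prints the first lines of the lockfile afterwards, so you can see that it is a plain JSON file with the R version and the package versions.

```r
# Record the packages (and versions) currently used by the project.
renv::snapshot()

# Peek at the first lines of the lockfile. Look, but do not edit.
writeLines(readLines("renv.lock", n = 20))
```

If you use version control, renv.lock is the file you want to commit together with your scripts.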

If you later find yourself on another machine, you can simply use renv::restore() to install the packages in the versions recorded in the lockfile at the time of the last snapshot. While not as robust as running R in a container (e.g., using Docker), it is still a pretty good start to setting up a robust workflow in R.
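On the other machine, the steps could look like the sketch below. It assumes you have copied or cloned the whole project folder, including renv.lock, and started R from within that folder.

```r
# Reinstall the packages in the versions recorded in renv.lock.
renv::restore()

# Confirm that the project library and the lockfile now agree.
renv::status()
```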