Some love for Base R. Part 1
This article is originally published at https://luis.apiolaza.net
For a long time it has bothered me when people look down on base R (meaning the set of functions that comes with a default installation), as if it were a lesser version of the language compared to the tidyverse set of functions, or data.table, or whatever. I think part of this situation is due to 1. modern documentation and tutorials using base being much less abundant, and 2. the treatment of base in those tutorials ignoring base constructs that make it look, well, tidy.
This could be read as a curmudgeon yelling at clouds, BUT there are occasions when relying on a minimal installation without any extra packages is quite useful. For example, when you want almost maintenance-free code between versions: base R is very stable, and I have code that's almost 20 years old and still running. As a researcher, this is a really good situation.
If you’re unconvinced, there is a much cooler application: webr, or R in the browser. It happens that loading additional packages (particularly large ones, like the tidyverse) makes the whole R-in-the-browser proposition much slower. A toot (this was on Mastodon, by the way) by boB Rudis (hrbrmstr) got me thinking:
> @hrbrmstr It would be cool to see a renaissance of base R, leading to a tidier, much more readable use with with(), within(), etc.
>
> — Luis

Perhaps an example will better show what I mean. Let’s assume we go back in time and you find 15-year-old code that looks like this:
```r
setwd("/Users/luis/Documents/Research/2008/Glasshouse")
gw <- read.csv("data20091111.csv", header = TRUE)

gw$dratio <- gw$dia1.mm/gw$dia2.mm
gw$dia <- ifelse(is.na(gw$dia2.mm), gw$dia1.mm, (gw$dia1.mm + gw$dia2.mm)/2)
gw$slend <- gw$ht.cm/gw$dia
gw$smoe <- 1.1*gw$v2104^2 # Squared standing velocity in April
gw$gmoe <- 1.1*gw$vgreen^2
gw$dmoe <- (gw$bden/1000)*1.14*gw$vdry^2
```
which, let’s face it, doesn’t look pretty. If you were working in the tidyverse you’d probably also be using RStudio (and projects). If you are using projects, your code would look like this:
```r
library(readr)
library(dplyr)

gw <- read_csv("data20091111.csv")

gw <- gw %>%
  mutate(dratio = dia1.mm/dia2.mm,
         dia = ifelse(is.na(dia2.mm), dia1.mm, (dia1.mm + dia2.mm)/2),
         slend = ht.cm / dia,
         smoe = 1.1 * v2104^2,
         gmoe = 1.1 * vgreen^2,
         dmoe = (bden / 1000) * 1.14 * vdry^2)
```
which is easier to read and follow, in my opinion. However, there was nothing stopping you from writing the following 15 years ago:
```r
setwd("/Users/luis/Documents/Research/2008/Glasshouse")
gw <- read.csv("data20091111.csv", header = TRUE)

gw <- within(gw, {
  dratio <- dia1.mm/dia2.mm
  dia <- ifelse(is.na(dia2.mm), dia1.mm, (dia1.mm + dia2.mm)/2)
  slend <- ht.cm / dia
  smoe <- 1.1 * v2104^2
  gmoe <- 1.1 * vgreen^2
  dmoe <- (bden / 1000) * 1.14 * vdry^2
})
```
which is quite similar to the `dplyr` code, but without any external package dependencies. Now, if you are missing the `magrittr` pipe, from R 4.1.0 onwards it is possible to use the native pipe and write the above code as:
```r
setwd("/Users/luis/Documents/Research/2008/Glasshouse")
gw <- read.csv("data20091111.csv", header = TRUE)

gw <- gw |> within({
  dratio <- dia1.mm/dia2.mm
  dia <- ifelse(is.na(dia2.mm), dia1.mm, (dia1.mm + dia2.mm)/2)
  slend <- ht.cm / dia
  smoe <- 1.1 * v2104^2
  gmoe <- 1.1 * vgreen^2
  dmoe <- (bden / 1000) * 1.14 * vdry^2
})
```
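As a quick aside, the native pipe simply rewrites `lhs |> f(args)` as `f(lhs, args)`, which is why it slots straight into `within()`: the data frame is its first argument. A tiny self-contained check (the variable names here are made up for illustration):

```r
# The native pipe (R >= 4.1.0) passes the left-hand side as the
# first argument of the right-hand call.
x <- c(1, 4, 9)
piped  <- x |> sqrt() |> sum()  # same as sum(sqrt(x))
direct <- sum(sqrt(x))
```

Both forms produce exactly the same value; the pipe is pure syntax, resolved at parse time, with no package (and no runtime cost) involved.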
which gets us even closer to the tidyverse. The real magic is using `within()` to specify the data frame inside which to look up all the variables referred to by the calculations. This permits writing `dratio <- dia1.mm/dia2.mm` instead of `gw$dratio <- gw$dia1.mm/gw$dia2.mm`. There are a few "tricks" like this that make base R a very attractive option, particularly if you like minimal coding with few dependencies.
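If you want to try `within()` (and its read-only sibling `with()`) without the original data file, here is a minimal, self-contained sketch using the built-in `mtcars` data frame — the derived column names are made up for illustration:

```r
cars <- mtcars

# within() evaluates the expressions inside the data frame and
# returns a modified copy, much like dplyr's mutate().
cars <- within(cars, {
  kml   <- mpg * 0.4251  # miles/gallon to (roughly) km/litre
  ratio <- hp / wt       # power-to-weight ratio
})

# with() evaluates an expression inside the data frame without
# modifying it -- handy for one-off summaries.
mean_ratio <- with(cars, mean(hp / wt))
```

The difference is worth remembering: `within()` returns the (modified) data frame, while `with()` returns whatever the expression evaluates to.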