Idle thoughts lead to R internals: how to count function arguments
This article was originally published at https://nsaunders.wordpress.com
“Some R functions have an awful lot of arguments”, you think to yourself. “I wonder which has the most?”
It’s not an original thought: the same question as applied to the R base package is an exercise in the Functions chapter of the excellent Advanced R. Much of the information in this post came from there.
There are lots of R packages. We’ll limit ourselves to those packages which ship with R, and which load on startup. Which ones are they?
What packages load on starting R?
Start a new R session and type search(). Here’s the result on my machine:

    search()
    [1] ".GlobalEnv"        "tools:rstudio"     "package:stats"     "package:graphics"  "package:grDevices"
    [6] "package:utils"     "package:datasets"  "package:methods"   "Autoloads"         "package:base"
We’re interested in the packages with priority = base. Next question:
How can I see and filter for package priority?
You don’t need dplyr for this, but it helps.

    library(tidyverse)

    installed.packages() %>%
      as_tibble() %>%
      filter(Priority == "base") %>%
      select(Package, Priority)

    # A tibble: 14 x 2
       Package   Priority
       <chr>     <chr>
     1 base      base
     2 compiler  base
     3 datasets  base
     4 graphics  base
     5 grDevices base
     6 grid      base
     7 methods   base
     8 parallel  base
     9 splines   base
    10 stats     base
    11 stats4    base
    12 tcltk     base
    13 tools     base
    14 utils     base
Comparing to the output from search(), we want to look at: stats, graphics, grDevices, utils, datasets, methods and base.
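The same comparison can be made without the tidyverse. The following is a minimal sketch in base R, assuming the packages of interest are exactly those that are both attached and of base priority; the variable names (attached, base_priority, startup_base) are illustrative and not from the original post.

```r
# Names of attached packages, taken from the search path
# (entries such as ".GlobalEnv" or "tools:rstudio" are dropped)
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

# installed.packages() returns a character matrix with one row per package;
# its priority argument restricts the rows to the given priority
base_priority <- rownames(installed.packages(priority = "base"))

# Base-priority packages that are also attached in this session
startup_base <- intersect(attached, base_priority)
startup_base
```

The result depends on the session: anything attached after startup with library() will also appear if it has base priority.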
How can I see all the objects in a package?
Use ls() with the package’s name on the search path, for example ls("package:base") for the base package. For other packages, just change base to the package name of interest.
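As a small illustration of listing a package’s objects, the sketch below counts and previews the names in base. The variable name base_objs is just for this example.

```r
# Character vector of the object names attached as package:base
base_objs <- ls("package:base")

length(base_objs)   # how many objects; the number varies by R version
head(base_objs)     # first few names

# By default ls() skips names that begin with a dot;
# all.names = TRUE includes them
length(ls("package:base", all.names = TRUE))
```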
However, not every object in a package is a function. Next question:
How do I know if an object is a function?
The simplest way is to use is.function().
    is.function(ls)
    [1] TRUE
What if the function name is stored as a character variable, “ls”? Then we can use get() to look up the object by name:

    is.function(get("ls"))
    [1] TRUE
But wait: what if two functions from different packages have the same name and we have loaded both of those packages? Then we specify the package too, using the pos argument.
    is.function(get("Position", pos = "package:base"))
    [1] TRUE
    is.function(get("Position", pos = "package:ggplot2"))
    [1] FALSE
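Putting get() and is.function() together, one possible way to separate the functions from the other objects in a package is sketched below. The helper name is_pkg_function is hypothetical, and the sketch assumes the package is attached so that "package:<name>" is on the search path.

```r
# Hypothetical helper: is the object called `name` in package `pkg` a function?
is_pkg_function <- function(name, pkg) {
  is.function(get(name, pos = paste0("package:", pkg)))
}

is_pkg_function("ls", "base")   # TRUE: ls is a function
is_pkg_function("pi", "base")   # FALSE: pi is a numeric constant

# Apply the check to every object name in base
base_objs <- ls("package:base")
is_fun <- vapply(base_objs, is_pkg_function, logical(1), pkg = "base")

table(is_fun)   # counts of functions and non-functions; varies by R version
```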
So far, so good. Now, to the arguments.
How do I see the arguments to a function?
Now things start to get interesting. In R, function arguments are called formals. There is a function of the same name, formals(), to show the arguments for a function. You can also use formalArgs(), which returns a vector with just the argument names:

    formalArgs(ls)
    [1] "name"      "pos"       "envir"     "all.names" "pattern"   "sorted"
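To make the relationship between formals() and formalArgs() concrete, here is a short sketch using a made-up function f. It also shows a detail that matters when counting: the dots argument is counted as a single formal.

```r
# A made-up function with a required argument, a default, and dots
f <- function(x, y = 2, ...) NULL

methods::formalArgs(f)   # "x" "y" "..."
names(formals(f))        # the same names, taken from the formals list
formals(f)$y             # 2, the default value of y
length(formals(f))       # 3, so "..." counts as one argument
```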
But that won’t work for every function. Let’s try formalArgs(abs): it returns NULL.
The issue here is that abs() is a primitive function, and primitives don’t have formals. Our next two questions:
How do I know if an object is a primitive?
Hopefully you guessed that one:
    is.primitive(abs)
    [1] TRUE
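For a bit more context, typeof() distinguishes ordinary functions (closures) from the two kinds of primitive. The sketch below also collects the primitives in base; the variable names base_funs and prims are illustrative.

```r
is.primitive(ls)   # FALSE: ls is an ordinary closure
typeof(ls)         # "closure"
typeof(abs)        # "builtin", one kind of primitive
typeof(`if`)       # "special", the other kind of primitive

# All functions in base, then just the primitive ones
base_funs <- Filter(is.function, mget(ls(baseenv()), envir = baseenv()))
prims <- Filter(is.primitive, base_funs)

length(prims)        # number of primitives; varies by R version
head(names(prims))
```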
How do I see the arguments to a primitive?
You can use args(), and you can pass the output of args() to formalArgs():

    args(abs)
    function (x)
    NULL
    formalArgs(args(abs))
    [1] "x"
However, there are a few objects which are primitive functions for which this doesn’t work. Let’s not worry about those.
    is.primitive(`:`)
    [1] TRUE
    formalArgs(args(`:`))
    NULL
    Warning message:
    In formals(fun) : argument is not a function
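One way to handle closures, primitives and the awkward primitives uniformly is a small wrapper like the sketch below. The name arg_names is hypothetical, and returning NA when args() gives NULL is a choice made for this example, not something from the original post.

```r
# Hypothetical helper: argument names for any function object
arg_names <- function(f) {
  # For primitives, args() supplies a stand-in closure, or NULL
  if (is.primitive(f)) f <- args(f)
  # No stand-in available: report NA instead of triggering a warning
  if (is.null(f)) return(NA_character_)
  # as.character() turns the NULL from a zero-argument function
  # into an empty character vector
  as.character(names(formals(f)))
}

arg_names(ls)             # closure: names come straight from formals()
arg_names(abs)            # "x", via args()
arg_names(function() 1)   # character(0): a function with no arguments
arg_names(`:`)            # NA wherever args() returns NULL
```

Checking for NULL before calling formals() avoids the "argument is not a function" warning shown above.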
So what was the original question again?
Let’s put all that together. We want to find the base packages which load on startup, list their objects, identify which are functions or primitive functions, list their arguments and count them up.
We’ll create a tibble by pasting the object names for each package into a comma-separated string, then pulling the string apart using unnest_tokens() from the tidytext package.

    library(tidytext)
    library(tidyverse)

    pkgs <- installed.packages() %>%
      as_tibble() %>%
      # keep the base-priority packages that are attached at startup
      filter(Priority == "base",
             Package %in% c("stats", "graphics", "grDevices", "utils",
                            "datasets", "methods", "base")) %>%
      select(Package) %>%
      rowwise() %>%
      # one comma-separated string of object names per package
      mutate(fnames = paste(ls(paste0("package:", Package)), collapse = ",")) %>%
      # split the string into one row per object name
      unnest_tokens(fname, fnames, token = stringr::str_split,
                    pattern = ",", to_lower = FALSE) %>%
      # keep only the objects that are functions
      filter(is.function(get(fname, pos = paste0("package:", Package)))) %>%
      # flag primitives and count arguments, going through args() for primitives
      mutate(is_primitive = ifelse(is.primitive(get(fname, pos = paste0("package:", Package))), 1, 0),
             num_args = ifelse(is.primitive(get(fname, pos = paste0("package:", Package))),
                               length(formalArgs(args(fname))),
                               length(formalArgs(fname)))) %>%
      ungroup()
That throws out a few warnings where, as noted, args() doesn’t work for some primitives.
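For readers without tidytext installed, the same steps can be sketched in base R. This is an alternative written for illustration, not the code used to produce the results in this post; the names n_args, wanted and arg_counts are hypothetical, and primitives for which args() returns NULL are recorded as NA rather than producing warnings.

```r
# Hypothetical helper: number of arguments for any function object
n_args <- function(f) {
  a <- if (is.primitive(f)) args(f) else f
  if (is.null(a)) NA_integer_ else length(formals(a))
}

# The startup packages named in the post, limited to those actually attached
wanted <- c("stats", "graphics", "grDevices", "utils",
            "datasets", "methods", "base")
wanted <- intersect(wanted, sub("^package:", "", search()))

# One data frame per package, or NULL if the package has no functions
per_pkg <- lapply(wanted, function(pkg) {
  env  <- as.environment(paste0("package:", pkg))
  funs <- Filter(is.function, mget(ls(env), envir = env))
  if (length(funs) == 0) return(NULL)
  data.frame(Package      = pkg,
             fname        = names(funs),
             is_primitive = vapply(funs, is.primitive, logical(1)),
             num_args     = vapply(funs, n_args, integer(1)),
             row.names    = NULL,
             stringsAsFactors = FALSE)
})

arg_counts <- do.call(rbind, Filter(Negate(is.null), per_pkg))

# Ten functions with the most arguments (NA counts sort last)
head(arg_counts[order(-arg_counts$num_args), ], 10)
```

Because objects are fetched from each package’s own environment, a function masked by another package of the same name is still counted correctly.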
And the winner is –

    pkgs %>%
      top_n(10) %>%
      arrange(desc(num_args))

    Selecting by num_args
    # A tibble: 10 x 4
       Package  fname            is_primitive num_args
       <chr>    <chr>                   <dbl>    <int>
     1 graphics legend                      0       39
     2 graphics stars                       0       33
     3 graphics barplot.default             0       30
     4 stats    termplot                    0       28
     5 utils    read.table                  0       25
     6 stats    heatmap                     0       24
     7 base     scan                        0       22
     8 graphics filled.contour              0       21
     9 graphics hist.default                0       21
    10 stats    interaction.plot            0       21

– the function legend() from the graphics package, with 39 arguments. From the base package itself, scan(), with 22 arguments.
Just to wrap up, some histograms of argument number by package, suggesting that the base graphics functions tend to be the more verbose.
    pkgs %>%
      ggplot(aes(num_args)) +
      geom_histogram() +
      facet_wrap(~Package, scales = "free_y") +
      theme_bw() +
      labs(x = "arguments",
           title = "R base function arguments by package")