Off-label uses in ggplot2
This article was originally published at https://www.tidyverse.org/blog/
ggplot2 v3.3.4 landed on CRAN recently, and while every release of ggplot2 is cause for celebration, this was merely a patch release fixing a large number of bugs and so it came and went without much fanfare. However, for a couple of users this release brought an unwelcome and surprising change. We feel that this is a great opportunity to talk a bit about some of the topics that Hadley discussed in his rstudio::global(2021) keynote, particularly the nature of breaking changes.
ggplot2 provides ggsave() as an easy way to save a ggplot object to an image file, using the following API:
ggplot(mpg) + geom_point(aes(x = displ, y = hwy))
ggsave("my_mpg_plot.png")
ggsave() is designed so that it automatically picks up the last created (or rendered) plot, and coupled with automatic graphic device selection determined from the file extension it provides a very lean API.
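As a minimal sketch of the same supported API with the defaults spelled out, the plot can be stored in a variable and passed through the `plot` argument. The file location (a temporary file) and the size values are assumptions chosen for illustration; the device is still picked from the `.png` extension.

```r
library(ggplot2)

# Build the plot and keep it in a variable instead of relying on last_plot()
p <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy))

# Hypothetical output location: a temporary file, so nothing is written
# into the working directory
out_file <- tempfile(fileext = ".png")

# The .png extension selects the graphics device; size and resolution
# are illustrative values
ggsave(out_file, plot = p, width = 6, height = 4, units = "in", dpi = 300)

file.exists(out_file)
```

Passing `plot` explicitly makes the call independent of whichever plot happened to be created or rendered last.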
The issue we will discuss in this blog post revolves around the use of ggsave() in the following manner:
ggplot(mpg) + geom_point(aes(x = displ, y = hwy)) + ggsave("my_mpg_plot.png")
Now, if this is the first time you’ve seen ggsave() being added to a plot, you are not alone. This certainly caught us by surprise. Prior to v3.3.4, this actually worked (more on that later), but with the recent release, running this code will result in the following error:
Error: Can't add `ggsave("my_mpg_plot.png")` to a ggplot object.
If you were a user that had used this pattern for saving plots, it very much felt like we had removed a feature, pulling the rug out from under your script with no warning. However, this use of ggsave() had never been advertised in any of the documentation, and while it worked, it could not be considered a feature as such.
We believe that this usage of ggsave() is the off-label use that Hadley talks about in his keynote. Off-label use consists of using functions in a way that only works by accident and is thus susceptible to breakage at any point due to changes in the code. Another common word for this is “a hack”, but this term can often imply that the user is fully aware of the brittle nature of the setup. Off-label use can just as well be passed on between users to a point where some think that this is the correct, supported way of doing things (this was certainly the case with the above issue).
In an age of the pipe it is easy to understand why this use was picked up and thought of as a real feature. +, however, is not a pipe (%>% or |>). It is a compositional operator meant to assemble the description of a plot. There is no execution of logic (besides the assembly) going on, and thus the idea of adding ggsave() makes neither theoretical nor practical sense. This is also the reason why we do not want to “fix” this issue and turn it into a regular feature.
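To illustrate the compositional nature of +, the sketch below inspects a plot object before and after a layer is added. The variable names `base` and `p` are hypothetical; the point is only that + returns a new description of the plot and neither draws nor saves anything.

```r
library(ggplot2)

# A plot description with data but no layers yet
base <- ggplot(mpg)

# `+` returns a new description with one layer attached;
# nothing is drawn or written to disk at this point
p <- base + geom_point(aes(x = displ, y = hwy))

length(base$layers)  # 0
length(p$layers)     # 1
```

Rendering only happens once the object is printed or handed to a function such as ggsave(), which is why a side-effecting call does not fit as a term in the sum.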
For those interested in the cause of both the accidental functionality and its breakage, here follows a description.
ggsave() can be used to save any plot object but defaults to the object returned by ggplot2::last_plot(). This function returns the last rendered or modified plot object. That means that whenever you add something to a plot, the result will be retrievable with last_plot(), but only until you manipulate or render another plot. What happens when adding ggsave() to a plot is that all the additions are resolved from the left, and at each point the result is pushed to the last_plot() store. When evaluation reaches the ggsave() term, it is evaluated and its result is added to the plot. Since the expected plot is present in the last_plot() store, the evaluation of ggsave() proceeds as expected. Prior to ggplot2 v3.3.4, ggsave() returned NULL, which, when added to a ggplot object, is a no-op (i.e. it does nothing). The change that provoked the error is that with v3.3.4 ggsave() now returns the path to the saved file invisibly, and adding a string to a plot object is an error.
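The two behaviours described above can be checked separately. The following is a rough sketch assuming ggplot2 v3.3.4 or later; the temporary file and the variable names are hypothetical, and try() is used here only to capture the error for inspection.

```r
library(ggplot2)

p <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy))

# Adding NULL is a no-op: the plot keeps its single layer
p_null <- p + NULL
length(p_null$layers)

# Assumes ggplot2 >= 3.3.4, where ggsave() returns the file path invisibly
out_file <- tempfile(fileext = ".png")  # hypothetical output location
path <- ggsave(out_file, plot = p, width = 6, height = 4)
is.character(path)

# Adding that string to a plot is what raises the error
res <- try(p + path, silent = TRUE)
inherits(res, "try-error")
```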
Based on this understanding there are some interesting observations we can make. First, while you’ll get an error in v3.3.4, the plot is actually saved to a file, since the error is thrown after the evaluation of ggsave(). This means that you can “fix” your code by putting the whole expression in a try() block (please don’t do this though):
try(ggplot(mpg) + geom_point(aes(x = displ, y = hwy)) + ggsave("my_mpg_plot.png"))
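Another tidbit is that the perceived feature was extremely brittle, even when it worked. Consider code that first builds two plots, a scatterplot p1 and a barplot p2, and then saves them like this:
p1 + ggsave("scatterplot.png")
p2 + ggsave("barplot.png")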
If you assumed that ggsave() could be added to a plot, you’d expect the above to be totally valid code, and that scatterplot.png would contain the plot from p1 and barplot.png would contain the plot from p2. However, since ggsave() just fetched the last modified or rendered plot by default, both png files would be identical and contain the barplot in p2.
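A sketch of why this happens, and of the supported alternative, is shown below. The definitions of p1 and p2 are assumptions made for illustration (any scatterplot and barplot would do), and the files are written to a temporary directory.

```r
library(ggplot2)

# Hypothetical plots standing in for p1 (scatterplot) and p2 (barplot)
p1 <- ggplot(mpg) + geom_point(aes(x = displ, y = hwy))
p2 <- ggplot(mpg) + geom_bar(aes(x = class))

# p2 was modified last, so the last_plot() store now holds the barplot
stored <- last_plot()
inherits(stored$layers[[1]]$geom, "GeomBar")

# Supported approach: name the plot to save explicitly
scatter_file <- file.path(tempdir(), "scatterplot.png")
bar_file <- file.path(tempdir(), "barplot.png")

ggsave(scatter_file, plot = p1, width = 6, height = 4)
ggsave(bar_file, plot = p2, width = 6, height = 4)
```

With `plot` given explicitly, each file contains the intended plot regardless of the order in which the plots were created.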
In the end, this short post is not intended to shame the users who used ggsave() in an unsupported way. ggplot2 is such a huge package that it is easy to pick up usage patterns without ever thinking about whether it is the correct way - if it works it works. Instead, this post is meant to showcase how, even with rigorous testing and no breaking changes, an update can break someone's workflow, often to the surprise of the developer. Once a package becomes popular enough, even the slightest change in the code has the capacity for disruption.