This article is originally published at https://www.tidyverse.org/blog/
We’re chuffed to announce the release of dbplyr 2.3.0. dbplyr is a database backend for dplyr that allows you to use a remote database as if it was a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you.
You can install it from CRAN with:
This post will highlight some of the most important new features in 2.3.0: eliminating subqueries for many verb combinations, better errors, and a handful of new translations. As usual, this release comes with a large number of improvements to translations for individual backends and you can see the full list in the release notes
dbplyr now produces fewer subqueries resulting in shorter, more readable, and, in some cases, faster SQL. The following combinations of verbs no longer require subqueries:
filter()now translates to
Here are a couple of examples of queries that are now much more compact:
lf1 <- lazy_frame(x = 1, a = "a", .name = "lf1") lf2 <- lazy_frame(x = 1, b = "b", .name = "lf2") lf3 <- lazy_frame(x = 1, c = "c", .name = "lf3") lf1 |> left_join(lf2, by = "x") |> left_join(lf3, by = "x") |> select(b, c) #> <SQL> #> SELECT `b`, `c` #> FROM `lf1` #> LEFT JOIN `lf2` #> ON (`lf1`.`x` = `lf2`.`x`) #> LEFT JOIN `lf3` #> ON (`lf1`.`x` = `lf3`.`x`) lf1 |> group_by(x) |> summarise(a = mean(a, na.rm = TRUE), n = n()) |> filter(n > 5) #> <SQL> #> SELECT `x`, AVG(`a`) AS `a`, COUNT(*) AS `n` #> FROM `lf1` #> GROUP BY `x` #> HAVING (COUNT(*) > 5.0)
(As ususal in these blog posts, I’m using
lazy_frame() to focus on the SQL generation, without having to set up a dummy database.)
Additionally, where possible, dbplyr now uses
SELECT * after a join instead of explicitly selecting every column.
Variables that aren’t found in either the data or in the environment now produce an error:
lf <- lazy_frame(x = 1,y = 2) lf |> mutate(x = z + 1) #> Error in `mutate()`: #> ! Problem while computing `x = z + 1` #> Caused by error: #> ! Object `z` not found.
(Previously they were silently translated to SQL variables.)
We’ve also generally reviewed the error messages to ensure they show more clearly where the error happened:
lf |> mutate(x = y %/% 1) #> Error in `purrr::pmap()` at dbplyr/R/lazy-select-query.R:282:2: #> ℹ In index: 1. #> ℹ With name: x. #> Caused by error in `y %/% 1`: #> ! %/% is not available in this SQL variant lf |> mutate(across(x:y, "a")) #> Error in `mutate()`: #> ! Problem while computing `..1 = across(x:y, "a")` #> Caused by error in `across()`: #> ! `.fns` must be a NULL, a function, formula, or list
stringr::str_like() (new in stringr 1.5.0) is translated to
dbplyr 2.3.0 is also supports features coming in dplyr 1.1.0:
.byargument is supported as alternative to
across()is deprecated because the evaluation timing of
case_match()functions are translated.
case_when()now supports the
This version does not support the new
join_by() syntax, but we’re working on it, and we’ll release an update after dplyr 1.1.0 is out.
The vast majority of this release (particularly the SQL optimisations) are from Maximilian Girlich; thanks so much for your continued work on this package.
We’d also like to thank all 74 contributors who help in someway, whether it was filing issues or contributing code and documentation: @a4sberg, @ablack3, @akgold, @aleighbrown, @andreassoteriadesmoj, @apalacio9502, @baileych, @barnesparker, @bhuvanesh1707, @bkraft4257, @bobbymc0, @brian-law-rstudio, @bthe, @But2ene, @capitantyler, @carlganz, @cboettig, @chwpearse, @copernican, @DSLituiev, @ehudtr7, @eitsupi, @ejneer, @eutwt, @ewright-vcan, @fabkury, @fh-afrachioni, @fh-mthomson, @filipemsc, @gadenbuie, @gbouzill, @giocomai, @hadley, @hershelm, @iangow, @iMissile, @IndrajeetPatil, @j-wester, @Janlow, @jasonmhoule, @jensmassberg, @jmbarbone, @joe-rodd, @kongdd, @krlmlr, @lschneiderbauer, @machow, @mgarbuzov, @mgirlich, @MichaelChirico, @moodymudskipper, @multimeric, @namarkus, @noamross, @NZambranoc, @oriolarques, @overmar, @owenjonesuob, @p-schaefer, @rohitg33, @rowrowrowyourboat, @rsund, @samssann, @samterfa, @schradj, @scvail195, @slhck, @splaisan, @stephenashton-dhsc, @ThomasMorland, @thothal, @viswaduttp, @XoliloX, and @yuhenghuang.
Please visit source website for post related comments.