This article is originally published at https://www.tidyverse.org/blog/
We’re chuffed to announce the release of dbplyr 2.2.0. dbplyr is a database backend for dplyr that allows you to use a remote database as if it was a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you.
You can install it from CRAN with:
This blog post will discuss some of the biggest improvements to SQL translations, introduce
copy_inline(), and discuss support for dplyr’s
row_ functions. You can see a full list of changes in the
This release brings with it a host of useful improvements to SQL generation. Firstly, dbplyr uses
* where possible. This is particularly nice when you have a table with many names:
If you’re familiar with dbplyr’s old SQL output, you’ll also notice that the output receives some basic syntax highlighting and much improved line breaks and indenting.
The use of
* is particularly nice when you have a subquery. Previously the generated SQL would have repeated the column names
z twice, once for each subquery.
lf |> mutate(x2 = x + 1, x3 = x2 + 1) #> <SQL> #> SELECT *, `x2` + 1.0 AS `x3` #> FROM ( #> SELECT *, `x` + 1.0 AS `x2` #> FROM `df` #> ) `q01`
collect() have experimental support for common table expressions (CTEs), available by setting
cte = TRUE argument. CTEs are the database equivalent of the pipe; they allow you to write subqueries in the order in which they’re evaluated, rather than the opposite.
lf |> mutate(x2 = x + 1, x3 = x2 + 1) |> show_query(cte = TRUE) #> <SQL> #> WITH `q01` AS ( #> SELECT *, `x` + 1.0 AS `x2` #> FROM `df` #> ) #> SELECT *, `x2` + 1.0 AS `x3` #> FROM `q01`
We’ve also added support for translating
cut(): this is a very useful base R function that’s fiddly to express in SQL:
lf <- lazy_frame(x = 1) translate_sql( cut(x, c(0, 25, 50, 100)) ) #> <SQL> CASE #> WHEN (`x` <= 0.0) THEN NULL #> WHEN (`x` <= 25.0) THEN '(0,25]' #> WHEN (`x` <= 50.0) THEN '(25,50]' #> WHEN (`x` <= 100.0) THEN '(50,100]' #> WHEN (`x` > 100.0) THEN NULL #> END # Can provide custom labels translate_sql( cut(x, c(0, 25, 50, 100), labels = c("small", "medium", "large")) ) #> <SQL> CASE #> WHEN (`x` <= 0.0) THEN NULL #> WHEN (`x` <= 25.0) THEN 'small' #> WHEN (`x` <= 50.0) THEN 'medium' #> WHEN (`x` <= 100.0) THEN 'large' #> WHEN (`x` > 100.0) THEN NULL #> END # And use Inf/-Inf bounds translate_sql( cut( x, breaks = c(-Inf, 25, 50, Inf), labels = c("small", "medium", "large") ) ) #> <SQL> CASE #> WHEN (`x` <= 25.0) THEN 'small' #> WHEN (`x` <= 50.0) THEN 'medium' #> WHEN (`x` > 50.0) THEN 'large' #> END
There are also a whole host of minor translation improvements which you can read about in the release notes.
copy_inline() provides a new way to get data out of R and into the database by embedding the data directly in the query. This is a natural complement to
copy_to() which writes data to a temporary table.
copy_inline() is faster for small datasets and is particularly useful when you don’t have the permissions needed to create temporary tables. Here’s a very simple example of what the generated SQL will look like for PostgreSQL
df <- data.frame(x = 1:5, y = letters[1:5]) show_query(copy_inline(simulate_postgres(), df)) #> <SQL> #> SELECT CAST(`x` AS INTEGER) AS `x`, CAST(`y` AS TEXT) AS `y` #> FROM ( VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')) AS drvd(`x`, `y`)
dplyr 1.0.0 added a family of
row modification functions:
rows_delete(). These functions were inspired by SQL and are now supported by dbplyr.
The primary purpose of these functions is to modify the underlying tables. Because that purpose is dangerous, you’ll need to deliberate opt-in to modification by setting
in_place = TRUE. Use the default behaviour,
in_place = FALSE, to simulate what the result will be.
Most of the work in this release was done by dbplyr author @mgirlich: thanks for all your continued hard work!
And a big thanks to all 77 other contributors who’s comments, code, and discussion helped make a better package: @001ben, @1beb, @Ada-Nick, @admivsn, @alex-m-ffm, @andreassoteriadesmoj, @andyquinterom, @apalacio10, @apalacio9502, @aris-hastings, @asimumba, @ben1787, @boshek, @caljnj, @carlganz, @CLRafaelR, @coponhub, @cslewis04, @dbaston, @dpprdan, @DrFabach, @EarlGlynn, @edonnachie, @eipi10, @eitsupi, @fh-afrachioni, @fh-kpikhart, @ggpinto, @GuillaumePressiat, @hadley, @HarlanH, @hdplsa, @iangow, @James-G-Hill, @jennybc, @jiaqizhu-learning, @jonkeane, @jsspurgeon, @julieinsan, @k6adams, @kelnerrr, @kmishra9, @krlmlr, @Leprechault, @Liudvikas-vinted, @LukasWallrich, @m-sostero, @maelle, @mattcane, @mfherman, @mkoohafkan, @Mosk915, @nassuphis, @nirski, @nviets, @overmar, @p-schaefer, @plogacev, @randy3k, @recleev, @rmcd1024, @rsund, @rvomm, @samssann, @sfirke, @Sir-Chibi, @sitendug, @somatusag, @stephenashton-dhsc, @swnydick, @thothal, @torbjorn, @tsengj, @vspinu, @Waftmaster, @williamlai2, and @yitao-li.
Please visit source website for post related comments.