R News

dplyr 0.6.0 coming soon!

by RStudio | Open source & professional software for data science teams on RStudio · April 13, 2017

This article is originally published at https://www.rstudio.com/blog/

I’m planning to submit dplyr 0.6.0 to CRAN on May 11 (in four weeks time). In preparation, I’d like to announce that the release candidate, dplyr 0.5.0.9002 is now available. I would really appreciate it if you’d try it out and report any problems. This will ensure that the official release has as few bugs as possible.

Installation

Install the pre-release version with:

# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")

If you discover any problems, please file a minimal reprex on GitHub. You can roll back to the released version with:

install.packages("dplyr")

Features

dplyr 0.6.0 is a major release including over 100 bug fixes and improvements. There are three big changes that I want to touch on here:

Databases
Improved encoding support (particularly for CJK on windows)
Tidyeval, a new framework for programming with dplyr

You can see a complete list of changes in the draft release notes.

Databases

Almost all database related code has been moved out of dplyr and into a new package, dbplyr. This makes dplyr simpler, and will make it easier to release fixes for bugs that only affect databases.

To install the development version of dbplyr so you can try it out, run:

devtools::install_github("hadley/dbplyr")

There’s one major change, as well as a whole heap of bug fixes and minor improvements. It is now no longer necessary to create a remote “src”. Instead you can work directly with the database connection returned by DBI, reflecting the robustness of the DBI ecosystem. Thanks largely to the work of Kirill Muller (funded by the R Consortium) DBI backends are now much more consistent, comprehensive, and easier to use. That means that there’s no longer a need for a layer between you and DBI.

You can continue to use src_mysql(), src_postgres(), and src_sqlite() (which still live in dplyr), but I recommend a new style that makes the connection to DBI more clear:

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "iris", iris)
#> [1] TRUE

iris2 <- tbl(con, "iris")
iris2
#> Source:     table<iris> [?? x 5]
#> Database:   sqlite 3.11.1 [:memory:]
#>
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl>   <chr>
#> 1           5.1         3.5          1.4         0.2  setosa
#> 2           4.9         3.0          1.4         0.2  setosa
#> 3           4.7         3.2          1.3         0.2  setosa
#> 4           4.6         3.1          1.5         0.2  setosa
#> 5           5.0         3.6          1.4         0.2  setosa
#> 6           5.4         3.9          1.7         0.4  setosa
#> 7           4.6         3.4          1.4         0.3  setosa
#> 8           5.0         3.4          1.5         0.2  setosa
#> 9           4.4         2.9          1.4         0.2  setosa
#> 10          4.9         3.1          1.5         0.1  setosa
#> # ... with more rows

This is particularly useful if you want to perform non-SELECT queries as you can do whatever you want with DBI::dbGetQuery() and DBI::dbExecute().

If you’ve implemented a database backend for dplyr, please read the backend news to see what’s changed from your perspective (not much). If you want to ensure your package works with both the current and previous version of dplyr, see wrap_dbplyr_obj() for helpers.

Character encoding

We have done a lot of work to ensure that dplyr works with encodings other that Latin1 on Windows. This is most likely to affect you if you work with data that contains Chinese, Japanese, or Korean (CJK) characters. dplyr should now just work with such data.

Tidyeval

dplyr has a new approach to non-standard evaluation (NSE) called tidyeval. Tidyeval is described in detail in a new vignette about programming with dplyr but, in brief, it gives you the ability to interpolate values in contexts where dplyr usually works with expressions:

my_var <- quo(homeworld)

starwars %>%
  group_by(!!my_var) %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)
#> # A tibble: 49 × 3
#>         homeworld   height  mass
#>             <chr>    <dbl> <dbl>
#> 1        Alderaan 176.3333  64.0
#> 2     Aleen Minor  79.0000  15.0
#> 3          Bespin 175.0000  79.0
#> 4      Bestine IV 180.0000 110.0
#> 5  Cato Neimoidia 191.0000  90.0
#> 6           Cerea 198.0000  82.0
#> 7        Champala 196.0000   NaN
#> 8       Chandrila 150.0000   NaN
#> 9    Concord Dawn 183.0000  79.0
#> 10       Corellia 175.0000  78.5
#> # ... with 39 more rows

This will make it much easier to eliminate copy-and-pasted dplyr code by extracting repeated code into a function.

This also means that the underscored version of each main verb (filter_(), select_() etc). is no longer needed, and so these functions have been deprecated (but remain around for backward compatibility).

Thanks for visiting r-craft.org
This article is originally published at https://www.rstudio.com/blog/
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

dplyr 0.6.0 coming soon!

You may also like...

Categories

dplyr 0.6.0 coming soon!

Installation

Features

Databases

Character encoding

Tidyeval

You may also like...

R Weekly 2020-12 OncoTree, {parzer}, Flatten the Curve

Advent of 2020, Day 1 – What is Azure DataBricks

Parallel Processing Baseball Data with R and mlbgameday

Categories