
dtplyr 1.3.0

by Posts | Tidyverse · February 24, 2023

This article is originally published at https://www.tidyverse.org/blog/

We’re thrilled to announce the release of dtplyr 1.3.0. dtplyr gives you the speed of data.table with the syntax of dplyr; you write dplyr (and tidyr) code and dtplyr translates it to the data.table equivalent.

You can install it from CRAN with:

install.packages("dtplyr")

This blog post gives an overview of the changes in this version: dtplyr no longer adds translations directly to data.tables, it includes some dplyr 1.1.0 updates, and we have made some performance improvements. As always, you can see a full list of changes in the release notes.

library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

Breaking changes

In previous versions, dtplyr registered translations that kicked in whenever you used a data.table. This caused problems: merely loading dtplyr could break otherwise working code, because dplyr and tidyr functions applied to a data.table would now return lazy_dt objects instead of data.table objects. To avoid this, we have removed those S3 methods, so you must now explicitly opt in to dtplyr translations by using lazy_dt().
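A minimal sketch of what opting in looks like, assuming dtplyr 1.3.0 or later plus data.table and dplyr are installed (the table `dt` is made-up illustrative data). A dplyr verb called on a bare data.table is now handled by dplyr itself, while wrapping the table in lazy_dt() routes it through dtplyr; as.data.table() then forces the computation.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical example data
dt <- data.table(x = 1:4, g = c("a", "a", "b", "b"))

# As of dtplyr 1.3.0, a dplyr verb on a plain data.table uses dplyr's own
# data frame methods; no dtplyr translation happens here.
plain <- dt |> filter(x > 1)

# Opt in explicitly with lazy_dt(), then force evaluation with
# as.data.table() (collect() or as_tibble() would return a tibble instead).
translated <- dt |>
  lazy_dt() |>
  filter(x > 1) |>
  as.data.table()

print(translated)
```

Because the opt-in is explicit, code that never calls lazy_dt() behaves the same whether or not dtplyr is loaded.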

dplyr 1.1.0

This release adds support for two dplyr 1.1.0 features: per-operation grouping with the .by argument, and pick():

dt <- lazy_dt(data.frame(x = 1:10, id = 1:2))
dt |> 
  summarise(mean = mean(x), .by = id) |> 
  show_query()
#> `_DT1`[, .(mean = mean(x)), keyby = .(id)]
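To see that .by gives the same answer as the older group_by() + summarise() pattern, here is a small sketch that collects both results with as.data.frame(). It assumes dplyr 1.1.0 or later and reuses the recycled `id` column from the example above; `by_result` and `grouped_result` are illustrative names.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# id is recycled, so it alternates 1, 2, 1, 2, ...
df <- data.frame(x = 1:10, id = 1:2)

# Per-operation grouping: the grouping applies only to this summarise()
by_result <- lazy_dt(df) |>
  summarise(mean = mean(x), .by = id) |>
  as.data.frame()

# Persistent grouping via group_by(), the pre-1.1.0 way
grouped_result <- lazy_dt(df) |>
  group_by(id) |>
  summarise(mean = mean(x)) |>
  as.data.frame()

print(by_result)
```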

dt <- lazy_dt(data.frame(x = 1:10, y = runif(10)))
dt |> 
  mutate(row_sum = rowSums(pick(x))) |> 
  show_query()
#> copy(`_DT2`)[, `:=`(row_sum = rowSums(data.table(x = x)))]
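The example above picks a single column; pick() is more useful with several. The sketch below assumes dtplyr translates pick(x, y) the same way it translates pick(x) (building a data.table from the selected columns), and uses hypothetical data so the row sums are easy to check by hand.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data with two numeric columns
df <- data.frame(x = 1:3, y = c(10, 20, 30))

# pick(x, y) selects both columns so rowSums() can add them row by row
res <- lazy_dt(df) |>
  mutate(row_sum = rowSums(pick(x, y))) |>
  as.data.frame()

print(res)
```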

Per-operation grouping was one of the dplyr 1.1.0 features inspired by data.table, so it’s neat to see it come full circle in this dtplyr release. Future releases will add support for other dplyr 1.1.0 features like the new join_by() syntax and reframe().

Improved translations

dtplyr gains new translations for add_count() and unite(), and the ranking functions min_rank(), dense_rank(), percent_rank(), and cume_dist() are now mapped to their data.table equivalents:

dt |> add_count() |> show_query()
#> copy(`_DT2`)[, `:=`(n = .N)]

dt |> tidyr::unite("z", c(x, y)) |> show_query()
#> copy(`_DT2`)[, `:=`(z = paste(x, y, sep = "_"))][, `:=`(c("x", 
#> "y"), NULL)]
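The queries above show the translations; the sketch below collects the actual results so you can see what add_count() and unite() produce. It assumes tidyr is installed (dtplyr provides the lazy_dt method for tidyr::unite()); the data frame is hypothetical.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# add_count() appends n = .N, the number of rows per group
# (here there is no grouping, so every row gets 3)
counted <- lazy_dt(df) |> add_count() |> as.data.frame()

# unite() pastes x and y together with "_" and, by default,
# removes the source columns (the `:=`(c("x", "y"), NULL) step)
united <- lazy_dt(df) |> tidyr::unite("z", c(x, y)) |> as.data.frame()

print(counted)
print(united)
```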

dt |> mutate(r = min_rank(x)) |> show_query()
#> copy(`_DT2`)[, `:=`(r = frank(x, ties.method = "min", na.last = "keep"))]

dt |> mutate(r = dense_rank(x)) |> show_query()
#> copy(`_DT2`)[, `:=`(r = frank(x, ties.method = "dense", na.last = "keep"))]
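Since these ranking functions now run through data.table::frank(), a quick sanity check is to compare dtplyr's output with dplyr's own implementation on a plain data frame. This sketch uses hypothetical data with a tie and a missing value, the two cases where ranking methods usually differ.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: a tie (20, 20) and an NA
df <- data.frame(x = c(10, 20, 20, NA, 30))

# Translated to frank(ties.method = "min"/"dense", na.last = "keep")
via_dtplyr <- lazy_dt(df) |>
  mutate(min_r = min_rank(x), dense_r = dense_rank(x)) |>
  as.data.frame()

# dplyr's own implementation, for comparison
via_dplyr <- df |>
  mutate(min_r = min_rank(x), dense_r = dense_rank(x))

print(via_dtplyr)
```

min_rank() leaves a gap after the tie (1, 2, 2, 4) while dense_rank() does not (1, 2, 2, 3), and na.last = "keep" keeps the missing value as NA in both.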

This release also includes three translation improvements that yield better performance. When the data has already been copied, arrange() uses setorder() instead of order(), and select() drops unwanted columns by reference (i.e., with var := NULL). In addition, slice() now uses an intermediate variable to reduce the computation time of row selection.
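A sketch of a pipeline where these optimizations can apply: mutate() forces dtplyr to work on a copy of the input, so the later arrange() and select() steps are free to modify that copy in place. The data is hypothetical, and rather than assume the exact generated code, the example prints it with show_query() so you can check for setorder() and the `:=` NULL column drop yourself.

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data
df <- data.frame(x = c(3, 1, 2), y = c("c", "a", "b"), w = 1:3)

# mutate() triggers a copy(); per the release notes, the subsequent
# arrange() and select() can then work on that copy by reference
step <- lazy_dt(df) |>
  mutate(z = x * 10) |>
  arrange(x) |>
  select(-w)

show_query(step)        # inspect the generated data.table code
res <- as.data.frame(step)
print(res)
```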

Acknowledgements

A massive thanks to Mark Fairbanks who did most of the work for this release, ably aided by the other dtplyr maintainers @eutwt and Maximilian Girlich. And thanks to everyone else who helped make this release possible, whether it was with code, documentation, or insightful comments: @abalter, @akaviaLab, @camnesia, @caparks2, @DavisVaughan, @eipi10, @hadley, @jmbarbone, @johnF-moore, @lschneiderbauer, and @NicChr.
