purrr 1.0.0

by Posts | Tidyverse · December 20, 2022

This article is originally published at https://www.tidyverse.org/blog/

We’re happy to announce the release of purrr 1.0.0! purrr enhances R’s functional programming toolkit by providing a complete and consistent set of tools for working with functions and vectors. In the words of ChatGPT:

With purrr, you can easily “kitten” your functions together to perform complex operations, “paws” for a moment to debug and troubleshoot your code, while “feline” good about the elegant and readable code that you write. Whether you’re a “cat”-egorical beginner or a seasoned functional programming “purr”-fessional, purrr has something to offer. So why not “pounce” on the opportunity to try it out and see how it can “meow”-velously improve your R coding experience?

You can install it from CRAN with:

install.packages("purrr")

purrr is 7 years old and it’s finally made it to 1.0.0! This is a big release, adding some long-needed functionality (like progress bars!) as well as really refining the core purpose of purrr. In this post, we’ll start with an overview of the breaking changes, then briefly review some documentation changes. Then we’ll get to the good stuff: improvements to the map family, new keep_at() and discard_at() functions, and improvements to flattening and simplification. You can see a full list of changes in the release notes.

library(purrr)

Breaking changes

We’ve used the 1.0.0 release as an opportunity to really refine the core purpose of purrr: facilitating functional programming in R. We’ve been more aggressive with deprecations and breaking changes than usual, because a 1.0.0 release signals that purrr is now stable, making it our last opportunity for major changes.

These changes will break some existing code, but we’ve done our best to make them affect as little code as possible. Out of the ~1400 CRAN packages that use purrr, only ~40 were negatively affected, and I made pull requests to fix them all. Making these fixes helped give me confidence that, though we’re deprecating quite a few functions and changing a few special cases, it shouldn’t affect too much code in the wild.

There are four important changes that you should be aware of:

pluck() behaves differently when extracting 0-length vectors.
The map() family uses the tidyverse rules for coercion and recycling.
All functions that modify lists handle NULL consistently.
We’ve deprecated functions that aren’t related to the core purpose of purrr.

`pluck()` and zero-length vectors

Previously, pluck() replaced 0-length vectors with the value of .default. Now .default is only used for NULLs and absent elements:

x <- list(y = list(a = character(), b = NULL))
x |> pluck("y", "a", .default = NA)
#> character(0)
x |> pluck("y", "b", .default = NA)
#> [1] NA
x |> pluck("y", "c", .default = NA)
#> [1] NA

This also influences the map family because using an integer vector, character vector, or list instead of a function automatically calls pluck():

x <- list(list(1), list(), list(NULL), list(character()))
x |> map(1, .default = 0) |> str()
#> List of 4
#>  $ : num 1
#>  $ : num 0
#>  $ : num 0
#>  $ : chr(0)

We made this change because it makes purrr more consistent with the rest of the tidyverse, and because the old behaviour looks like it was a bug in the original implementation of the function.

Tidyverse consistency

We’ve tweaked the map family of functions to be more consistent with general tidyverse coercion and recycling rules, as implemented by the vctrs package. map_lgl(), map_int(), map_dbl(), and map_chr() now follow the same coercion rules as vctrs. In particular:

map_chr(TRUE, identity), map_chr(0L, identity), and map_chr(1.5, identity) have been deprecated because we believe that converting a logical/integer/double to a character vector is potentially dangerous and should require an explicit coercion.

# previously you could write
map_chr(1:4, \(x) x + 1)
#> Warning: Automatic coercion from double to character was deprecated in purrr 1.0.0.
#> ℹ Please use an explicit call to `as.character()` within `map_chr()` instead.
#> [1] "2.000000" "3.000000" "4.000000" "5.000000"

# now you need something like this:
map_chr(1:4, \(x) as.character(x + 1))
#> [1] "2" "3" "4" "5"

map_int() requires that the numeric results be close to integers, rather than silently truncating to integers. Compare these two examples:

map_int(1:3, \(x) x / 2)
#> Error in `map_int()`:
#> ℹ In index: 1.
#> Caused by error:
#> ! Can't coerce from a double vector to an integer vector.

map_int(1:3, \(x) x * 2)
#> [1] 2 4 6

map2(), modify2(), and pmap() use tidyverse recycling rules, which means that vectors of length 1 are recycled to any size but all other vectors must have the same length. This leads to two major changes:

Previously, the presence of a zero-length input generated a zero-length output. Now it’s recycled using the same rules:

map2(1:2, character(), paste)
#> Error in `map2()`:
#> ! Can't recycle `.x` (size 2) to match `.y` (size 0).

# Works because length-1 vector gets recycled to length-0
map2(1, character(), paste)
#> list()

And you must now explicitly recycle vectors that aren’t length 1:

map2_int(1:4, c(10, 20), `+`)
#> Error in `map2_int()`:
#> ! Can't recycle `.x` (size 4) to match `.y` (size 2).

map2_int(1:4, rep(c(10, 20), 2), `+`)
#> [1] 11 22 13 24

Assigning `NULL`

purrr has a number of functions that modify a list: pluck<-(), assign_in(), modify(), modify2(), modify_if(), modify_at(), and list_modify(). Previously, these functions had inconsistent behaviour when you attempted to modify an element with NULL: some functions would delete that element, and some would set it to NULL. That inconsistency arose because base R handles NULL in different ways depending on whether you use $/[[ or [:

x1 <- x2 <- x3 <- list(a = 1, b = 2)

x1$a <- NULL
str(x1)
#> List of 1
#>  $ b: num 2

x2["a"] <- list(NULL)
str(x2)
#> List of 2
#>  $ a: NULL
#>  $ b: num 2

Now functions that edit a list will create an element containing NULL:

x3 |> 
  list_modify(a = NULL) |> 
  str()
#> List of 2
#>  $ a: NULL
#>  $ b: num 2

x3 |> 
  modify_at("b", \(x) NULL) |> 
  str()
#> List of 2
#>  $ a: num 1
#>  $ b: NULL

If you want to delete the element, you can use the special zap() sentinel:

x3 |> 
  list_modify(a = zap()) |> 
  str()
#> List of 1
#>  $ b: num 2

zap() does not work in modify*() because those functions are designed to always return the same top-level structure as the input.

Core purpose refinements

We have deprecated a number of functions to keep purrr focused on its core purpose: facilitating functional programming in R. Deprecation means that the functions will continue to work, but you’ll be warned once every 8 hours if you use them. In several years’ time, we’ll release an update which causes the warnings to occur every time you use them, and a few years after that they’ll be transformed to throw errors.

cross() and all its variants have been deprecated because they’re slow and buggy, and a better approach already exists in tidyr::expand_grid().
update_list(), rerun(), and the use of tidyselect with map_at() and friends have been deprecated because we no longer believe that non-standard evaluation is a good fit for purrr.
The lift_* family of functions has been superseded because they promote a style of function manipulation that is not commonly used in R.
prepend(), rdunif(), rbernoulli(), when(), and list_along() have been deprecated because they’re not directly related to functional programming.
splice() has been deprecated because we no longer believe that automatic splicing makes for good UI and there are other ways to achieve the same result.

Consult the documentation for the alternatives that we now recommend.
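As a rough sketch of what such replacements can look like (the inputs here are made up for illustration, and the example assumes tidyr is installed), three of the deprecated helpers can be swapped out along these lines:

```r
library(purrr)

# rerun(3, runif(2)) repeated an expression; map() over an index does the same.
draws <- map(1:3, \(i) runif(2))

# cross() built all combinations; tidyr::expand_grid() returns them as a tibble
# (assumes the tidyr package is installed).
grid <- tidyr::expand_grid(x = 1:2, y = c("a", "b"))

# prepend() added elements at the front; base append() with after = 0 does too.
x <- append(list(2, 3), list(1), after = 0)
```

The documentation for each deprecated function remains the authoritative source for the recommended alternative.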

Deprecating these functions makes purrr easier to maintain because it reduces the surface area for bugs and issues, and it makes purrr easier to learn because there’s a clearer common thread that ties together all functions.

Documentation

As you’ve seen in the code above, we are moving from magrittr’s pipe (%>%) to the base pipe (|>) and from formula syntax (~ .x + 1) to R’s new anonymous function shorthand (\(x) x + 1). We believe that it’s better to use these new base tools because they work everywhere: the base pipe doesn’t require that you load magrittr and the new function shorthand works everywhere, not just in purrr functions. Additionally, being able to specify the argument name for the anonymous function can often lead to clearer code.

# Previously we wrote
1:10 %>%
  map(~ rnorm(10, .x)) %>%
  map_dbl(mean)
#>  [1]  0.5586355  1.8213041  2.8764412  4.1521664  5.1160393  6.1271905
#>  [7]  6.9109806  8.2808301  9.2373940 10.6269104

# Now we recommend
1:10 |>
  map(\(mu) rnorm(10, mu)) |>
  map_dbl(mean) 
#>  [1]  0.4638639  2.0966712  3.4441928  3.7806185  5.3373228  6.1854820
#>  [7]  6.5873300  8.3116138  9.4824697 10.4590034

We also recommend using an anonymous function instead of passing additional arguments to map. This avoids a certain class of moderately esoteric argument matching woes and, we believe, is generally easier to read.

mu <- c(1, 10, 100)

# Previously we wrote
mu |> map_dbl(rnorm, n = 1)
#> [1]  0.5706199 11.3604613 99.9291426

# Now we recommend
mu |> map_dbl(\(mu) rnorm(1, mean = mu))
#> [1]   0.7278463   7.5533200 100.0654866

Due to the tidyverse R dependency policy, purrr works in R 3.5, 3.6, 4.0, 4.1, and 4.2, but the base pipe and anonymous function syntax are only available in R 4.1 and later. So the examples are automatically disabled on older versions of R to allow purrr to continue to pass R CMD check.

Mapping

With that out of the way, we can now talk about the exciting new features in purrr 1.0.0. We’ll start with the map family of functions, which has three big new features:

Progress bars.
Better errors.
A new family member: map_vec().

These are described in the following sections.

Progress bars

The map family can now produce a progress bar. This is very useful for long running jobs:
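A minimal sketch of turning the bar on, assuming a hypothetical slow step (slow_square(), simulated here with Sys.sleep()) and using the .progress argument:

```r
library(purrr)

# Hypothetical slow job: each element takes a short moment to process.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

# .progress = TRUE asks map_dbl() to display a progress bar while it works.
squares <- map_dbl(1:20, slow_square, .progress = TRUE)
```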

(For interactive use, the progress bar uses some simple heuristics so that it doesn’t show up for very simple jobs.)

In most cases, we expect that .progress = TRUE is enough, but if you’re wrapping map() in another function, you might want to set .progress to a string that identifies the progress bar:
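For instance, a wrapper might label its bar like this; save_all() and its label are hypothetical names used only for illustration, and the work is again simulated with Sys.sleep():

```r
library(purrr)

# Hypothetical wrapper: the string passed to .progress names the bar,
# so it is clear which step is running.
save_all <- function(items) {
  walk(items, \(item) Sys.sleep(0.01), .progress = "Saving items")
}

out <- save_all(1:20)
```

walk() returns its input invisibly, so the wrapper can still be used in a pipeline.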

Better errors

If there’s an error in the function you’re mapping, map() and friends now tell you which element caused the problem:

x <- sample(1:500)
x |> map(\(x) if (x == 1) stop("Error!") else 10)
#> Error in `map()`:
#> ℹ In index: 51.
#> Caused by error in `.f()`:
#> ! Error!

We hope that this makes your debugging life just a little bit easier! (Don’t forget about safely() and possibly() if you expect failures and want to either ignore or capture them.)
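As a small sketch of those two helpers (the inputs are invented for illustration), possibly() replaces a failure with a default value, while safely() keeps both the result and the error for later inspection:

```r
library(purrr)

inputs <- list(1, "a", 100)

# possibly() swaps any error for the value given in `otherwise`.
log_or_na <- possibly(log, otherwise = NA_real_)
res <- map_dbl(inputs, log_or_na)

# safely() captures both outcomes: each element of the output is a list
# with `result` and `error` components, one of which is NULL.
captured <- map(inputs, safely(log))
failed <- map_lgl(captured, \(out) !is.null(out$error))
```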

We have also generally reviewed the error messages throughout purrr in order to make them more actionable. If you hit a confusing error message, please let us know!

New `map_vec()`

We’ve added map_vec() (along with map2_vec() and pmap_vec()) to handle more types of vectors. map_vec() extends map_lgl(), map_int(), map_dbl(), and map_chr() to arbitrary types of vectors, like dates, factors, and date-times:

1:3 |> map_vec(\(i) factor(letters[i]))
#> [1] a b c
#> Levels: a b c
1:3 |> map_vec(\(i) factor(letters[i], levels = letters[4:1]))
#> [1] a b c
#> Levels: d c b a

1:3 |> map_vec(\(i) as.Date(ISOdate(i + 2022, 10, 5)))
#> [1] "2023-10-05" "2024-10-05" "2025-10-05"
1:3 |> map_vec(\(i) ISOdate(i + 2022, 10, 5))
#> [1] "2023-10-05 12:00:00 GMT" "2024-10-05 12:00:00 GMT"
#> [3] "2025-10-05 12:00:00 GMT"

map_vec() exists somewhat in the middle of base R’s sapply() and vapply(). Unlike sapply(), it will always return a simpler vector, erroring if there’s no common type:

list("a", 1) |> map_vec(identity)
#> Error in `map_vec()`:
#> ! Can't combine `<list>[[1]]` <character> and `<list>[[2]]` <double>.
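To make the contrast with sapply() a little more concrete, here is an illustrative sketch (the ragged input is made up for this example) in which the results have different lengths:

```r
library(purrr)

# Results of different lengths: sapply() silently falls back to a list...
ragged <- sapply(1:2, \(i) seq_len(i))
is.list(ragged)

# ...while map_vec() signals an error rather than changing the output type.
outcome <- tryCatch(
  map_vec(1:2, \(i) seq_len(i)),
  error = \(e) "error"
)
```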

If you want to require a certain type of output, supply .ptype, making map_vec() behave more like vapply(). ptype is short for prototype, and should be a vector that exemplifies the type of output you expect.

x <- list("a", "b") 
x |> map_vec(identity, .ptype = character())
#> [1] "a" "b"

# will error if the result can't be automatically coerced
# to the specified ptype
x |> map_vec(identity, .ptype = integer())
#> Error in `map_vec()`:
#> ! Can't convert `<list>[[1]]` <character> to <integer>.

We don’t expect you to know or memorise the rules that vctrs uses for coercion; our hope is that they’ll become second nature as we steadily ensure that every tidyverse function follows the same rules.
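If you’re curious, one way to get a feel for these rules is to call vctrs directly. This is only a sketch with made-up inputs, using vctrs::vec_c(), which combines vectors according to the vctrs coercion rules:

```r
library(vctrs)

# Logical and integer combine to integer; integer and double combine to double.
lgl_int <- vec_c(TRUE, 2L)
int_dbl <- vec_c(1L, 2.5)

# Character and double have no common type, so combining them is an error.
outcome <- tryCatch(vec_c("a", 1), error = \(e) "error")
```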

`keep_at()` and `discard_at()`

purrr has gained a new pair of functions, keep_at() and discard_at(), that work like keep() and discard() but operate on names rather than values:

x <- list(a = 1, b = 2, c = 3, D = 4, E = 5)

x |> 
  keep_at(c("a", "b", "c")) |> 
  str()
#> List of 3
#>  $ a: num 1
#>  $ b: num 2
#>  $ c: num 3

x |> 
  discard_at(c("a", "b", "c")) |> 
  str()
#> List of 2
#>  $ D: num 4
#>  $ E: num 5

Alternatively, you can supply a function that is called with the names of the elements and should return a logical vector describing which elements to keep/discard:

is_lower_case <- function(x) x == tolower(x)

x |> keep_at(is_lower_case)
#> $a
#> [1] 1
#> 
#> $b
#> [1] 2
#> 
#> $c
#> [1] 3

You can now also pass such a function to all other _at() functions:

x |> 
  modify_at(is_lower_case, \(x) x * 100) |> 
  str()
#> List of 5
#>  $ a: num 100
#>  $ b: num 200
#>  $ c: num 300
#>  $ D: num 4
#>  $ E: num 5

Flattening and simplification

Last, but not least, we’ve reworked the family of functions that flatten and simplify lists. These caused us a lot of confusion internally because folks (and different packages) used the same words to mean different things. Now there are three main functions that share a common prefix that makes it clear that they all operate on lists:

list_flatten() removes a single level of hierarchy from a list; the output is always a list.
list_simplify() reduces a list to a homogeneous vector; the output is always the same length as the input.
list_c(), list_cbind(), and list_rbind() concatenate the elements of a list to produce a vector or data frame. There are no constraints on the output.

These functions have led us to supersede a number of functions. This means that they are not going away but we no longer recommend them, and they will receive only critical bug fixes.

flatten() has been superseded by list_flatten().
flatten_lgl(), flatten_int(), flatten_dbl(), and flatten_chr() have been superseded by list_c().
flatten_dfr() and flatten_dfc() have been superseded by list_rbind() and list_cbind() respectively. flatten_dfr() had some particularly puzzling edge cases when the inputs would be flattened into columns.
map_dfc() and map_dfr() (and their map2 and pmap variants) have been superseded in favour of using the appropriate map function along with list_rbind() or list_cbind().
simplify(), simplify_all(), and as_vector() have been superseded in favour of list_simplify().

Flattening

list_flatten() removes one layer of hierarchy from a list. In other words, if any of the children of the list are themselves lists, the contents of those lists are inlined into the parent:

x <- list(1, list(2, list(3, 4), 5))
x |> str()
#> List of 2
#>  $ : num 1
#>  $ :List of 3
#>   ..$ : num 2
#>   ..$ :List of 2
#>   .. ..$ : num 3
#>   .. ..$ : num 4
#>   ..$ : num 5
x |> list_flatten() |> str()
#> List of 4
#>  $ : num 1
#>  $ : num 2
#>  $ :List of 2
#>   ..$ : num 3
#>   ..$ : num 4
#>  $ : num 5
x |> list_flatten() |> list_flatten() |> str()
#> List of 5
#>  $ : num 1
#>  $ : num 2
#>  $ : num 3
#>  $ : num 4
#>  $ : num 5

list_flatten() always returns a list; once a list is as flat as it can get (i.e. none of its children are lists), it leaves the input unchanged.

x |> list_flatten() |> list_flatten() |> list_flatten() |> str()
#> List of 5
#>  $ : num 1
#>  $ : num 2
#>  $ : num 3
#>  $ : num 4
#>  $ : num 5

Simplification

list_simplify() maintains the length of the input, but produces a simpler type:

list(1, 2, 3) |> list_simplify()
#> [1] 1 2 3
list("a", "b", "c") |> list_simplify()
#> [1] "a" "b" "c"

Because the length must stay the same, it will only succeed if every element has length 1:

list_simplify(list(1, 2, 3:4))
#> Error in `list_simplify()`:
#> ! `x[[3]]` must have size 1, not size 2.
list_simplify(list(1, 2, integer()))
#> Error in `list_simplify()`:
#> ! `x[[3]]` must have size 1, not size 0.

Because the result must be a simpler vector, all the components must be compatible:

list_simplify(list(1, 2, "a"))
#> Error in `list_simplify()`:
#> ! Can't combine `<list>[[1]]` <double> and `<list>[[3]]` <character>.

If you need to simplify if it’s possible, but otherwise leave the input unchanged, use strict = FALSE:

list_simplify(list(1, 2, "a"), strict = FALSE)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] "a"

If you want to be specific about the type you want, list_simplify() can take the same prototype argument as map_vec():

list(1, 2, 3) |> list_simplify(ptype = integer())
#> [1] 1 2 3

list(1, 2, 3) |> list_simplify(ptype = factor())
#> Error in `list_simplify()`:
#> ! Can't convert `<list>[[1]]` <double> to <factor<>>.

Concatenation

list_c(), list_cbind(), and list_rbind() concatenate all elements together in a similar way to using do.call(c) or do.call(rbind)¹. Unlike list_simplify(), this allows the elements to be different lengths:

list(1, 2, 3) |> list_c()
#> [1] 1 2 3
list(1, 2, 3:4, integer()) |> list_c()
#> [1] 1 2 3 4

The downside of this flexibility is that these functions break the connection between the input and the output. This reveals that map_dfr() and map_dfc() don’t really belong to the map family because they don’t maintain a 1-to-1 mapping between input and output: there’s no reliable way to associate a row in the output with an element in the input.

For this reason, map_dfr() and map_dfc() (and their map2 and pmap variants) are superseded and we recommend switching to an explicit call to list_rbind() or list_cbind() instead:

# previously
paths |> map_dfr(read_csv, .id = "path")
# now
paths |> 
  map(read_csv) |> 
  list_rbind(names_to = "path")
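Here is a self-contained sketch of the same pattern, assuming a small named list of data frames (tables, with invented values) in place of files read from disk:

```r
library(purrr)

# Stand-ins for files read from disk: a named list of small data frames.
tables <- list(
  jan = data.frame(sales = c(10, 20)),
  feb = data.frame(sales = 30)
)

# map() transforms each data frame; list_rbind() then stacks the results,
# storing the list names in a "month" column, much as .id did for map_dfr().
combined <- tables |>
  map(\(df) transform(df, sales = sales * 2)) |>
  list_rbind(names_to = "month")
```

Keeping the two steps separate makes it explicit that the row-binding, not the mapping, is what changes the shape of the output.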

This new behaviour also affects accumulate() and accumulate2(), which previously had an idiosyncratic approach to simplification.

`list_assign()`

There’s one other new function that isn’t directly related to flattening and friends, but shares the list_ prefix: list_assign(). list_assign() is similar to list_modify() but it doesn’t work recursively. The recursion is a mildly confusing feature of list_modify() that’s easy to miss in the documentation.

list(x = 1, y = list(a = 1)) |> 
  list_modify(y = list(b = 1)) |> 
  str()
#> List of 2
#>  $ x: num 1
#>  $ y:List of 2
#>   ..$ a: num 1
#>   ..$ b: num 1

list_assign() doesn’t recurse into sublists making it a bit easier to reason about:

list(x = 1, y = list(a = 1)) |> 
  list_assign(y = list(b = 2)) |> 
  str()
#> List of 2
#>  $ x: num 1
#>  $ y:List of 1
#>   ..$ b: num 2

Acknowledgements

A massive thanks to all 162 contributors who have helped make purrr 1.0.0 happen! @adamroyjones, @afoltzm, @agilebean, @ahjames11, @AHoerner, @alberto-dellera, @alex-gable, @AliciaSchep, @ArtemSokolov, @AshesITR, @asmlgkj, @aubryvetepi, @balwierz, @bastianilso, @batpigandme, @bebersb, @behrman, @benjaminschwetz, @billdenney, @Breza, @brunj7, @BrunoGrandePhD, @CGMossa, @cgoo4, @chsafouane, @chumbleycode, @ColinFay, @CorradoLanera, @CPRyan, @czeildi, @dan-reznik, @DanChaltiel, @datawookie, @dave-lovell, @davidsjoberg, @DavisVaughan, @deann88, @dfalbel, @dhslone, @dlependorf, @dllazarov, @dpprdan, @dracodoc, @echasnovski, @edo91, @edoardo-oliveri-sdg, @erictleung, @eyayaw, @felixhell2004, @florianm, @florisvdh, @flying-sheep, @fpinter, @frankzhang21, @gaborcsardi, @GarrettMooney, @gdurif, @ge-li, @ggrothendieck, @grayskripko, @gregleleu, @gregorp, @hadley, @hendrikvanb, @holgerbrandl, @hriebl, @hsloot, @huftis, @iago-pssjd, @iamnicogomez, @IndrajeetPatil, @irudnyts, @izahn, @jameslairdsmith, @jedwards24, @jemus42, @jennybc, @jhrcook, @jimhester, @jimjam-slam, @jnolis, @joelgombin, @jonathan-g, @jpmarindiaz, @jxu, @jzadra, @karchjd, @karjamatti, @kbzsl, @krlmlr, @lahvak, @lambdamoses, @lasuk, @lionel-, @lorenzwalthert, @LukasWallrich, @LukaszDerylo, @malcolmbarrett, @MarceloRTonon, @mattwarkentin, @maxheld83, @Maximilian-Stefan-Ernst, @mccroweyclinton-EPA, @medewitt, @meowcat, @mgirlich, @mine-cetinkaya-rundel, @mitchelloharawild, @mkoohafkan, @mlane3, @mmuurr, @moodymudskipper, @mpettis, @nealrichardson, @Nelson-Gon, @neuwirthe, @njtierney, @oduilln, @papageorgiou, @pat-s, @paulponcet, @petyaracz, @phargarten2, @philiporlando, @q-w-a, @QuLogic, @ramiromagno, @rcorty, @reisner, @Rekyt, @roboes, @romainfrancois, @rorynolan, @salim-b, @sar8421, @ScoobyQ, @sda030, @sgschreiber, @sheffe, @Shians, @ShixiangWang, @shosaco, @siavash-babaei, @stephenashton-dhsc, @stschiff, @surdina, @tdawry, @thebioengineer, @TimTaylor, @TimTeaFan, @tomjemmett, @torbjorn, @tvatter, 
@TylerGrantSmith, @vorpalvorpal, @vspinu, @wch, @werkstattcodes, @williamlai2, @yogat3ch, @yutannihilation, and @zeehio.

¹ But if they used the tidyverse coercion rules.
