tidyr 0.5.0

by RStudio · June 13, 2016

This article is originally published at https://www.rstudio.com/blog/

I’m pleased to announce tidyr 0.5.0. tidyr makes it easy to “tidy” your data, storing it in a consistent form so that it’s easy to manipulate, visualise and model. Tidy data has a simple convention: put variables in the columns and observations in the rows. You can learn more about it in the tidy data vignette. Install it with:

install.packages("tidyr")

This release has three useful new features:

separate_rows() splits cells that contain multiple delimited values into multiple rows, one value per row. Thanks to Aaron Wolen for the contribution!

df <- data_frame(x = 1:2, y = c("a,b", "d,e,f"))
df %>%
  separate_rows(y, sep = ",")
#> Source: local data frame [5 x 2]
#>
#>       x     y
#>   <int> <chr>
#> 1     1     a
#> 2     1     b
#> 3     2     d
#> 4     2     e
#> 5     2     f

Compare with separate(), which splits the values into (named) columns instead:

df %>%
  separate(y, c("y1", "y2", "y3"), sep = ",", fill = "right")
#> Source: local data frame [2 x 4]
#>
#>       x    y1    y2    y3
#> * <int> <chr> <chr> <chr>
#> 1     1     a     b  <NA>
#> 2     2     d     e     f
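
separate_rows() takes its columns through ..., so more than one column can be split in the same call. A minimal sketch, assuming a second column z (hypothetical, not part of the example above) that holds the same number of delimited pieces as y in every row, and using base data.frame() so the snippet does not depend on data_frame():

```r
library(tidyr)

# Hypothetical data: y and z each hold comma-delimited values,
# with matching numbers of pieces in every row.
df2 <- data.frame(
  x = 1:2,
  y = c("a,b", "d,e,f"),
  z = c("p,q", "r,s,t"),
  stringsAsFactors = FALSE
)

# Split both columns in parallel; x is repeated for each new row.
long <- separate_rows(df2, y, z, sep = ",")
long
```

Each row of the result pairs the i-th piece of y with the i-th piece of z, so the two columns stay aligned.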

spread() gains a sep argument. Setting it names each new column by pasting together the key column's name, the separator and the key value (“key|sep|value”), for example key_1. This is useful when you’re spreading based on a numeric column:

df <- data_frame(
  x = c(1, 2, 1),
  key = c(1, 1, 2),
  val = c("a", "b", "c")
)
df %>% spread(key, val)
#> Source: local data frame [2 x 3]
#>
#>       x     1     2
#> * <dbl> <chr> <chr>
#> 1     1     a     c
#> 2     2     b  <NA>
df %>% spread(key, val, sep = "_")
#> Source: local data frame [2 x 3]
#>
#>       x key_1 key_2
#> * <dbl> <chr> <chr>
#> 1     1     a     c
#> 2     2     b  <NA>
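
One practical effect of sep is that the new column names become syntactic R names, so they can be used with $ without backticks. A small sketch using the same data as above, built with base data.frame() so it does not depend on data_frame() (wide_plain and wide_sep are names introduced here for illustration):

```r
library(tidyr)

df <- data.frame(
  x = c(1, 2, 1),
  key = c(1, 1, 2),
  val = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

wide_plain <- spread(df, key, val)
wide_sep <- spread(df, key, val, sep = "_")

names(wide_plain)  # bare key values become the column names
names(wide_sep)    # key name, separator and key value pasted together

# Without sep the column needs backticks; with sep it does not.
wide_plain$`1`
wide_sep$key_1
```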

unnest() gains a .sep argument. This is useful if you have multiple columns of data frames that have the same variable names:

df <- data_frame(
  x = 1:2,
  y1 = list(
    data_frame(y = 1),
    data_frame(y = 2)
  ),
  y2 = list(
    data_frame(y = "a"),
    data_frame(y = "b")
  )
)
df %>% unnest()
#> Source: local data frame [2 x 3]
#>
#>       x     y     y
#>   <int> <dbl> <chr>
#> 1     1     1     a
#> 2     2     2     b
df %>% unnest(.sep = "_")
#> Source: local data frame [2 x 3]
#>
#>       x  y1_y  y2_y
#>   <int> <dbl> <chr>
#> 1     1     1     a
#> 2     2     2     b

It also gains a .id argument, which adds a column that makes the names of the list explicit:

df <- data_frame(
  x = 1:2,
  y = list(
    a = 1:3,
    b = 3:1
  )
)
df %>% unnest()
#> Source: local data frame [6 x 2]
#>
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     1     2
#> 3     1     3
#> 4     2     3
#> 5     2     2
#> 6     2     1
df %>% unnest(.id = "id")
#> Source: local data frame [6 x 3]
#>
#>       x     y    id
#>   <int> <int> <chr>
#> 1     1     1     a
#> 2     1     2     a
#> 3     1     3     a
#> 4     2     3     b
#> 5     2     2     b
#> 6     2     1     b

tidyr 0.5.0 also includes a bumper crop of bug fixes, including fixes for spread() and gather() in the presence of list-columns. Please see the release notes for a complete list of changes.

