R News

tidyr::spread() and dplyr::rename_at() in action

by Econometrics and Free Software · July 27, 2017

This article is originally published at http://www.brodrigues.co/

I was recently confronted to a situation that required going from a long dataset to a wide dataset, but with a small twist: there were two datasets, which I had to merge into one. You might wonder what kinda crappy twist that is, right? Well, let’s take a look at the data:

data1; data2

## # A tibble: 20 x 4
##    country date       variable_1       value
##    <chr>   <chr>      <chr>            <int>
##  1 lu      01/01/2005 maybe               22
##  2 lu      01/07/2005 maybe               13
##  3 lu      01/01/2006 maybe               40
##  4 lu      01/07/2006 maybe               25
##  5 lu      01/01/2005 totally_agree       42
##  6 lu      01/07/2005 totally_agree       17
##  7 lu      01/01/2006 totally_agree       25
##  8 lu      01/07/2006 totally_agree       16
##  9 lu      01/01/2005 totally_disagree    39
## 10 lu      01/07/2005 totally_disagree    17
## 11 lu      01/01/2006 totally_disagree    23
## 12 lu      01/07/2006 totally_disagree    21
## 13 lu      01/01/2005 kinda_disagree      69
## 14 lu      01/07/2005 kinda_disagree      12
## 15 lu      01/01/2006 kinda_disagree      10
## 16 lu      01/07/2006 kinda_disagree       9
## 17 lu      01/01/2005 kinda_agree         38
## 18 lu      01/07/2005 kinda_agree         31
## 19 lu      01/01/2006 kinda_agree         19
## 20 lu      01/07/2006 kinda_agree         12

## # A tibble: 20 x 4
##    country date       variable_2       value
##    <chr>   <chr>      <chr>            <int>
##  1 lu      01/01/2005 kinda_agree         22
##  2 lu      01/07/2005 kinda_agree         13
##  3 lu      01/01/2006 kinda_agree         40
##  4 lu      01/07/2006 kinda_agree         25
##  5 lu      01/01/2005 totally_agree       42
##  6 lu      01/07/2005 totally_agree       17
##  7 lu      01/01/2006 totally_agree       25
##  8 lu      01/07/2006 totally_agree       16
##  9 lu      01/01/2005 totally_disagree    39
## 10 lu      01/07/2005 totally_disagree    17
## 11 lu      01/01/2006 totally_disagree    23
## 12 lu      01/07/2006 totally_disagree    21
## 13 lu      01/01/2005 maybe               69
## 14 lu      01/07/2005 maybe               12
## 15 lu      01/01/2006 maybe               10
## 16 lu      01/07/2006 maybe                9
## 17 lu      01/01/2005 kinda_disagree      38
## 18 lu      01/07/2005 kinda_disagree      31
## 19 lu      01/01/2006 kinda_disagree      19
## 20 lu      01/07/2006 kinda_disagree      12

As explained in Hadley (2014), this is how you should keep your data… But for a particular purpose, I had to transform these datasets. What I was asked to do was to merge these into a single wide data frame. Doing this for one dataset is easy:

data1 %>%
  spread(variable_1, value)

## # A tibble: 4 x 7
##   country date       kinda_agree kinda_disagree maybe totally_agree
##   <chr>   <chr>            <int>          <int> <int>         <int>
## 1 lu      01/01/2005          38             69    22            42
## 2 lu      01/01/2006          19             10    40            25
## 3 lu      01/07/2005          31             12    13            17
## 4 lu      01/07/2006          12              9    25            16
## # ... with 1 more variable: totally_disagree <int>

But because data1 and data2 have the same levels for variable_1 and variable_2, this would not work. So the solution I found online, in this SO thread was to use tidyr::spread() with dplyr::rename_at() like this:

data1 <- data1 %>%
  spread(variable_1, value) %>%
  rename_at(vars(-country, -date), funs(paste0("variable1:", .)))

glimpse(data1)

## Observations: 4
## Variables: 7
## $ country                      <chr> "lu", "lu", "lu", "lu"
## $ date                         <chr> "01/01/2005", "01/01/2006", "01/0...
## $ `variable1:kinda_agree`      <int> 38, 19, 31, 12
## $ `variable1:kinda_disagree`   <int> 69, 10, 12, 9
## $ `variable1:maybe`            <int> 22, 40, 13, 25
## $ `variable1:totally_agree`    <int> 42, 25, 17, 16
## $ `variable1:totally_disagree` <int> 39, 23, 17, 21

data2 <- data2 %>%
  spread(variable_2, value) %>%
  rename_at(vars(-country, -date), funs(paste0("variable2:", .)))

glimpse(data2)

## Observations: 4
## Variables: 7
## $ country                      <chr> "lu", "lu", "lu", "lu"
## $ date                         <chr> "01/01/2005", "01/01/2006", "01/0...
## $ `variable2:kinda_agree`      <int> 22, 40, 13, 25
## $ `variable2:kinda_disagree`   <int> 38, 19, 31, 12
## $ `variable2:maybe`            <int> 69, 10, 12, 9
## $ `variable2:totally_agree`    <int> 42, 25, 17, 16
## $ `variable2:totally_disagree` <int> 39, 23, 17, 21

rename_at() needs variables which you pass to vars(), a helper function to select variables, and a function that will do the renaming, passed to funs(). The function I use is simply paste0(), which pastes a string, for example “variable1:” with the name of the columns, given by the single ‘.’, a dummy argument. Now these datasets can be merged:

data1 %>%
  full_join(data2) %>%
  glimpse()

## Joining, by = c("country", "date")

## Observations: 4
## Variables: 12
## $ country                      <chr> "lu", "lu", "lu", "lu"
## $ date                         <chr> "01/01/2005", "01/01/2006", "01/0...
## $ `variable1:kinda_agree`      <int> 38, 19, 31, 12
## $ `variable1:kinda_disagree`   <int> 69, 10, 12, 9
## $ `variable1:maybe`            <int> 22, 40, 13, 25
## $ `variable1:totally_agree`    <int> 42, 25, 17, 16
## $ `variable1:totally_disagree` <int> 39, 23, 17, 21
## $ `variable2:kinda_agree`      <int> 22, 40, 13, 25
## $ `variable2:kinda_disagree`   <int> 38, 19, 31, 12
## $ `variable2:maybe`            <int> 69, 10, 12, 9
## $ `variable2:totally_agree`    <int> 42, 25, 17, 16
## $ `variable2:totally_disagree` <int> 39, 23, 17, 21

Hope this post helps you understand the difference between long and wide datasets better, as well as dplyr::rename_at()!

Thanks for visiting r-craft.org
This article is originally published at http://www.brodrigues.co/
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

tidyr::spread() and dplyr::rename_at() in action

You may also like...

Categories

tidyr::spread() and dplyr::rename_at() in action

You may also like...

Register today for rstudio::conf 2017!

Describing Landscapes by Statistics of Local Pattern Features: Application to Landscape Regionalization, Change, and Search

Reinforcement Learning With TicTacJoe: A Simple Brain Coded Explicitly in R

Categories