recipes 0.2.0

by Posts | Tidyverse · February 22, 2022

This article is originally published at https://www.tidyverse.org/blog/

We’re very excited to announce the release of recipes 0.2.0. recipes is a package for preprocessing data before using it in models or visualizations. You can think of it as a mash-up of model.matrix() and dplyr.

You can install it from CRAN with:

install.packages("recipes")

This blog post will describe the highlights of what’s new. You can see a full list of changes in the release notes.

New Steps

step_nnmf_sparse() was added to produce features using non-negative matrix factorization (via the RcppML package). This will supersede the existing step_nnmf() since that step was difficult to support and use. The new step allows for a sparse representation via regularization and, from our initial testing, is much faster than the original NNMF step.
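As a rough sketch of how the new step might be used (the data set and argument values here are our own illustrative choices, and the RcppML package must be installed), the built-in mtcars data works because all of its columns are non-negative:

```r
library(recipes)
library(Matrix)  # RcppML works with Matrix classes

# Illustrative only: mtcars is used because every column is non-negative
nnmf_rec <- recipe(~ ., data = mtcars) %>%
  step_nnmf_sparse(
    all_numeric_predictors(),
    num_comp = 2,    # number of NNMF components to produce
    penalty = 0.01,  # penalization factor for the loadings
    seed = 473       # fixed seed so the factorization is reproducible
  )

nnmf_features <- nnmf_rec %>%
  prep() %>%
  bake(new_data = NULL)

names(nnmf_features)
```

The original predictors are replaced by the new component columns; larger penalty values should push more loadings toward zero.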

The new step step_dummy_extract() helps create indicator variables from text data, especially when a single value packs in several choices. For example, if a row of a variable had a value of "red,black,brown", the step can separate these values and make all of the required binary dummy variables.
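A minimal sketch of that comma-separated case, using the sep argument to split each value before the indicators are built (the swatches data and its color column are made up for illustration):

```r
library(recipes)

# Hypothetical data: each row lists one or more colors in a single string
swatches <- tibble::tribble(
  ~ color,
  "red,black,brown",
  "black",
  "red,brown"
)

color_dummies <- recipe(~ color, data = swatches) %>%
  # split on commas, then make one binary column per distinct value
  step_dummy_extract(color, sep = ",") %>%
  prep() %>%
  bake(new_data = NULL)

names(color_dummies)
```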

Here’s a real example from Episode 8 of Sliced where a column of data from Spotify had the artist(s) of a song:

library(recipes)
spotify <- 
  tibble::tribble(
    ~ artists,
    "['Genesis']",
    "['Billie Holiday', 'Teddy Wilson']",
    "['Jimmy Barnes', 'INXS']"
  )
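# The pattern matches each run of text enclosed in single quotes that
# contains no quote or comma, so every artist gets an indicator column.
recipe(~ artists, data = spotify) %>% 
  step_dummy_extract(artists, pattern = "(?<=')[^',]+(?=')") %>% 
  prep() %>% 
  bake(new_data = NULL) %>% 
  glimpse()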

## Rows: 3
## Columns: 6
## $ artists_Billie.Holiday <dbl> 0, 1, 0
## $ artists_Genesis        <dbl> 1, 0, 0
## $ artists_INXS           <dbl> 0, 0, 1
## $ artists_Jimmy.Barnes   <dbl> 0, 0, 1
## $ artists_Teddy.Wilson   <dbl> 0, 1, 0
## $ artists_other          <dbl> 0, 0, 0

Note that this step produces an “other” column and has arguments similar to step_other() and step_dummy_multi_choice().

step_percentile() is now a full step function; previously, it existed only as an example in the developer documentation. It estimates the empirical distribution of a variable from the training set, then converts any value to its percentile in that distribution.
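A small sketch of the percentile conversion, assuming the built-in mtcars data as the training set (the choice of columns is arbitrary):

```r
library(recipes)

pct <- recipe(~ mpg + disp, data = mtcars) %>%
  # learn each column's empirical distribution from the training data
  step_percentile(mpg, disp) %>%
  prep() %>%
  # replace each value with its percentile in that distribution
  bake(new_data = NULL)

# Inspect the scale of the converted values
range(pct$mpg)
```

Because the distribution is stored when the recipe is prepped, the same conversion can later be applied to new data with bake(new_data = ...).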

Finally, a new filtering function (step_filter_missing()) can filter out columns that have too many missing values (for some definition of “too many”).
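As an illustration, “too many” can be set with the threshold argument, which is compared against the proportion of missing values in each column. The data below are made up for the example:

```r
library(recipes)

# Hypothetical data: y is 75% missing, z is 25% missing
dat <- tibble::tibble(
  x = c(1, 2, 3, 4),
  y = c(NA, NA, NA, 4),
  z = c(1, NA, 3, 4)
)

kept <- recipe(~ ., data = dat) %>%
  # drop columns whose proportion of missing values exceeds 0.5
  step_filter_missing(all_predictors(), threshold = 0.5) %>%
  prep() %>%
  bake(new_data = NULL)

names(kept)
```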

Other notable new features

step_zv() now has a group argument. This can be helpful for models such as naive Bayes or quadratic discriminant analysis where the predictors must have at least two unique values within each class.
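To sketch the difference, consider a made-up data set where a predictor varies overall but is constant within one class. Passing the class column name as a string to group is one way to specify the grouping:

```r
library(recipes)

# Hypothetical data: x is constant within class "a" but varies overall
dat <- tibble::tibble(
  class = factor(c("a", "a", "a", "b", "b", "b")),
  x = c(1, 1, 1, 2, 3, 4),
  y = c(1, 2, 3, 4, 5, 6)
)

# Without group, x has several unique values overall and is kept
overall <- recipe(class ~ ., data = dat) %>%
  step_zv(all_predictors()) %>%
  prep() %>%
  bake(new_data = NULL)

# With group, x is checked within each class and should be removed
by_class <- recipe(class ~ ., data = dat) %>%
  step_zv(all_predictors(), group = "class") %>%
  prep() %>%
  bake(new_data = NULL)

names(overall)
names(by_class)
```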

All recipe steps now officially support empty selections to be more aligned with dplyr and other packages that use tidyselect. For example, if a previous step removed all of the columns needed for a later step, the recipe does not fail when it is estimated (with the exception of step_mutate()). The documentation in ?selections has been updated with advice for writing selectors when filtering steps are used.
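A minimal sketch of this behavior, using the built-in iris data: the first step removes every numeric predictor, so the selector in the second step matches nothing, and the recipe should still prep and bake without error.

```r
library(recipes)

rec <- recipe(Species ~ ., data = iris) %>%
  # remove every numeric predictor
  step_rm(all_numeric_predictors()) %>%
  # this selector now matches no columns, so the step has nothing to do
  step_normalize(all_numeric_predictors())

baked <- rec %>%
  prep() %>%
  bake(new_data = NULL)

names(baked)
```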

There are new extract_parameter_set_dials() and extract_parameter_dials() methods to extract parameter sets and single parameters from a recipe. Since this is related to tuning parameters, the tune package should be loaded before they are used.
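For instance, a recipe with one parameter marked for tuning could be inspected as sketched below. The choice of step_pca() and the mtcars data is ours, and the dials package needs to be installed:

```r
library(recipes)
library(tune)  # makes tune() and the extract_*() generics available

rec <- recipe(mpg ~ ., data = mtcars) %>%
  # mark the number of components as a tuning parameter
  step_pca(all_numeric_predictors(), num_comp = tune())

# All tuning parameters in the recipe, as a parameter set
param_set <- extract_parameter_set_dials(rec)
param_set

# A single parameter object, looked up by its id
num_comp_param <- extract_parameter_dials(rec, "num_comp")
num_comp_param
```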

Breaking changes

Because of changes in step_ica() and step_kpca*(), recipe objects that use these steps and were created with previous versions will now error when applied to new data. You will need to update these recipes with the current version to be able to use them.

Acknowledgements

We’d like to thank everyone who has contributed since the last release: @agwalker82, @albert-ying, @AshesITR, @ddsjoberg, @DoktorMike, @EmilHvitfeldt, @emmansh, @hermandr, @hfrick, @jacekkotowski, @JensPMB, @jkennel, @juliasilge, @lg1000, @lionel-, @markjrieke, @mattwarkentin, @MichaelChirico, @ninohardt, @SewerynGrodny, @SimonCoulombe, @spsanderson, @tedmoorman, @topepo, @tsengj, @walrossker, @williamshell, and @xiaoxi-david.
