Web Scraping 101 in Python with Requests & BeautifulSoup

Learning the tidyverse with the help of AI tools
https://www.tidyverse.org/blog/2025/04/learn-tidyverse-ai/
Fri, 04 Apr 2025 00:00:00 +0000

<p>As an educator who teaches data science with R, I have LOTS of opinions about using artificial intelligence (AI) tools when learning R.
But I will keep this post to the use of generative AI tools, like ChatGPT, in learning R, and specifically learning to do data science with R and the tidyverse.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p> <p>I&rsquo;ll first walk you through three case studies that demonstrate what asking for help from a generative AI tool looks like. Then, I&rsquo;ll wrap up the post with some tips and good practices for getting the most out of help provided by these tools in the context of writing tidyverse code.</p> <p>Before we get started, though, it&rsquo;s important to note that knowledge bases of Large Language Models (LLMs) that underlie popular generative AI tools are private and lack transparency. There are important societal concerns about the fairness of equitable access to these tools. It&rsquo;s unclear how developers or users of these models can be held accountable. Additionally, there is an environmental impact to using these tools.</p> <h2 id="case-study-1-reshaping-and-plotting">Case study 1: Reshaping and plotting <a href="#case-study-1-reshaping-and-plotting"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The first case study features an example on reshaping and plotting the <code>billboard</code> data frame and it&rsquo;s directly out of <a href="https://r4ds.hadley.nz/" target="_blank" rel="noopener">R for Data Science</a>. In fact, it&rsquo;s <a href="https://r4ds.hadley.nz/data-tidy.html#sec-billboard" target="_blank" rel="noopener">worked out in the book</a>. 
Here is the initial prompt:</p> <blockquote> <p>Use the billboard dataset in the tidyr package to create a visualization of rank vs. week number for each song in the dataset.</p> </blockquote> <p>And the following is what <a href="https://chatgpt.com/" target="_blank" rel="noopener">ChatGPT</a> by OpenAI suggests as the first answer:</p> <blockquote> <p>The <code>billboard</code> dataset from the tidyr package contains weekly rankings of songs on the Billboard charts. Here&rsquo;s how you can visualize the rank vs. week number for each song in the dataset using R:</p> </blockquote> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Load required libraries</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyr.tidyverse.org'>tidyr</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span> <span></span> <span><span class='c'># Load the billboard dataset</span></span> <span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='s'>"billboard"</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Prepare the data for visualization</span></span> <span><span class='nv'>billboard_long</span> <span class='o'>&lt;-</span> <span class='nv'>billboard</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_longer.html'>pivot_longer</a></span><span 
class='o'>(</span></span> <span> cols <span class='o'>=</span> <span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"wk"</span><span class='o'>)</span>,</span> <span> names_to <span class='o'>=</span> <span class='s'>"week"</span>,</span> <span> values_to <span class='o'>=</span> <span class='s'>"rank"</span>,</span> <span> names_prefix <span class='o'>=</span> <span class='s'>"wk"</span>,</span> <span> values_drop_na <span class='o'>=</span> <span class='kc'>TRUE</span></span> <span> <span class='o'>)</span></span> <span></span> <span><span class='c'># Convert week to a numeric value</span></span> <span><span class='nv'>billboard_long</span><span class='o'>$</span><span class='nv'>week</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/numeric.html'>as.numeric</a></span><span class='o'>(</span><span class='nv'>billboard_long</span><span class='o'>$</span><span class='nv'>week</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Create the visualization</span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>billboard_long</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>week</span>, y <span class='o'>=</span> <span class='nv'>rank</span>, group <span class='o'>=</span> <span class='nv'>track</span>, color <span class='o'>=</span> <span class='nv'>track</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span>show.legend <span class='o'>=</span> <span class='kc'>FALSE</span>, alpha <span class='o'>=</span> <span class='m'>0.6</span><span 
class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_y_reverse</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='c'># Reverse the y-axis since rank 1 is the top position</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span></span> <span> title <span class='o'>=</span> <span class='s'>"Song Rank vs. Week Number"</span>,</span> <span> x <span class='o'>=</span> <span class='s'>"Week Number"</span>,</span> <span> y <span class='o'>=</span> <span class='s'>"Rank"</span>,</span> <span> caption <span class='o'>=</span> <span class='s'>"Data: Billboard (tidyr package)"</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggtheme.html'>theme_minimal</a></span><span class='o'>(</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-1-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>ChatGPT also provides a neat explanation of the code:</p> <blockquote> <p><strong>Explanation</strong></p> <ol> <li>Pivoting the Data: The <code>billboard</code> dataset is in wide format, with week columns (<code>wk1</code>, <code>wk2</code>, etc.). We use <a href="https://tidyr.tidyverse.org/reference/pivot_longer.html" target="_blank" rel="noopener"><code>pivot_longer()</code></a> to reshape it into a long format with week and rank columns.</li> <li>Rank vs. 
Week Plot: Each song&rsquo;s weekly rankings are connected with lines (<a href="https://ggplot2.tidyverse.org/reference/geom_path.html" target="_blank" rel="noopener"><code>geom_line()</code></a>), and we reverse the y-axis since rank 1 is the best.</li> <li>Grouping by Track: The group aesthetic ensures lines are drawn separately for each song.</li> </ol> <p><strong>Output</strong></p> <p>This will create a line plot where each line represents a song&rsquo;s rank progression over weeks. The y-axis is reversed to reflect higher rankings at the top.</p> </blockquote> <p>There are a few things about this answer that are promising:</p> <ul> <li>The response is a tidyverse approach.</li> <li>It uses <a href="https://tidyr.tidyverse.org/reference/pivot_longer.html" target="_blank" rel="noopener"><code>pivot_longer()</code></a> as opposed to the superseded alternative <a href="https://tidyr.tidyverse.org/reference/gather.html" target="_blank" rel="noopener"><code>gather()</code></a>.</li> <li>The plot features a reversed y-axis, <a href="https://r4ds.hadley.nz/data-tidy.html#fig-billboard-ranks" target="_blank" rel="noopener">just like in the book</a>.</li> </ul> <p>However, the resulting plot doesn&rsquo;t show exactly what the prompt asked for &ndash; the lines are colored by <code>track</code>, which produces a busy plot that is difficult to interpret. While this is not ideal, it&rsquo;s something a new learner can easily spot and ask ChatGPT to correct in a follow-up. For example, a subsequent prompt like the following fixes this issue:</p> <blockquote> <p>Can you redraw the plot without coloring each line by track?</p> </blockquote> <p>You can see this in action in <a href="https://chatgpt.com/share/678f1dee-4af0-8000-befe-6df0a8d70fba" target="_blank" rel="noopener">this saved chat</a>, in the second prompt.
Here is the updated code suggested by ChatGPT:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Create the visualization without coloring by track</span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>billboard_long</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>week</span>, y <span class='o'>=</span> <span class='nv'>rank</span>, group <span class='o'>=</span> <span class='nv'>track</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span>alpha <span class='o'>=</span> <span class='m'>0.3</span>, color <span class='o'>=</span> <span class='s'>"blue"</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_y_reverse</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='c'># Reverse the y-axis since rank 1 is the top position</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span></span> <span> title <span class='o'>=</span> <span class='s'>"Song Rank vs. 
Week Number"</span>,</span> <span> x <span class='o'>=</span> <span class='s'>"Week Number"</span>,</span> <span> y <span class='o'>=</span> <span class='s'>"Rank"</span>,</span> <span> caption <span class='o'>=</span> <span class='s'>"Data: Billboard (tidyr package)"</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggtheme.html'>theme_minimal</a></span><span class='o'>(</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-2-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Note, however, that omitting the <code>color = track</code> aesthetic mapping wasn&rsquo;t the only change. The <code>alpha</code> level is also changed (from 0.6 to 0.3) without a justification for that change, and the lines are now colored <code>&quot;blue&quot;</code>. None of these are <em>bad</em> or <em>wrong</em> choices, but they can be confusing for new learners. Similarly, using <a href="https://ggplot2.tidyverse.org/reference/ggtheme.html" target="_blank" rel="noopener"><code>theme_minimal()</code></a> is not a bad or wrong choice either<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>, but it&rsquo;s not <em>necessary</em>, which might not be obvious to a new learner.</p> <p>Furthermore, while ChatGPT &ldquo;solves&rdquo; the problem, a thorough code review reveals a number of not-so-great things about the answer that can be confusing for new learners or promote poor coding practices:</p> <ul> <li>The code loads packages that are not necessary: the tidyr and ggplot2 packages are sufficient for this code; we don&rsquo;t also need dplyr.
Additionally, learners coming from R for Data Science likely expect <a href="https://tidyverse.tidyverse.org" target="_blank" rel="noopener"><code>library(tidyverse)</code></a> in analysis code, instead of loading the packages individually.</li> <li>There is no need to load the <code>billboard</code> dataset; it&rsquo;s available to use once the tidyr package is loaded. Additionally, quotes are not needed; <code>data(billboard)</code> also works.</li> <li>The code mixes up tidyverse and base R styles: <ul> <li>Changing the type of <code>week</code> to numeric can be done in a <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> statement with the tidyverse, which would then warrant loading the dplyr package.</li> <li>This can also be done within <a href="https://tidyr.tidyverse.org/reference/pivot_longer.html" target="_blank" rel="noopener"><code>pivot_longer()</code></a> with the <code>names_transform</code> argument.</li> </ul> </li> </ul> <p>All of these are addressable with further prompts, as I&rsquo;ve done in <a href="https://chatgpt.com/share/678f1dee-4af0-8000-befe-6df0a8d70fba" target="_blank" rel="noopener">the saved chat</a>, in the last two prompts. But doing so requires being able to identify these issues and explicitly asking for corrections. In practice, I wouldn&rsquo;t have asked ChatGPT to correct everything for me; I would have stopped after the first suggestion, which was a pretty good starting point, and made the improvements myself. However, a new learner might assume (and based on my experience seeing lots of new learner code, <em>does</em> assume) the first answer is the <em>right</em> and <em>good</em> or <em>best</em> answer since (1) it looks reasonable and (2) it works, sort of.</p> <p>Furthermore, requesting improvements in subsequent calls can result in surprising changes that the user hasn&rsquo;t asked for. We saw an example of this above, in updating the alpha level.
Similarly, in <a href="https://chatgpt.com/share/678f1dee-4af0-8000-befe-6df0a8d70fba" target="_blank" rel="noopener">the saved chat</a> you can see that asking ChatGPT to not load the packages individually but to use <a href="https://tidyverse.tidyverse.org" target="_blank" rel="noopener"><code>library(tidyverse)</code></a> instead results in this change as well as not loading the data with a <a href="https://rdrr.io/r/utils/data.html" target="_blank" rel="noopener"><code>data()</code></a> call and adding a data transformation step with <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> to convert <code>week</code> to numeric:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Load the tidyverse package</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyverse.tidyverse.org'>tidyverse</a></span><span class='o'>)</span></span> <span></span> <span><span class='c'># Load the billboard dataset and prepare the data</span></span> <span><span class='nv'>billboard_long</span> <span class='o'>&lt;-</span> <span class='nv'>billboard</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_longer.html'>pivot_longer</a></span><span class='o'>(</span></span> <span> cols <span class='o'>=</span> <span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"wk"</span><span class='o'>)</span>,</span> <span> names_to <span class='o'>=</span> <span class='s'>"week"</span>,</span> <span> values_to <span class='o'>=</span> <span class='s'>"rank"</span>,</span> <span> names_prefix <span class='o'>=</span> <span class='s'>"wk"</span>,</span> <span> values_drop_na <span class='o'>=</span> <span 
class='kc'>TRUE</span></span> <span> <span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>week <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/numeric.html'>as.numeric</a></span><span class='o'>(</span><span class='nv'>week</span><span class='o'>)</span><span class='o'>)</span> <span class='c'># Convert week to numeric</span></span></code></pre> </div> <p>Both of these are welcome changes, but it can be surprising to a new learner why they&rsquo;re combined with updating the <a href="https://rdrr.io/r/base/library.html" target="_blank" rel="noopener"><code>library()</code></a> call. This is happening because with each call ChatGPT is &ldquo;trying again&rdquo; &ndash; it&rsquo;s not just editing the previous answer but it&rsquo;s regenerating an answer with additional context.</p> <p>I should also note that ChatGPT generally gives better responses when asked about sample datasets because the model has probably been exposed to them somewhere in the training data. Next, let&rsquo;s try a non-sample dataset.</p> <h2 id="case-study-2-data-cleaning">Case study 2: Data cleaning <a href="#case-study-2-data-cleaning"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The next example is a lot more vague as it&rsquo;s a task that, if someone asked me for help, I would reply with &ldquo;Can you share the data with me?&rdquo;, but ChatGPT happily generates an answer.
That&rsquo;s probably one of the reasons why people love these tools: they never respond to a question with another question, at least not in my experience.</p> <p>Below is the prompt for this case study. I&rsquo;ll provide it with no background first so you can read it as the AI tool would &ldquo;read&rdquo; it.</p> <blockquote> <p>Another variable that needs some cleaning up is <code>membership_status</code>. Currently it should either be <code>NA</code> or contain some text that says &ldquo;Select &hellip; to register for this group&rdquo;. Recode this variable to say <code>&quot;Closed&quot;</code> if the current value is <code>NA</code> or <code>&quot;Open&quot;</code> otherwise. Save the resulting dataset with this new variables as <code>clubs</code>, i.e., overwrite the original data frame. Then, display the first 10 rows of the dataset, moving <code>membership_status</code> to the beginning of the dataset to make sure it appears in the output in your rendered document.</p> </blockquote> <p>This time, let&rsquo;s try <a href="https://claude.ai/" target="_blank" rel="noopener">Claude</a> by Anthropic. Below is the first response:</p> <blockquote> <p>I&rsquo;ll help you clean up the membership_status variable.
Let&rsquo;s break this down into steps:</p> </blockquote> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Recode membership_status to "Closed" if NA, "Open" otherwise</span></span> <span><span class='nv'>clubs</span><span class='o'>$</span><span class='nv'>membership_status</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/ifelse.html'>ifelse</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>clubs</span><span class='o'>$</span><span class='nv'>membership_status</span><span class='o'>)</span>, <span class='s'>"Closed"</span>, <span class='s'>"Open"</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Reorder columns to put membership_status first</span></span> <span><span class='nv'>clubs</span> <span class='o'>&lt;-</span> <span class='nv'>clubs</span><span class='o'>[</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"membership_status"</span>, <span class='nf'><a href='https://generics.r-lib.org/reference/setops.html'>setdiff</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/names.html'>names</a></span><span class='o'>(</span><span class='nv'>clubs</span><span class='o'>)</span>, <span class='s'>"membership_status"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>]</span></span> <span></span> <span><span class='c'># Display first 10 rows</span></span> <span><span class='nf'><a href='https://rdrr.io/r/utils/head.html'>head</a></span><span class='o'>(</span><span class='nv'>clubs</span>, <span class='m'>10</span><span class='o'>)</span></span></code></pre> </div> <blockquote> <p>This code will:</p> <ul> <li>Use <a href="https://rdrr.io/r/base/ifelse.html" target="_blank" rel="noopener"><code>ifelse()</code></a> to recode the variable 
based on whether it&rsquo;s NA</li> <li>Reorder the columns to put membership_status first</li> <li>Show the first 10 rows of the modified dataset</li> </ul> <p>Would you like me to explain any part of this solution in more detail?</p> </blockquote> <p>I like the offer to explain more at the end! However, the suggested code does not use the tidyverse. In fact, it&rsquo;s perhaps surprising that it uses R; this could very well have been a task for Python or another language commonly used for data science problems.</p> <p>Let&rsquo;s try again:</p> <blockquote> <p>Can you suggest an answer that uses the tidyverse instead of base R?</p> </blockquote> <p>And here is the result:</p> <blockquote> <p>Here&rsquo;s how to accomplish the same tasks using tidyverse functions:</p> </blockquote> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>clubs</span> <span class='o'>&lt;-</span> <span class='nv'>clubs</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>membership_status <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/if_else.html'>if_else</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>membership_status</span><span class='o'>)</span>, </span> <span> <span class='s'>"Closed"</span>, </span> <span> <span class='s'>"Open"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/relocate.html'>relocate</a></span><span class='o'>(</span><span class='nv'>membership_status</span><span class='o'>)</span></span> <span></span> <span><span
class='nv'>clubs</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/head.html'>head</a></span><span class='o'>(</span><span class='m'>10</span><span class='o'>)</span></span></code></pre> </div> <blockquote> <p>This tidyverse solution:</p> <ul> <li>Uses mutate() with <a href="https://dplyr.tidyverse.org/reference/if_else.html" target="_blank" rel="noopener"><code>if_else()</code></a> to recode the variable</li> <li>Uses relocate() to move membership_status to the front</li> <li>Shows the first 10 rows using head()</li> </ul> <p>The tidyverse version is more readable and follows the principle of chaining operations with the pipe operator (<code>%&gt;%</code>). <a href="https://dplyr.tidyverse.org/reference/if_else.html" target="_blank" rel="noopener"><code>if_else()</code></a> is the tidyverse equivalent of base R&rsquo;s <a href="https://rdrr.io/r/base/ifelse.html" target="_blank" rel="noopener"><code>ifelse()</code></a>, with stricter type checking.</p> <p>Would you like me to explain any of these functions in more detail?</p> </blockquote> <p>We&rsquo;re getting there! 
For new learners of the tidyverse, there are still a few aspects of this code that might be misleading:</p> <ul> <li>It doesn&rsquo;t use the base pipe operator (<code>|&gt;</code>), which is the pipe operator used in R for Data Science.</li> <li>Line breaks and indentation do not follow <a href="https://style.tidyverse.org/pipes.html#long-lines" target="_blank" rel="noopener">tidyverse style guidelines</a>, which suggest that if the arguments to a function don&rsquo;t all fit on one line, each argument should go on its own line.</li> <li>If <code>clubs</code> is a tibble, printing it will only display the first 10 rows anyway, so there&rsquo;s no need for the <a href="https://rdrr.io/r/utils/head.html" target="_blank" rel="noopener"><code>head()</code></a> call.</li> </ul> <p>Once again, it&rsquo;s possible to ask the tool to make these updates. At the time of developing this example, Claude didn&rsquo;t provide a sharing link for the conversation (it does now!), but I&rsquo;ve saved it at <a href="https://gist.github.com/mine-cetinkaya-rundel/c74dc7c5f2a27a846574edd063988431">https://gist.github.com/mine-cetinkaya-rundel/c74dc7c5f2a27a846574edd063988431</a>.
Even with vague prompts like &ldquo;If clubs is a tibble, it&rsquo;ll only print out the first 10 rows anyway,&rdquo; it does exactly what we would want it to do, resulting in the following:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>clubs</span> <span class='o'>&lt;-</span> <span class='nv'>clubs</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>membership_status <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/if_else.html'>if_else</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>membership_status</span><span class='o'>)</span>, <span class='s'>"Closed"</span>, <span class='s'>"Open"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/relocate.html'>relocate</a></span><span class='o'>(</span><span class='nv'>membership_status</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>clubs</span></span></code></pre> </div> <p>The operative word here being &ldquo;want&rdquo;. Once again, someone with enough experience with R and the tidyverse would not explicitly ask for these improvements; they would just make them themselves and carry on. But someone who is just learning the tidyverse would likely stop much earlier. In my experience, many new learners stop after the very first prompt, which looks unfamiliar, and hence potentially intimidating. 
For learners of the tidyverse, the simple &ldquo;Can you do this with the tidyverse?&rdquo; can be a very useful prompt improvement that gets them most of the way there.</p> <h2 id="case-study-3-web-scraping">Case study 3: Web scraping <a href="#case-study-3-web-scraping"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>For the third and last example we&rsquo;ll try scraping some data from the web, specifically some <a href="https://www.usclimatedata.com/climate/durham/north-carolina/united-states/usnc0192" target="_blank" rel="noopener">structured North Carolina weather data</a>. For this example, I&rsquo;ve used <a href="https://www.perplexity.ai/" target="_blank" rel="noopener">perplexity.ai</a>.</p> <p>My first prompt was:</p> <blockquote> <p>Write code for scraping data from <a href="https://www.usclimatedata.com/climate/durham/north-carolina/united-states/usnc0192">https://www.usclimatedata.com/climate/durham/north-carolina/united-states/usnc0192</a>.</p> </blockquote> <p>This resulted in an answer using Python, BeautifulSoup, and pandas.
So I followed up with:</p> <blockquote> <p>Use R instead.</p> </blockquote> <p>This resulted in the following.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Load required libraries</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rvest.tidyverse.org/'>rvest</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Attaching package: 'rvest'</span></span> <span></span><span><span class='c'>#&gt; The following object is masked from 'package:readr':</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; guess_encoding</span></span> <span></span><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span> <span></span> <span><span class='c'># Define the URL for Durham, North Carolina climate data</span></span> <span><span class='nv'>url</span> <span class='o'>&lt;-</span> <span class='s'>"https://www.usclimatedata.com/climate/durham/north-carolina/united-states/usnc0192"</span></span> <span></span> <span><span class='c'># Read the HTML content from the website</span></span> <span><span class='nv'>webpage</span> <span class='o'>&lt;-</span> <span class='nf'><a href='http://xml2.r-lib.org/reference/read_xml.html'>read_html</a></span><span class='o'>(</span><span class='nv'>url</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Extract the table containing climate data</span></span> <span><span class='nv'>climate_table</span> <span class='o'>&lt;-</span> <span class='nv'>webpage</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a 
href='https://rvest.tidyverse.org/reference/rename.html'>html_nodes</a></span><span class='o'>(</span><span class='s'>"table"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nv'>.</span><span class='o'>[[</span><span class='m'>1</span><span class='o'>]</span><span class='o'>]</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='c'># Assuming the first table contains the relevant data</span></span> <span> <span class='nf'><a href='https://rvest.tidyverse.org/reference/html_table.html'>html_table</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Convert the table to a data frame</span></span> <span><span class='nv'>climate_data</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/as.data.frame.html'>as.data.frame</a></span><span class='o'>(</span><span class='nv'>climate_table</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Display the scraped data</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='nv'>climate_data</span><span class='o'>)</span></span> <span><span class='c'>#&gt; JanJa FebFe MarMa AprAp MayMa</span></span> <span><span class='c'>#&gt; 1 Average high in ºF Av. high Hi 49.00 53.0 62.00 71.00 79.00</span></span> <span><span class='c'>#&gt; 2 Average low in ºF Av. low Lo 28.00 29.0 37.00 46.00 56.00</span></span> <span><span class='c'>#&gt; 3 Av. precipitation in inch Av. precip. Pre. 
4.45 3.7 4.69 3.43 4.61</span></span> <span><span class='c'>#&gt; JunJu</span></span> <span><span class='c'>#&gt; 1 85.00</span></span> <span><span class='c'>#&gt; 2 65.00</span></span> <span><span class='c'>#&gt; 3 4.02</span></span> <span></span></code></pre> </div> <p>Additionally, the suggestion included optional code for writing the data to CSV, even though that wasn&rsquo;t part of the prompt. It&rsquo;s not bad practice when scraping data from the web, though.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Optional: Save the data to a CSV file</span></span> <span><span class='nf'><a href='https://rdrr.io/r/utils/write.table.html'>write.csv</a></span><span class='o'>(</span><span class='nv'>climate_data</span>, <span class='s'>"durham_climate_data.csv"</span>, row.names <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span></code></pre> </div> <p>First, this code doesn&rsquo;t give us what we want:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>climate_data</span></span> <span><span class='c'>#&gt; JanJa FebFe MarMa AprAp MayMa</span></span> <span><span class='c'>#&gt; 1 Average high in ºF Av. high Hi 49.00 53.0 62.00 71.00 79.00</span></span> <span><span class='c'>#&gt; 2 Average low in ºF Av. low Lo 28.00 29.0 37.00 46.00 56.00</span></span> <span><span class='c'>#&gt; 3 Av. precipitation in inch Av. precip. Pre. 4.45 3.7 4.69 3.43 4.61</span></span> <span><span class='c'>#&gt; JunJu</span></span> <span><span class='c'>#&gt; 1 85.00</span></span> <span><span class='c'>#&gt; 2 65.00</span></span> <span><span class='c'>#&gt; 3 4.02</span></span> <span></span></code></pre> </div> <p>We only have the first six months (the first table) and the formatting is not great, especially for the column names. 
Additionally, this doesn&rsquo;t look like a tidy data frame &ndash; a better organization would be months in rows and temperatures in columns.</p> <div class="highlight"> </div> <p>Let&rsquo;s follow up with a better articulated prompt:</p> <blockquote> <p>Update the code to scrape data for all months from both of the tables on the website and organizing the data with months as rows and temperature types as columns.</p> </blockquote> <p>Here is the updated suggestion:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rvest.tidyverse.org/'>rvest</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyr.tidyverse.org'>tidyr</a></span><span class='o'>)</span></span> <span></span> <span><span class='nv'>url</span> <span class='o'>&lt;-</span> <span class='s'>"https://www.usclimatedata.com/climate/durham/north-carolina/united-states/usnc0192"</span></span> <span></span> <span><span class='c'># Read HTML content</span></span> <span><span class='nv'>webpage</span> <span class='o'>&lt;-</span> <span class='nf'><a href='http://xml2.r-lib.org/reference/read_xml.html'>read_html</a></span><span class='o'>(</span><span class='nv'>url</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Extract all tables</span></span> <span><span class='nv'>climate_tables</span> <span class='o'>&lt;-</span> <span class='nv'>webpage</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> 
<span class='nf'><a href='https://rvest.tidyverse.org/reference/rename.html'>html_nodes</a></span><span class='o'>(</span><span class='s'>"table"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://rvest.tidyverse.org/reference/html_table.html'>html_table</a></span><span class='o'>(</span>header <span class='o'>=</span> <span class='kc'>TRUE</span>, fill <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Check if we have at least 2 tables</span></span> <span><span class='kr'>if</span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/length.html'>length</a></span><span class='o'>(</span><span class='nv'>climate_tables</span><span class='o'>)</span> <span class='o'>&gt;=</span> <span class='m'>2</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='c'># Process first table (Average High Temperatures)</span></span> <span> <span class='nv'>high_temp</span> <span class='o'>&lt;-</span> <span class='nv'>climate_tables</span><span class='o'>[[</span><span class='m'>1</span><span class='o'>]</span><span class='o'>]</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span>Month <span class='o'>=</span> <span class='m'>1</span>, High <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>High <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/numeric.html'>as.numeric</a></span><span 
class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/grep.html'>gsub</a></span><span class='o'>(</span><span class='s'>"[^0-9.]"</span>, <span class='s'>""</span>, <span class='nv'>High</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span> </span> <span> <span class='c'># Process second table (Average Low Temperatures)</span></span> <span> <span class='nv'>low_temp</span> <span class='o'>&lt;-</span> <span class='nv'>climate_tables</span><span class='o'>[[</span><span class='m'>2</span><span class='o'>]</span><span class='o'>]</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span>Month <span class='o'>=</span> <span class='m'>1</span>, Low <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>Low <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/numeric.html'>as.numeric</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/grep.html'>gsub</a></span><span class='o'>(</span><span class='s'>"[^0-9.]"</span>, <span class='s'>""</span>, <span class='nv'>Low</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span> </span> <span> <span class='c'># Combine and organize data</span></span> <span> <span class='nv'>combined_data</span> <span class='o'>&lt;-</span> <span class='nv'>high_temp</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>full_join</a></span><span 
class='o'>(</span><span class='nv'>low_temp</span>, by <span class='o'>=</span> <span class='s'>"Month"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>Month</span>, levels <span class='o'>=</span> <span class='nv'>month.name</span><span class='o'>)</span><span class='o'>)</span></span> <span> </span> <span> <span class='c'># Display cleaned data</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='nv'>combined_data</span><span class='o'>)</span></span> <span> </span> <span> <span class='c'># Save to CSV</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/write.table.html'>write.csv</a></span><span class='o'>(</span><span class='nv'>combined_data</span>, <span class='s'>"durham_temperatures.csv"</span>, row.names <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span> <span class='kr'>else</span> <span class='o'>&#123;</span></span> <span> <span class='kr'><a href='https://rdrr.io/r/base/stop.html'>stop</a></span><span class='o'>(</span><span class='s'>"Required tables not found - website structure may have changed"</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; Month High Low</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Average high in ºF Av. high Hi 49 89 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Average low in ºF Av. low Lo 28 70 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> Av. precipitation in inch Av. precip. Pre. 4.45 3.94</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'>#&gt; [1] TRUE</span></span> <span></span></code></pre> </div> <p>There is a lot going on here, especially for a new learner, and the result is ultimately not what we want:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>combined_data</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; Month High Low</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Average high in ºF Av. high Hi 49 89 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Average low in ºF Av. low Lo 28 70 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> Av. precipitation in inch Av. precip. Pre. 4.45 3.94</span></span> <span></span></code></pre> </div> <div class="highlight"> </div> <p>Let&rsquo;s try one more time, following up with this prompt:</p> <blockquote> <p>This is not what I wanted. First scrape the data from the two tables for months January to June and July to December. Then put them together. 
Then reshape the data to have months as rows and temperature types as columns.</p> </blockquote> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rvest.tidyverse.org/'>rvest</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyr.tidyverse.org'>tidyr</a></span><span class='o'>)</span></span> <span></span> <span><span class='nv'>url</span> <span class='o'>&lt;-</span> <span class='s'>"https://www.usclimatedata.com/climate/durham/north-carolina/united-states/usnc0192"</span></span> <span><span class='nv'>webpage</span> <span class='o'>&lt;-</span> <span class='nf'><a href='http://xml2.r-lib.org/reference/read_xml.html'>read_html</a></span><span class='o'>(</span><span class='nv'>url</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Extract all climate tables (4 tables total)</span></span> <span><span class='nv'>all_tables</span> <span class='o'>&lt;-</span> <span class='nv'>webpage</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://rvest.tidyverse.org/reference/rename.html'>html_nodes</a></span><span class='o'>(</span><span class='s'>"table.climate-table"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://rvest.tidyverse.org/reference/html_table.html'>html_table</a></span><span class='o'>(</span>header 
<span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Process temperature data from first two tables (Jan-Jun and Jul-Dec)</span></span> <span><span class='nv'>high_temp</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/bind_rows.html'>bind_rows</a></span><span class='o'>(</span></span> <span> <span class='nv'>all_tables</span><span class='o'>[[</span><span class='m'>1</span><span class='o'>]</span><span class='o'>]</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span>Month <span class='o'>=</span> <span class='m'>1</span>, High <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span>,</span> <span> <span class='nv'>all_tables</span><span class='o'>[[</span><span class='m'>2</span><span class='o'>]</span><span class='o'>]</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span>Month <span class='o'>=</span> <span class='m'>1</span>, High <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>High <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/numeric.html'>as.numeric</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/grep.html'>gsub</a></span><span class='o'>(</span><span class='s'>"°F"</span>, <span class='s'>""</span>, <span class='nv'>High</span><span class='o'>)</span><span 
class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Error in all_tables[[1]]: subscript out of bounds</span></span> <span></span><span></span> <span><span class='nv'>low_temp</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/bind_rows.html'>bind_rows</a></span><span class='o'>(</span></span> <span> <span class='nv'>all_tables</span><span class='o'>[[</span><span class='m'>3</span><span class='o'>]</span><span class='o'>]</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span>Month <span class='o'>=</span> <span class='m'>1</span>, Low <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span>,</span> <span> <span class='nv'>all_tables</span><span class='o'>[[</span><span class='m'>4</span><span class='o'>]</span><span class='o'>]</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span>Month <span class='o'>=</span> <span class='m'>1</span>, Low <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>Low <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/numeric.html'>as.numeric</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/grep.html'>gsub</a></span><span class='o'>(</span><span class='s'>"°F"</span>, <span class='s'>""</span>, <span class='nv'>Low</span><span class='o'>)</span><span class='o'>)</span><span 
class='o'>)</span></span> <span><span class='c'>#&gt; Error in all_tables[[3]]: subscript out of bounds</span></span> <span></span><span></span> <span><span class='c'># Combine and format final dataset</span></span> <span><span class='nv'>climate_data</span> <span class='o'>&lt;-</span> <span class='nv'>high_temp</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>low_temp</span>, by <span class='o'>=</span> <span class='s'>"Month"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>Month <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>Month</span>, levels <span class='o'>=</span> <span class='nv'>month.name</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>Month</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>Month</span>, <span class='nv'>High</span>, <span class='nv'>Low</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Display formatted results</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span 
class='o'>(</span><span class='nv'>climate_data</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; Month High Low</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #BB0000;'>NA</span> 49 89 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #BB0000;'>NA</span> 28 70 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='color: #BB0000;'>NA</span> 4.45 3.94</span></span> <span></span></code></pre> </div> <p>Unfortunately this gives an error. However, the presentation in the answer from Perplexity makes it seem like the data were scraped successfully since a table with rows as months and columns as temperature types is included in the explanation:</p> <pre><code> Month High Low 1 January 50 30 2 February 54 32 3 March 63 39 4 April 72 47 5 May 79 56 6 June 85 64 7 July 89 69 8 August 87 68 9 September 81 60 10 October 72 49 11 November 63 39 12 December 54 32 </code></pre> <p>I&rsquo;m not sure how these data were extracted, but we know it&rsquo;s not with the R code provided above. The values are also not correct (e.g., high and low in January should be 49 and 28, instead), so it&rsquo;s completely unclear where they&rsquo;re coming from. You can follow along with this thread at <a href="https://www.perplexity.ai/search/write-code-for-scraping-data-f-6kRnwLDTTpe8vItl08Bo3g">https://www.perplexity.ai/search/write-code-for-scraping-data-f-6kRnwLDTTpe8vItl08Bo3g</a>. I tried a few more prompts and finally gave up. 
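</p> <p>For reference, the reshaping these prompts were asking for can be sketched by hand. This is a minimal sketch under stated assumptions, not the code Perplexity produced and not a tested scraper for the live site. To keep it runnable offline, it uses a small stand-in page built with <code>minimal_html()</code>: the <code>measure</code> header, the clean month names, and the short row labels are assumptions (the real tables come back with names like <code>JanJa</code> and longer labels), and the object names <code>page</code>, <code>tables</code>, <code>long</code>, and <code>climate</code> are only illustrative. The values are the ones printed earlier in this post, with the second table cut to July.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>library(rvest)
library(dplyr)
library(tidyr)

# Stand-in for the live page (an assumption, so the example runs offline):
# two tables with measures in rows and months in columns.
page &lt;- minimal_html('
  &lt;table&gt;
    &lt;tr&gt;&lt;th&gt;measure&lt;/th&gt;&lt;th&gt;Jan&lt;/th&gt;&lt;th&gt;Feb&lt;/th&gt;&lt;th&gt;Mar&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;High&lt;/td&gt;&lt;td&gt;49&lt;/td&gt;&lt;td&gt;53&lt;/td&gt;&lt;td&gt;62&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Low&lt;/td&gt;&lt;td&gt;28&lt;/td&gt;&lt;td&gt;29&lt;/td&gt;&lt;td&gt;37&lt;/td&gt;&lt;/tr&gt;
  &lt;/table&gt;
  &lt;table&gt;
    &lt;tr&gt;&lt;th&gt;measure&lt;/th&gt;&lt;th&gt;Jul&lt;/th&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;High&lt;/td&gt;&lt;td&gt;89&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Low&lt;/td&gt;&lt;td&gt;70&lt;/td&gt;&lt;/tr&gt;
  &lt;/table&gt;
')

# Step 1: read every table on the page into a list of tibbles
tables &lt;- page %&gt;%
  html_elements("table") %&gt;%
  html_table()

# Step 2: lengthen each table to one row per measure and month, then stack
long &lt;- tables %&gt;%
  lapply(function(tbl) {
    pivot_longer(tbl, cols = -measure, names_to = "month", values_to = "value")
  }) %&gt;%
  bind_rows()

# Step 3: widen so that months are rows and temperature types are columns
climate &lt;- long %&gt;%
  pivot_wider(names_from = measure, values_from = value)

climate
</code></pre> </div> <p>Running each of the three steps separately and inspecting <code>tables</code>, <code>long</code>, and <code>climate</code> in turn makes it easier to see where a suggestion goes wrong than inspecting only the final result. On the real page, the selector and the column clean-up would likely need adjusting.</p> <p>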
While the other two tasks were much more straightforward, the web scraping task seems to be more difficult for this tool. I should note that I used different services for each task, and the lack of success in this last one might be due to that as well.</p> <p>Ultimately, though, as the complexity of the task increases, it (understandably) gets more difficult to get to straightforward and new-learner-friendly answers with simple prompts.</p> <h2 id="tips-and-good-practices">Tips and good practices <a href="#tips-and-good-practices"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>I&rsquo;ll wrap up this post with some tips and good practices for using AI tools for (tidyverse) code generation. But first, a disclaimer &ndash; this landscape is changing super quickly. Today&rsquo;s good practices might not be the best approaches for tomorrow. However, the following have held true over the last year so there&rsquo;s a good chance they will remain relevant for some time into the future.</p> <ol> <li> <p><strong>Provide context and engineer prompts:</strong> This might be obvious, but it should be stated. Providing context, even something as simple as &ldquo;use R&rdquo; or &ldquo;use tidyverse&rdquo; can go a long way in getting a semi-successful first suggestion. Then, continue engineering the prompt until you achieve the results you need, being more articulate about what you want at each step. This is easier said than done, though, for new learners. If you don&rsquo;t know what the right answer should look like, it&rsquo;s much harder to be articulate in your prompt to get to that answer. 
On the other hand, if you do know what the right answer should look like, you might be more likely to just write the code yourself, instead of coaching the AI tool to get there. Another potentially helpful tip is to end your initial prompt with something like &ldquo;Ask me any clarifying questions before you begin&rdquo;. This way you don&rsquo;t have to think about all the necessary context at once; you can get the tool to ask you for some of the details.</p> </li> <li> <p><strong>Check for errors:</strong> This also seems obvious &ndash; you should run the code the tool suggests and check for errors. If the code gives an error, this is easy to catch and potentially easy to address. However, sometimes the suggested code uses arguments that don&rsquo;t exist, which R might silently ignore. These might be unneeded arguments, or a needed argument that isn&rsquo;t used properly because of how it&rsquo;s called or the value it&rsquo;s set to. Such errors are more difficult to identify, particularly in functions you might not be familiar with.</p> </li> <li> <p><strong>Run the code it gives you, line-by-line, even if the code is in a pipeline:</strong> Tidyverse data transformation pipelines and ggplot layers are easy to run at once, with the code doing many things with one execution prompt, compared to base R code where you execute each line of code separately. The scaffolded nature of these pipelines is very nice for keeping all steps associated with a task together and not generating unnecessary interim objects along the way. However, it requires self-discipline to inspect the code line-by-line as opposed to just inspecting the final output. 
For example, I regularly encounter unnecessary <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a>/ <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>ungroup()</code></a>s or <a href="https://dplyr.tidyverse.org/reference/select.html" target="_blank" rel="noopener"><code>select()</code></a> steps injected into pipelines. Identifying these requires running the pipeline code line-by-line, and then you can remove or modify them to simplify your answer. My recommendation would be to approach working with AI tools for code generation with an &ldquo;I&rsquo;m trying to learn how to do this&rdquo; attitude. It&rsquo;s then natural to investigate and interact with each step of the answer. If you approach it with a &ldquo;Solve this for me&rdquo; attitude, it&rsquo;s a lot harder to be critical of seemingly functioning and seemingly good enough code.</p> </li> <li> <p><strong>Improve code smell:</strong> While I don&rsquo;t have empirical evidence for this, I believe for humans, taste for good code develops faster than ability. For LLMs, this is the opposite. These tools will happily barf out code that runs without regard to cohesive syntax, avoiding redundancies, etc. Therefore, it&rsquo;s essential to &ldquo;clean up&rdquo; the suggested code to improve its &ldquo;code smell&rdquo;. 
Below are some steps I regularly use:</p> <ul> <li>Remove redundant library calls.</li> <li>Use <code>pkg::function()</code> syntax only as needed and consistently.</li> <li>Avoid mixing and matching base R and tidyverse syntax (e.g., in one step finding mean in a <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarize()</code></a> call and in another step as mean of a vector, <code>mean(df$var)</code>).</li> <li>Remove unnecessary <a href="https://rdrr.io/r/base/print.html" target="_blank" rel="noopener"><code>print()</code></a> statements.<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup></li> <li>Consider whether code comments address the &ldquo;why&rdquo; or the &ldquo;what.&rdquo; If comments describe relatively self-documenting code, consider removing them.</li> </ul> </li> <li> <p><strong>Stuck? Start a new chat:</strong> Each new prompt in a chat/thread is evaluated within the context of previous prompts in that thread. If you&rsquo;re stuck and not getting to a good answer after modifying your prompt a few times, start fresh with a new chat/thread instead.</p> </li> <li> <p><strong>Use code completion tools sparingly if you&rsquo;re a new user:</strong> Code completion tools, like <a href="https://github.com/features/copilot" target="_blank" rel="noopener">GitHub Copilot</a>, can be huge productivity boosters. But, especially for new learners, they can also be huge distractions as they tend to take action before the user is able to complete a thought in their head. My recommendation for new learners would be to avoid these tools altogether until they get a little faster at going from idea to code by themselves, or at a minimum until they feel like they can consistently write high quality prompts that generate the desired code on the first try. 
And my recommendation for anyone using code completion tools is to experiment with wait time between prompt and code generation and set a time that works well for them. In my experience, the default wait time can be too short, resulting in code being generated before I can finish writing my prompt or reviewing the prompt I write.<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup></p> </li> <li> <p><strong>Use AI tools for help with getting help:</strong> So far the focus of this post has been on generating code to accomplish certain data science tasks. Perhaps the most important, and most difficult, data science task is asking good questions when you&rsquo;re stuck troubleshooting. Doing so usually requires, or is greatly helped by, creating a minimal reproducible example and using tools like <a href="https://reprex.tidyverse.org/" target="_blank" rel="noopener">reprex</a>. This often starts with creating a small dataset with certain features, and AI tools can be pretty useful for generating such toy examples.</p> </li> </ol> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>And maybe a future post on teaching R in the age of AI! <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>In fact, it&rsquo;s my preferred ggplot2 theme! <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>I&rsquo;ve never seen as many <a href="https://rdrr.io/r/base/print.html" target="_blank" rel="noopener"><code>print()</code></a> statements in R code as I have over the last year of reading code from hundreds of students who use AI tools to generate code for their assignments with varying levels of success! 
I don&rsquo;t know why these tools love <a href="https://rdrr.io/r/base/print.html" target="_blank" rel="noopener"><code>print()</code></a> statements! <a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:4" role="doc-endnote"> <p>For example, in RStudio, go to Tools &gt; Global Options &gt; select Copilot from the left menu and adjust &ldquo;Show code suggestions after keyboard idle (ms)&rdquo;. <a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> rsample 1.3.0 https://www.tidyverse.org/blog/2025/04/rsample-1-3-0/ Thu, 03 Apr 2025 00:00:00 +0000 https://www.tidyverse.org/blog/2025/04/rsample-1-3-0/ <p>We&rsquo;re thrilled to announce the release of <a href="https://rsample.tidymodels.org/" target="_blank" rel="noopener">rsample</a> 1.3.0. rsample makes it easy to create resamples for assessing model performance. 
It is part of the tidymodels framework, a collection of R packages for modeling and machine learning using tidyverse principles.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"rsample"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will walk you through the more flexible grouping for calculating bootstrap confidence intervals and highlight the contributions made by participants of the tidyverse developer day.</p> <p>You can see a full list of changes in the <a href="https://rsample.tidymodels.org/news/index.html#rsample-130" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rsample.tidymodels.org'>rsample</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="flexible-grouping-for-bootstrap-intervals">Flexible grouping for bootstrap intervals <a href="#flexible-grouping-for-bootstrap-intervals"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Resampling allows you to get an understanding of the variability of an estimate, e.g., a summary statistic of your data. 
If you want to lean on statistical theory and get confidence intervals for your estimate, you can reach for the bootstrap resampling scheme: calculating your summary statistic on the bootstrap samples enables you to calculate confidence intervals around your point estimate.</p> <p>rsample contains a family of <code>int_*()</code> functions to calculate bootstrap confidence intervals of different flavors: percentile intervals, &ldquo;BCa&rdquo; intervals, and bootstrap-t intervals. If you want to dive into the technical details, Chapter 11 of <a href="https://hastie.su.domains/CASI/" target="_blank" rel="noopener">CASI</a> is a good place to start.</p> <p>You can calculate the confidence intervals based on a grouping in your data. However, so far, rsample would only let you provide a single grouping variable. With this release, we are extending this functionality to allow a more flexible grouping.</p> <p>The motivating application for us was to be able to calculate confidence intervals around multiple model performance metrics, including dynamic metrics for time-to-event models which depend on an evaluation time point. So in this case, the metric is one grouping variable and the evaluation time another. But let&rsquo;s pull back complexity for an example of how the new rsample functionality works!</p> <p>We have a dataset with delivery times for orders containing one or more items. 
We&rsquo;ll do some data wrangling with it, so we are also loading dplyr.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Attaching package: 'dplyr'</span></span> <span></span><span><span class='c'>#&gt; The following objects are masked from 'package:stats':</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; filter, lag</span></span> <span></span><span><span class='c'>#&gt; The following objects are masked from 'package:base':</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; intersect, setdiff, setequal, union</span></span> <span></span><span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>deliveries</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>deliveries</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 10,012 × 31</span></span></span> <span><span class='c'>#&gt; time_to_delivery hour day distance item_01 item_02 item_03 item_04 item_05</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; 
font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 16.1 11.9 Thu 3.15 0 0 2 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 22.9 19.2 Tue 3.69 0 0 0 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 30.3 18.4 Fri 2.06 0 0 0 0 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 33.4 15.8 Thu 5.97 0 0 0 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 27.2 19.6 Fri 2.52 0 0 0 1 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 19.6 13.0 Sat 3.35 1 0 0 1 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 22.1 15.5 Sun 2.46 0 0 1 1 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 26.6 17.0 Thu 2.21 0 0 1 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 30.8 16.7 Fri 2.62 0 0 0 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 17.4 11.9 Sun 2.75 0 2 1 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 10,002 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more variables: item_06 &lt;int&gt;, item_07 &lt;int&gt;, item_08 &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># item_09 &lt;int&gt;, item_10 &lt;int&gt;, item_11 &lt;int&gt;, item_12 &lt;int&gt;, item_13 &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># item_14 &lt;int&gt;, item_15 &lt;int&gt;, item_16 &lt;int&gt;, item_17 &lt;int&gt;, item_18 &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># item_19 &lt;int&gt;, item_20 &lt;int&gt;, item_21 &lt;int&gt;, 
item_22 &lt;int&gt;, item_23 &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># item_24 &lt;int&gt;, item_25 &lt;int&gt;, item_26 &lt;int&gt;, item_27 &lt;int&gt;</span></span></span> <span></span></code></pre> </div> <p>Instead of fitting a whole model here, we are calculating a straightforward summary statistic for how much delivery time increases if an item is included in the order. So the item is one grouping factor. As a second one, we are using whether the order was delivered on a weekday or a weekend. Let&rsquo;s start by making that weekend indicator and reshaping the data to make it easier to calculate our summary statistic.</p> <p>Note that the name for the weekend indicator column, <code>.weekend</code>, starts with a dot. That is important as it is the convention to signal to rsample that this is an additional grouping variable.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>item_data</span> <span class='o'>&lt;-</span> <span class='nv'>deliveries</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>.weekend <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/ifelse.html'>ifelse</a></span><span class='o'>(</span><span class='nv'>day</span> <span class='o'><a href='https://rdrr.io/r/base/match.html'>%in%</a></span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Sat"</span>, <span class='s'>"Sun"</span><span class='o'>)</span>, <span class='s'>"weekend"</span>, <span class='s'>"weekday"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a 
href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>time_to_delivery</span>, <span class='nv'>.weekend</span>, <span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"item"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>tidyr</span><span class='nf'>::</span><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_longer.html'>pivot_longer</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"item"</span><span class='o'>)</span>, names_to <span class='o'>=</span> <span class='s'>"item"</span>, values_to <span class='o'>=</span> <span class='s'>"value"</span><span class='o'>)</span> </span></code></pre> </div> <p>Next, we are making a small function that calculates the ratio of average delivery times with and without the item included in the order, as an estimate of how much a specific item in an order increases the delivery time.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>relative_increase</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>data</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>data</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>includes_item <span class='o'>=</span> <span class='nv'>value</span> <span class='o'>&gt;</span> <span class='m'>0</span><span class='o'>)</span> <span 
class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarize</a></span><span class='o'>(</span></span> <span> has <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>time_to_delivery</span><span class='o'>[</span><span class='nv'>includes_item</span><span class='o'>]</span><span class='o'>)</span>,</span> <span> has_not <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>time_to_delivery</span><span class='o'>[</span><span class='o'>!</span><span class='nv'>includes_item</span><span class='o'>]</span><span class='o'>)</span>,</span> <span> .by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>item</span>, <span class='nv'>.weekend</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>estimate <span class='o'>=</span> <span class='nv'>has</span> <span class='o'>/</span> <span class='nv'>has_not</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span>term <span class='o'>=</span> <span class='nv'>item</span>, <span class='nv'>.weekend</span>, <span class='nv'>estimate</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span></code></pre> </div> <p>We can calculate that on our entire dataset.</p> <div class="highlight"> 
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>relative_increase</span><span class='o'>(</span><span class='nv'>item_data</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 54 × 3</span></span></span> <span><span class='c'>#&gt; term .weekend estimate</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> item_01 weekday 1.07</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> item_02 weekday 1.02</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> item_03 weekday 1.02</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> item_04 weekday 1.00</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> item_05 weekday 1.00</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> item_06 weekday 1.01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> item_07 weekday 1.03</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> item_08 weekday 1.01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> item_09 weekday 1.01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> item_10 weekday 1.06</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 44 more rows</span></span></span> <span></span></code></pre> </div> <p>This is fine, but what we really want here is to get confidence intervals around these estimates!</p> <p>So let&rsquo;s make bootstrap samples and calculate our statistic on those.</p> <div class="highlight"> <pre 
class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span> <span><span class='nv'>item_bootstrap</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/bootstraps.html'>bootstraps</a></span><span class='o'>(</span><span class='nv'>item_data</span>, times <span class='o'>=</span> <span class='m'>1000</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>item_stats</span> <span class='o'>&lt;-</span></span> <span> <span class='nv'>item_bootstrap</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>stats <span class='o'>=</span> <span class='nf'>purrr</span><span class='nf'>::</span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map</a></span><span class='o'>(</span><span class='nv'>splits</span>, <span class='o'>~</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/as.data.frame.rsplit.html'>analysis</a></span><span class='o'>(</span><span class='nv'>.x</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>relative_increase</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <p>Now we have everything we need to calculate the confidence intervals, stashed into the tibbles in the <code>stats</code> column: an <code>estimate</code>, a <code>term</code> (the primary grouping variable), and our additional grouping variable <code>.weekend</code>, starting with a dot. 
What&rsquo;s left to do is call one of the <code>int_*()</code> functions and specify which column contains the statistics. Here, we&rsquo;ll calculate percentile intervals with <a href="https://rsample.tidymodels.org/reference/int_pctl.html" target="_blank" rel="noopener"><code>int_pctl()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>item_ci</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/int_pctl.html'>int_pctl</a></span><span class='o'>(</span><span class='nv'>item_stats</span>, statistics <span class='o'>=</span> <span class='nv'>stats</span>, alpha <span class='o'>=</span> <span class='m'>0.1</span><span class='o'>)</span></span> <span><span class='nv'>item_ci</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 54 × 7</span></span></span> <span><span class='c'>#&gt; term .weekend .lower .estimate .upper .alpha .method </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> item_01 weekday 1.05 1.07 1.09 0.1 percentile</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> item_01 weekend 1.04 1.07 1.10 0.1 percentile</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> item_02 weekday 1.00 1.02 1.03 0.1 percentile</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> item_02 
weekend 0.996 1.01 1.03 0.1 percentile</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> item_03 weekday 1.01 1.02 1.04 0.1 percentile</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> item_03 weekend 0.970 0.990 1.01 0.1 percentile</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> item_04 weekday 0.989 1.00 1.02 0.1 percentile</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> item_04 weekend 0.998 1.02 1.03 0.1 percentile</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> item_05 weekday 0.987 1.00 1.02 0.1 percentile</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> item_05 weekend 0.982 1.00 1.03 0.1 percentile</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 44 more rows</span></span></span> <span></span></code></pre> </div> <h2 id="tidyverse-developer-day">Tidyverse developer day <a href="#tidyverse-developer-day"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>At the tidyverse developer day after posit::conf, rsample got a lot of love in the form of contributions by various community members. People improved documentation and examples, moved deprecations along, tightened checks to support good practice, and upgraded errors and warnings, both in style and content. 
None of these changes are flashy new features but all of them are essential to rsample working well!</p> <p>So for example, leave-one-out (LOO) cross-validation is not a great choice of resampling scheme in most situations. From <a href="https://www.tmwr.org/resampling#leave-one-out-cross-validation" target="_blank" rel="noopener">Tidy modeling with R</a>:</p> <blockquote> <p>For anything but pathologically small samples, LOO is computationally excessive, and it may not have good statistical properties.</p> </blockquote> <p>It was possible, however, to create implicit LOO samples by using <a href="https://rsample.tidymodels.org/reference/vfold_cv.html" target="_blank" rel="noopener"><code>vfold_cv()</code></a> with the number of folds set to the number of rows in the data. With a dev day contribution, this now errors:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rsample.tidymodels.org/reference/vfold_cv.html'>vfold_cv</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, v <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `vfold_cv()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Leave-one-out cross-validation is not supported by this function.</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> You set `v` to `nrow(data)`, which would result in a leave-one-out</span></span> <span><span class='c'>#&gt; cross-validation.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use `loo_cv()` in this case.</span></span> <span></span></code></pre> </div> <p>This is to make users pause and consider if 
this is a good choice for their dataset. If you require LOO, you can still use <a href="https://rsample.tidymodels.org/reference/loo_cv.html" target="_blank" rel="noopener"><code>loo_cv()</code></a>.</p> <p>Error messages in general have been a focus of ours across various tidymodels packages, and rsample is no exception. We opened a bunch of issues covering all of rsample, and all of them got closed! Some of these changes are purely internal, upgrading manual formatting to let the cli package do the work. While the error message in most cases doesn&rsquo;t <em>look</em> different, there is a great deal more consistency in formatting.</p> <p>For some error messages, the additional functionality in cli makes it easy to improve readability. This error message used to be one block of text; now it comes as three bullet points.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rsample.tidymodels.org/reference/permutations.html'>permutations</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='nf'><a href='https://tidyselect.r-lib.org/reference/everything.html'>everything</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `permutations()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> You have selected all columns to permute.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> This effectively reorders the rows in the original data without changing the</span></span> <span><span class='c'>#&gt; data structure.</span></span> <span><span class='c'>#&gt; → Please select fewer columns to permute.</span></span> <span></span></code></pre> </div> <p>Changes like these are super helpful to users and developers alike. 
A big thank you to all the contributors!</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Many thanks to all the people who contributed to rsample since the last release!</p> <p> <a href="https://github.com/agmurray" target="_blank" rel="noopener">@agmurray</a>, <a href="https://github.com/brshallo" target="_blank" rel="noopener">@brshallo</a>, <a href="https://github.com/ccani007" target="_blank" rel="noopener">@ccani007</a>, <a href="https://github.com/dicook" target="_blank" rel="noopener">@dicook</a>, <a href="https://github.com/Dpananos" target="_blank" rel="noopener">@Dpananos</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, <a href="https://github.com/gregor-fausto" target="_blank" rel="noopener">@gregor-fausto</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/JamesHWade" target="_blank" rel="noopener">@JamesHWade</a>, <a href="https://github.com/jttoivon" target="_blank" rel="noopener">@jttoivon</a>, <a href="https://github.com/krz" target="_blank" rel="noopener">@krz</a>, <a href="https://github.com/laurabrianna" target="_blank" rel="noopener">@laurabrianna</a>, <a href="https://github.com/malcolmbarrett" target="_blank" rel="noopener">@malcolmbarrett</a>, <a href="https://github.com/MatthieuStigler" target="_blank" rel="noopener">@MatthieuStigler</a>, <a href="https://github.com/msberends" target="_blank" 
rel="noopener">@msberends</a>, <a href="https://github.com/nmercadeb" target="_blank" rel="noopener">@nmercadeb</a>, <a href="https://github.com/PriKalra" target="_blank" rel="noopener">@PriKalra</a>, <a href="https://github.com/seb09" target="_blank" rel="noopener">@seb09</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/ZWael" target="_blank" rel="noopener">@ZWael</a>, and <a href="https://github.com/zz77zz" target="_blank" rel="noopener">@zz77zz</a>.</p> Improved sparsity support in tidymodels https://www.tidyverse.org/blog/2025/03/tidymodels-sparsity/ Wed, 19 Mar 2025 00:00:00 +0000 https://www.tidyverse.org/blog/2025/03/tidymodels-sparsity/ <p>Photo by <a href="https://unsplash.com/@oxygenvisuals?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Oliver Olah</a> on <a href="https://unsplash.com/photos/green-tree-in-the-middle-of-grass-field-KD8nzFznQQ0?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Unsplash</a></p> <p>We&rsquo;re stoked to announce tidymodels now fully supports sparse data from end to end. 
We have been working on this for <a href="https://github.com/tidymodels/recipes/pull/515" target="_blank" rel="noopener">over 5 years</a>. This is an extension of the work we have done <a href="https://www.tidyverse.org/blog/2020/11/tidymodels-sparse-support/" target="_blank" rel="noopener">previously</a> with blueprints, which would carry the data sparsely some of the way.</p> <p>You will need <a href="https://recipes.tidymodels.org/news/index.html#recipes-120" target="_blank" rel="noopener">recipes 1.2.0</a>, <a href="https://parsnip.tidymodels.org/news/index.html#parsnip-130" target="_blank" rel="noopener">parsnip 1.3.0</a>, and <a href="https://workflows.tidymodels.org/news/index.html#workflows-120" target="_blank" rel="noopener">workflows 1.2.0</a>, or later, for this to work.</p> <h2 id="what-are-sparse-data">What are sparse data? <a href="#what-are-sparse-data"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The term <strong>sparse data</strong> refers to a data set containing many zeroes. Sparse data appears in all kinds of fields and can be produced by a number of preprocessing methods. We care about sparse data because of how computers store numbers. A 32-bit integer value takes 4 bytes to store. An array of 10 such integers takes 40 bytes, and so on. This happens because every value, zeroes included, is written down.</p> <p>A sparse representation instead stores the locations and values of the non-zero entries. 
Suppose we have the following vector with 20 entries:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">3</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">7</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">)</span> </code></pre></div><p>It could be represented sparsely using the 3 values <code>positions = c(3, 5, 8)</code>, <code>values = c(1, 3, 7)</code>, and <code>length = 20</code>. Now, we have seven values (three positions, three values, and the length) to represent a vector of 20 elements. Since some modeling tasks contain even sparser data, this type of representation starts to show real benefits in terms of execution time and memory consumption.</p> <p>The tidymodels set of packages has undergone several internal changes to allow it to represent data sparsely internally when it would be beneficial. These changes allow you to fit models that contain sparse data faster and more memory efficiently than before. 
Moreover, they allow you to fit models that previously could not be fit because the data did not fit in memory.</p> <h2 id="sparse-matrix-support">Sparse matrix support <a href="#sparse-matrix-support"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The first benefit of these changes is that <code>recipe()</code>, <code>prep()</code>, <code>bake()</code>, <code>fit()</code>, and <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> now accept sparse matrices created using the Matrix package.</p> <p>The <code>permeability_qsar</code> data set from the modeldata package contains quite a lot of zeroes in the predictors, so we will use it as a demonstration. 
We start by coercing it into a sparse matrix.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://Matrix.R-forge.R-project.org'>Matrix</a></span><span class='o'>)</span></span> <span><span class='nv'>permeability_sparse</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/methods/as.html'>as</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/matrix.html'>as.matrix</a></span><span class='o'>(</span><span class='nv'>permeability_qsar</span><span class='o'>)</span>, <span class='s'>"sparseMatrix"</span><span class='o'>)</span></span></code></pre> </div> <p>We can now use this sparse matrix in our code the same way as a dense matrix or data frame:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec_spec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>permeability</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>permeability_sparse</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_zv</span><span class='o'>(</span><span class='nf'>all_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>mod_spec</span> <span class='o'>&lt;-</span> <span class='nf'>boost_tree</span><span class='o'>(</span><span class='s'>"regression"</span>, <span class='s'>"xgboost"</span><span class='o'>)</span></span> <span></span> <span><span 
class='nv'>wf_spec</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>rec_spec</span>, <span class='nv'>mod_spec</span><span class='o'>)</span></span></code></pre> </div> <p>Model training has the usual syntax:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>wf_fit</span> <span class='o'>&lt;-</span> <span class='nf'>fit</span><span class='o'>(</span><span class='nv'>wf_spec</span>, <span class='nv'>permeability_sparse</span><span class='o'>)</span></span></code></pre> </div> <p>as does prediction:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>wf_fit</span>, <span class='nv'>permeability_sparse</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 165 × 1</span></span></span> <span><span class='c'>#&gt; .pred</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 10.5 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 1.50 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 13.1 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 1.10 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 1.25 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 0.738</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 29.3 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 2.44 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 36.3 </span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'>10</span> 4.31 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 155 more rows</span></span></span> <span></span></code></pre> </div> <p>Note that only some models/engines work well with sparse data. These are all listed here <a href="https://www.tidymodels.org/find/sparse/">https://www.tidymodels.org/find/sparse/</a>. If the model doesn&rsquo;t support sparse data, it will be coerced into the default non-sparse representation and used as usual.</p> <p>With a few exceptions, it should work like any other data set. However, this approach has two main limitations. The first is that we are limited to regression tasks since the outcome has to be numeric to be part of the sparse matrix.</p> <p>The second limitation is that it only works with non-formula methods for parsnip and workflows. This means that you can use a recipe with <code>add_recipe()</code> or select variables directly with <code>add_variables()</code> when using a workflow. 
You also need to use <code>fit_xy()</code> instead of <code>fit()</code> when using a parsnip object by itself.</p> <p>If this is of interest, there is also a <a href="https://www.tidymodels.org/">tidymodels.org</a> post about <a href="https://www.tidymodels.org/learn/work/sparse-matrix/" target="_blank" rel="noopener">using sparse matrices in tidymodels</a>.</p> <h2 id="sparse-data-from-recipes-steps">Sparse data from recipes steps <a href="#sparse-data-from-recipes-steps"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This sparsity support really starts to shine when the recipe itself generates sparse data. The relevant steps come in two flavors: sparsity-creating steps and sparsity-preserving steps. Both are listed here: <a href="https://www.tidymodels.org/find/sparse/">https://www.tidymodels.org/find/sparse/</a>.</p> <p>Some steps like <code>step_dummy()</code>, <code>step_indicate_na()</code>, and <a href="https://textrecipes.tidymodels.org/reference/step_tf.html" target="_blank" rel="noopener"><code>textrecipes::step_tf()</code></a> will almost always produce a lot of zeroes. We take advantage of that by generating their output sparsely when it is beneficial. If these steps end up producing sparse vectors, we want to make sure the sparsity is preserved. A number of steps, such as <code>step_impute_mean()</code> and <code>step_scale()</code>, have been updated to work efficiently with sparse vectors. 
Both types of steps are detailed in the above-linked list of compatible methods.</p> <p>In practice, if you use a model/engine that supports sparse data and have a recipe that produces enough sparse data, the steps switch to a new sparse data format to store the data (when appropriate) as the recipe is being processed. Then, if the model can accept sparse objects, we convert the data from our new sparse format to a standard sparse matrix object. This increases performance when possible while preserving performance otherwise.</p> <p>Below is a simple recipe using the <code>ames</code> data set. <code>step_dummy()</code> is applied to all the categorical predictors, leading to a significant number of zeroes.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec_spec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_zv</span><span class='o'>(</span><span class='nf'>all_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>all_nominal_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>mod_spec</span> <span class='o'>&lt;-</span> <span class='nf'>boost_tree</span><span class='o'>(</span><span class='s'>"regression"</span>, <span 
class='s'>"xgboost"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>wf_spec</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>rec_spec</span>, <span class='nv'>mod_spec</span><span class='o'>)</span></span></code></pre> </div> <p>Fitting the workflow now takes around 125ms and allocates 37.2MB. Before these changes, it took around 335ms and allocated 67.5MB.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>wf_fit</span> <span class='o'>&lt;-</span> <span class='nf'>fit</span><span class='o'>(</span><span class='nv'>wf_spec</span>, <span class='nv'>ames</span><span class='o'>)</span></span></code></pre> </div> <p>We see similar speedups when we predict: around 20ms and 25.2MB now, compared to around 60ms and 55.6MB before.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>wf_fit</span>, <span class='nv'>ames</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2,930 × 1</span></span></span> <span><span class='c'>#&gt; .pred</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='text-decoration: underline;'>208</span>649.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='text-decoration: underline;'>115</span>339.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='text-decoration: underline;'>148</span>634.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='text-decoration: 
underline;'>239</span>770.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='text-decoration: underline;'>190</span>082.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='text-decoration: underline;'>184</span>604.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>208</span>572.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='text-decoration: underline;'>177</span>403 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='text-decoration: underline;'>261</span>000.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>198</span>604.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 2,920 more rows</span></span></span> <span></span></code></pre> </div> <p>These improvements are tightly related to memory allocation, which depends on the sparsity of the data set produced by the recipe. This is why it is hard to say how much benefit you will see. We have seen improvements of orders of magnitude in both time and memory allocation. 
We have also been able to fit models where previously the data was too big to fit in memory.</p> <p>Please see the post on tidymodels.org, which goes into more detail about when you are likely to benefit from this and how to change your recipes and workflows to take full advantage of this new feature.</p> <p>There is also a <a href="https://www.tidymodels.org/">tidymodels.org</a> post going into a bit more detail about how to <a href="https://www.tidymodels.org/learn/work/sparse-recipe/" target="_blank" rel="noopener">use recipes to produce sparse data</a>.</p> Q1 2025 tidymodels digest https://www.tidyverse.org/blog/2025/02/tidymodels-2025-q1/ Thu, 27 Feb 2025 00:00:00 +0000 <p>The tidymodels framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <p>Since the beginning of 2021, we have been publishing quarterly updates here on the tidyverse blog summarizing what’s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. You can check out the tidymodels tag to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused.</p> <p>We&rsquo;ve sent a steady stream of tidymodels packages to CRAN recently. We usually release them in batches since many of our packages are tightly coupled with one another. 
Internally, this process is referred to as the &ldquo;cascade&rdquo; of CRAN submissions.</p> <p>The post will update you on which packages have changed and the major improvements you should know about.</p> <p>Here&rsquo;s a list of the packages and their News sections:</p> <ul> <li> <a href="https://baguette.tidymodels.org/news/index.html" target="_blank" rel="noopener">baguette</a></li> <li> <a href="https://brulee.tidymodels.org/news/index.html" target="_blank" rel="noopener">brulee</a></li> <li> <a href="https://censored.tidymodels.org/news/index.html" target="_blank" rel="noopener">censored</a></li> <li> <a href="https://dials.tidymodels.org/news/index.html" target="_blank" rel="noopener">dials</a></li> <li> <a href="https://hardhat.tidymodels.org/news/index.html" target="_blank" rel="noopener">hardhat</a></li> <li> <a href="https://parsnip.tidymodels.org/news/index.html" target="_blank" rel="noopener">parsnip</a></li> <li> <a href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">recipes</a></li> <li> <a href="https://tidymodels.tidymodels.org/news/index.html" target="_blank" rel="noopener">tidymodels</a></li> <li> <a href="https://tune.tidymodels.org/news/index.html" target="_blank" rel="noopener">tune</a></li> <li> <a href="https://workflows.tidymodels.org/news/index.html" target="_blank" rel="noopener">workflows</a></li> </ul> <p>Let&rsquo;s look at a few specific updates.</p> <h2 id="improvements-in-errors-and-warnings">Improvements in errors and warnings <a href="#improvements-in-errors-and-warnings"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A 
group effort was made to improve our error and warning messages across many packages. This started with an internal &ldquo;upkeep week&rdquo; (which ended up being 3-4 weeks) and concluded at the <a href="https://www.tidyverse.org/blog/2024/04/tdd-2024/" target="_blank" rel="noopener">Tidy Dev Day in Seattle</a> after posit::conf(2024).</p> <p>The goal was to use new tools in the cli and rlang packages to make messages more informative than they used to be. For example, using:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">tidy</span><span class="p">(</span><span class="n">pca_extract_trained</span><span class="p">,</span> <span class="n">number</span> <span class="o">=</span> <span class="m">3</span><span class="p">,</span> <span class="n">type</span> <span class="o">=</span> <span class="s">&#34;variances&#34;</span><span class="p">)</span> </code></pre></div><p>used to result in the error message:</p> <pre><code>Error in `match.arg()`: ! 'arg' should be one of &quot;coef&quot;, &quot;variance&quot; </code></pre><p>The new system references the function that you called and not the underlying base R function that actually errored. It also suggests a solution:</p> <pre><code>Error in `tidy()`: ! `type` must be one of &quot;coef&quot; or &quot;variance&quot;, not &quot;variances&quot;. i Did you mean &quot;variance&quot;? </code></pre><p>The rlang package created a set of <a href="https://usethis.r-lib.org/reference/use_standalone.html" target="_blank" rel="noopener">standalone files</a> that contain high-quality type checkers and related functions. This also improves the information that users get from an error. For example, using an inappropriate formula value in <code>fit(linear_reg(), &quot;boop&quot;, mtcars)</code>, the old message was:</p> <pre><code>Error in `fit()`: ! The `formula` argument must be a formula, but it is a &lt;character&gt;. 
</code></pre><p>and now you see:</p> <pre><code>Error in `fit()`: ! `formula` must be a formula, not the string &quot;boop&quot;. </code></pre><p>This was <em>a lot</em> of work and we still aren’t finished. Two events helped us get as far as we did.</p> <p>First, Simon Couch made the <a href="https://simonpcouch.github.io/chores/" target="_blank" rel="noopener">chores</a> package (its previous name was &ldquo;pal&rdquo;), which enabled us to use AI tools to solve small-scope problems, such as converting old rlang error code to use the new <a href="https://rlang.r-lib.org/reference/topic-condition-formatting.html" target="_blank" rel="noopener">cli syntax</a>. I can’t overstate how much of a speed-up this was for us.</p> <p>Second, at developer day, many external folks pitched in to make pull requests from a list of issues:</p> <div class="figure" style="text-align: center"> <img src="IMG_4743.jpeg" alt="Organizing Tidy Dev Day issues." /> <p class="caption">Organizing Tidy Dev Day issues.</p> </div> <p>I love these sessions for many reasons, but mostly because we meet users and contributors to our packages in person and work with them on specific tasks.</p> <p>There is a lot more to do here; we have a lot of secondary packages that would benefit from these improvements too.</p> <h2 id="quantile-regression-in-parsnip">Quantile regression in parsnip <a href="#quantile-regression-in-parsnip"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>One big update in parsnip was a new modeling mode of <code>&quot;quantile regression&quot;</code>. 
Daniel McDonald and Ryan Tibshirani provided much of the impetus for this work, based on their <a href="https://delphi.cmu.edu/" target="_blank" rel="noopener">disease modeling framework</a>.</p> <p>You can generate quantile predictions by first creating a model specification, which includes the quantiles that you want to predict:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span> <span class="nf">tidymodels_prefer</span><span class="p">()</span> <span class="n">ames</span> <span class="o">&lt;-</span> <span class="n">modeldata</span><span class="o">::</span><span class="n">ames</span> <span class="o">|&gt;</span> <span class="nf">mutate</span><span class="p">(</span><span class="n">Sale_Price</span> <span class="o">=</span> <span class="nf">log10</span><span class="p">(</span><span class="n">Sale_Price</span><span class="p">))</span> <span class="o">|&gt;</span> <span class="nf">select</span><span class="p">(</span><span class="n">Sale_Price</span><span class="p">,</span> <span class="n">Latitude</span><span class="p">)</span> <span class="n">quant_spec</span> <span class="o">&lt;-</span> <span class="nf">linear_reg</span><span class="p">()</span> <span class="o">|&gt;</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;quantreg&#34;</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;quantile regression&#34;</span><span class="p">,</span> <span class="n">quantile_levels</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0.1</span><span class="p">,</span> <span class="m">0.5</span><span class="p">,</span> <span class="m">0.9</span><span class="p">))</span> <span class="n">quant_spec</span> </code></pre></div><pre><code>## Linear Regression Model 
Specification (quantile regression) ## ## Computational engine: quantreg </code></pre><pre><code>## Quantile levels: 0.1, 0.5, and 0.9. </code></pre><p>We&rsquo;ll add some spline terms via a recipe and fit the model:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">spline_rec</span> <span class="o">&lt;-</span> <span class="nf">recipe</span><span class="p">(</span><span class="n">Sale_Price</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">ames</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">step_spline_natural</span><span class="p">(</span><span class="n">Latitude</span><span class="p">,</span> <span class="n">deg_free</span> <span class="o">=</span> <span class="m">10</span><span class="p">)</span> <span class="n">quant_fit</span> <span class="o">&lt;-</span> <span class="nf">workflow</span><span class="p">(</span><span class="n">spline_rec</span><span class="p">,</span> <span class="n">quant_spec</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">fit</span><span class="p">(</span><span class="n">data</span> <span class="o">=</span> <span class="n">ames</span><span class="p">)</span> <span class="n">quant_fit</span> </code></pre></div><pre><code>## ══ Workflow [trained] ═════════════════════════════════════════════════ ## Preprocessor: Recipe ## Model: linear_reg() ## ## ── Preprocessor ─────────────────────────────────────────────────────── ## 1 Recipe Step ## ## • step_spline_natural() ## ## ── Model ────────────────────────────────────────────────────────────── ## Call: ## quantreg::rq(formula = ..y ~ ., tau = quantile_levels, data = data) ## ## Coefficients: ## tau= 0.1 tau= 0.5 tau= 0.9 ## (Intercept) 4.71981123 5.07728741 5.25221335 ## Latitude_01 1.22409173 0.70928577 0.79000849 ## Latitude_02 0.19561816 0.04937750 0.02832633 ## Latitude_03 0.16616065 0.02045910 0.14730573 
## Latitude_04 0.30583648 0.08489487 0.15595080 ## Latitude_05 0.21663212 0.02016258 -0.01110625 ## Latitude_06 0.33541228 0.12005254 0.03006777 ## Latitude_07 0.47732205 0.09146728 0.17394021 ## Latitude_08 0.24028784 0.30450058 0.26144584 ## Latitude_09 0.05840312 -0.14733781 -0.11911843 ## Latitude_10 1.52800673 0.95994216 1.21750501 ## ## Degrees of freedom: 2930 total; 2919 residual </code></pre><p>For prediction, tidymodels always returns a data frame with as many rows as the input data set (here: <code>ames</code>). The result for quantile predictions is a special vctrs class:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">quant_pred</span> <span class="o">&lt;-</span> <span class="nf">predict</span><span class="p">(</span><span class="n">quant_fit</span><span class="p">,</span> <span class="n">ames</span><span class="p">)</span> <span class="n">quant_pred</span> <span class="o">|&gt;</span> <span class="nf">slice</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">4</span><span class="p">)</span> </code></pre></div><pre><code>## # A tibble: 4 × 1 ## .pred_quantile ## &lt;qtls(3)&gt; ## 1 [5.33] ## 2 [5.33] ## 3 [5.33] ## 4 [5.31] </code></pre><div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">class</span><span class="p">(</span><span class="n">quant_pred</span><span class="o">$</span><span class="n">.pred_quantile</span><span class="p">)</span> </code></pre></div><pre><code>## [1] &quot;quantile_pred&quot; &quot;vctrs_vctr&quot; &quot;list&quot; </code></pre><p>where the output <code>[5.31]</code> shows the middle quantile.</p> <p>We can expand the set of quantile predictions so that there are three rows for each source row in <code>ames</code>. 
There’s also an integer column called <code>.row</code> so that we can merge the data with the source data:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">quant_pred</span><span class="o">$</span><span class="n">.pred_quantile[1]</span> </code></pre></div><pre><code>## &lt;quantiles[1]&gt; ## [1] [5.33] ## # Quantile levels: 0.1 0.5 0.9 </code></pre><div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">as_tibble</span><span class="p">(</span><span class="n">quant_pred</span><span class="o">$</span><span class="n">.pred_quantile[1]</span><span class="p">)</span> </code></pre></div><pre><code>## # A tibble: 3 × 3 ## .pred_quantile .quantile_levels .row ## &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; ## 1 5.08 0.1 1 ## 2 5.33 0.5 1 ## 3 5.52 0.9 1 </code></pre><p>Here are the predicted quantile values:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">quant_pred</span><span class="o">$</span><span class="n">.pred_quantile</span> <span class="o">|&gt;</span> <span class="nf">as_tibble</span><span class="p">()</span> <span class="o">|&gt;</span> <span class="nf">full_join</span><span class="p">(</span><span class="n">ames</span> <span class="o">|&gt;</span> <span class="nf">add_rowindex</span><span class="p">(),</span> <span class="n">by</span> <span class="o">=</span> <span class="s">&#34;.row&#34;</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">arrange</span><span class="p">(</span><span class="n">Latitude</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">Latitude</span><span class="p">))</span> <span class="o">+</span> <span class="nf">geom_point</span><span class="p">(</span><span class="n">data</span> <span 
class="o">=</span> <span class="n">ames</span><span class="p">,</span> <span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="n">Sale_Price</span><span class="p">),</span> <span class="n">alpha</span> <span class="o">=</span> <span class="m">1</span> <span class="o">/</span> <span class="m">5</span><span class="p">)</span> <span class="o">+</span> <span class="nf">geom_line</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="n">.pred_quantile</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="nf">format</span><span class="p">(</span><span class="n">.quantile_levels</span><span class="p">)),</span> <span class="n">show.legend</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span> <span class="n">linewidth</span> <span class="o">=</span> <span class="m">1.5</span><span class="p">)</span> </code></pre></div><div class="figure" style="text-align: center"> <img src="figure/quant-plot-1.svg" alt="10%, 50%, and 90% quantile predictions." width="80%" /> <p class="caption">10%, 50%, and 90% quantile predictions.</p> </div> <p>For now, the new mode does not have many engines. We need to implement some performance statistics in the yardstick package before integrating these models into the whole tidymodels ecosystem.</p> <p>In other news, we’ve added some additional neural network models based on some improvements in the brulee package. Namely, two-layer networks can be tuned for feed-forward networks on tabular data (using torch).</p> <p>One other improvement has been simmering for a long time: the ability to exploit sparse data structures better. We’ve improved our <code>fit()</code> interfaces for the few model engines that can use sparsely encoded data. 
There is much more to come on this in a few months, especially around recipes, so stay tuned.</p> <p>Finally, we’ve created a set of <a href="https://parsnip.tidymodels.org/articles/checklists.html" target="_blank" rel="noopener">checklists</a> that can be used when creating new models or engines. These are very helpful, even for us, since there are a lot of minutiae to remember.</p> <h2 id="parallelism-in-tune">Parallelism in tune <a href="#parallelism-in-tune"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This was a small maintenance release mostly related to parallel processing. Up to now, tune facilitated parallelism using the <a href="https://cran.r-project.org/package=foreach" target="_blank" rel="noopener">foreach</a> package. That package is mature but not actively developed, so we have been slowly moving toward using the <a href="https://www.futureverse.org/packages-overview.html" target="_blank" rel="noopener">future</a> package(s).</p> <p>The <a href="https://www.tidyverse.org/blog/2024/04/tune-1-2-0/#modernized-support-for-parallel-processing" target="_blank" rel="noopener">first step in this journey</a> was to keep using foreach internally (while leaning toward future) and to encourage users to stop invoking the foreach package directly and, instead, load and use the future package.</p> <p>We’re now moving folks into the second stage. 
tune will now raise a warning when:</p> <ul> <li>A parallel backend has been registered with foreach, and</li> <li>No <a href="https://future.futureverse.org/reference/plan.html" target="_blank" rel="noopener"><code>plan()</code></a> has been specified with future.</li> </ul> <p>This will allow users to transition their existing code to only future and allow us to update existing documentation and training materials.</p> <p>We anticipate that the third stage, <strong>removing foreach entirely</strong>, will occur sometime before posit::conf(2025) in September.</p> <h2 id="things-to-look-forward-to">Things to look forward to <a href="#things-to-look-forward-to"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We are working hard on a few major initiatives that we plan on showing off at <a href="https://posit.co/conference/" target="_blank" rel="noopener">posit::conf(2025)</a>.</p> <p>First is integrated support for sparse <strong>data</strong>. The emphasis is on &ldquo;data&rdquo; because users can use a data frame of sparse vectors <em>or</em> the usual sparse matrix format. This is a big deal because it does not force you to convert non-numeric data into a numeric matrix format. Again, we’ll discuss this more in the future, but you should be able to use sparse data frames in parsnip, recipes, tune, etc.</p> <p>The second initiative is the longstanding goal of adding <strong>postprocessing</strong> to tidymodels. Just as you can add a preprocessor to a model workflow, you will be able to add a set of postprocessing adjustments to the predictions your model generates. 
See our <a href="https://www.tidyverse.org/blog/2024/10/postprocessing-preview/" target="_blank" rel="noopener">previous post</a> for a sneak peek.</p> <p>Finally, this year&rsquo;s <a href="https://www.tidyverse.org/blog/2025/01/tidymodels-2025-internship/" target="_blank" rel="noopener">summer internship</a> focuses on supervised feature selection methods. We’ll also have releases (and probably another package) for these tools.</p> <p>These should come to fruition (and CRAN) before or around August 2025.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We want to sincerely thank everyone who contributed to these packages since their previous versions:</p> <p> <a href="https://github.com/AlbertoImg" target="_blank" rel="noopener">@AlbertoImg</a>, <a href="https://github.com/asb2111" target="_blank" rel="noopener">@asb2111</a>, <a href="https://github.com/balraadjsings" target="_blank" rel="noopener">@balraadjsings</a>, <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">@bcjaeger</a>, <a href="https://github.com/beansrowning" target="_blank" rel="noopener">@beansrowning</a>, <a href="https://github.com/BrennanAntone" target="_blank" rel="noopener">@BrennanAntone</a>, <a href="https://github.com/cheryldietrich" target="_blank" rel="noopener">@cheryldietrich</a>, <a href="https://github.com/chillerb" target="_blank" rel="noopener">@chillerb</a>, <a href="https://github.com/conarr5" target="_blank" rel="noopener">@conarr5</a>, <a href="https://github.com/corybrunson" target="_blank" 
rel="noopener">@corybrunson</a>, <a href="https://github.com/dajmcdon" target="_blank" rel="noopener">@dajmcdon</a>, <a href="https://github.com/davidrsch" target="_blank" rel="noopener">@davidrsch</a>, <a href="https://github.com/Edgar-Zamora" target="_blank" rel="noopener">@Edgar-Zamora</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, <a href="https://github.com/gimholte" target="_blank" rel="noopener">@gimholte</a>, <a href="https://github.com/grantmcdermott" target="_blank" rel="noopener">@grantmcdermott</a>, <a href="https://github.com/grouptheory" target="_blank" rel="noopener">@grouptheory</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/ilaria-kode" target="_blank" rel="noopener">@ilaria-kode</a>, <a href="https://github.com/JamesHWade" target="_blank" rel="noopener">@JamesHWade</a>, <a href="https://github.com/jesusherranz" target="_blank" rel="noopener">@jesusherranz</a>, <a href="https://github.com/jkylearmstrong" target="_blank" rel="noopener">@jkylearmstrong</a>, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>, <a href="https://github.com/joscani" target="_blank" rel="noopener">@joscani</a>, <a href="https://github.com/Joscelinrocha" target="_blank" rel="noopener">@Joscelinrocha</a>, <a href="https://github.com/josho88" target="_blank" rel="noopener">@josho88</a>, <a href="https://github.com/joshuagi" target="_blank" rel="noopener">@joshuagi</a>, <a href="https://github.com/JosiahParry" target="_blank" rel="noopener">@JosiahParry</a>, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, <a href="https://github.com/jrwinget" target="_blank" rel="noopener">@jrwinget</a>, <a href="https://github.com/KarlKoe" target="_blank" rel="noopener">@KarlKoe</a>, <a href="https://github.com/kscott-1" 
target="_blank" rel="noopener">@kscott-1</a>, <a href="https://github.com/lilykoff" target="_blank" rel="noopener">@lilykoff</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/LouisMPenrod" target="_blank" rel="noopener">@LouisMPenrod</a>, <a href="https://github.com/luisDVA" target="_blank" rel="noopener">@luisDVA</a>, <a href="https://github.com/marcelglueck" target="_blank" rel="noopener">@marcelglueck</a>, <a href="https://github.com/marcozanotti" target="_blank" rel="noopener">@marcozanotti</a>, <a href="https://github.com/martaalcalde" target="_blank" rel="noopener">@martaalcalde</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/mihem" target="_blank" rel="noopener">@mihem</a>, <a href="https://github.com/mitchellmanware" target="_blank" rel="noopener">@mitchellmanware</a>, <a href="https://github.com/naokiohno" target="_blank" rel="noopener">@naokiohno</a>, <a href="https://github.com/nhward" target="_blank" rel="noopener">@nhward</a>, <a href="https://github.com/npelikan" target="_blank" rel="noopener">@npelikan</a>, <a href="https://github.com/obgeneralao" target="_blank" rel="noopener">@obgeneralao</a>, <a href="https://github.com/owenjonesuob" target="_blank" rel="noopener">@owenjonesuob</a>, <a href="https://github.com/pbhogale" target="_blank" rel="noopener">@pbhogale</a>, <a href="https://github.com/Peter4801" target="_blank" rel="noopener">@Peter4801</a>, <a href="https://github.com/pgg1309" target="_blank" rel="noopener">@pgg1309</a>, <a href="https://github.com/reisner" target="_blank" rel="noopener">@reisner</a>, <a href="https://github.com/rfsaldanha" target="_blank" rel="noopener">@rfsaldanha</a>, <a href="https://github.com/rkb965" target="_blank" rel="noopener">@rkb965</a>, <a href="https://github.com/RobLBaker" target="_blank" rel="noopener">@RobLBaker</a>, <a href="https://github.com/RodDalBen" 
target="_blank" rel="noopener">@RodDalBen</a>, <a href="https://github.com/SantiagoD999" target="_blank" rel="noopener">@SantiagoD999</a>, <a href="https://github.com/shum461" target="_blank" rel="noopener">@shum461</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/szimmer" target="_blank" rel="noopener">@szimmer</a>, <a href="https://github.com/talegari" target="_blank" rel="noopener">@talegari</a>, <a href="https://github.com/therealjpetereit" target="_blank" rel="noopener">@therealjpetereit</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/walkerjameschris" target="_blank" rel="noopener">@walkerjameschris</a>, and <a href="https://github.com/ZWael" target="_blank" rel="noopener">@ZWael</a></p> Air, an extremely fast R formatter https://www.tidyverse.org/blog/2025/02/air/ Fri, 21 Feb 2025 00:00:00 +0000 https://www.tidyverse.org/blog/2025/02/air/ <p>We&rsquo;re thrilled to announce <a href="https://posit-dev.github.io/air/" target="_blank" rel="noopener">Air</a>, an extremely fast R formatter. Formatters are used to automatically style code, but I find that it&rsquo;s much easier to show what Air can do rather than tell, so we&rsquo;ll start with a few examples. In the video below, we&rsquo;re inside <a href="https://positron.posit.co/" target="_blank" rel="noopener">Positron</a> and we&rsquo;re looking at some unformatted code. Saving the file (yep, that&rsquo;s it!) invokes Air, which automatically and instantaneously formats the code.</p> <video controls autoplay loop muted width="100%" src="video/case-when.mov" style="border: 2px solid #CCC;"> </video> <p>Next, let&rsquo;s go over to <a href="https://posit.co/products/open-source/rstudio/" target="_blank" rel="noopener">RStudio</a>. Here we&rsquo;ve got a pipe chain that could use a little formatting. 
Like in Positron, just save the file:</p> <video controls autoplay loop muted width="100%" src="video/ggplot.mov" style="border: 2px solid #CCC;"> </video> <p>Lastly, we&rsquo;ll jump back into Positron. Rather than formatting a single file on save, you might want to instead format an entire project (particularly when first adopting Air). To do so, just run <code>air format .</code> in a terminal from the project root, and Air will recursively format any R files it finds along the way (smartly excluding known generated files, like <code>cpp11.R</code>). Here we&rsquo;ll run Air on dplyr for the first time ever, analyzing and reformatting over 150 files instantly:</p> <video controls autoplay loop muted width="100%" src="video/project.mov" style="border: 2px solid #CCC;"> </video> <p>Within the tidyverse, we&rsquo;re already using Air in some of our largest packages, like <a href="https://github.com/tidyverse/dplyr/pull/7662" target="_blank" rel="noopener">dplyr</a>, <a href="https://github.com/tidyverse/tidyr/pull/1591" target="_blank" rel="noopener">tidyr</a>, and <a href="https://github.com/tidymodels/recipes/pull/1425" target="_blank" rel="noopener">recipes</a>.</p> <p>Throughout the rest of this post you&rsquo;ll learn what a formatter is, why you&rsquo;d want to use one, and you&rsquo;ll learn a little about how Air decides to format your R code.</p> <p>Note that Air is still in beta, so there may be some breaking changes over the next few releases. We also encourage you to use it in combination with a version control system, like git, so that you can clearly see the changes Air makes. 
That said, we still feel very confident in the current state of Air, and can&rsquo;t wait for you to try it!</p> <h2 id="installing-air">Installing Air <a href="#installing-air"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>If you already know how formatters work and want to jump straight in, follow one of the guides below:</p> <ul> <li> <p>For Positron, Air is <a href="https://open-vsx.org/extension/posit/air-vscode" target="_blank" rel="noopener">available</a> on OpenVSX as an Extension. Install it from the Extensions pane within Positron, then read our <a href="https://posit-dev.github.io/air/editor-vscode.html" target="_blank" rel="noopener">Positron guide</a>.</p> </li> <li> <p>For VS Code, Air is <a href="https://marketplace.visualstudio.com/items?itemName=Posit.air-vscode" target="_blank" rel="noopener">available</a> on the VS Code Marketplace as an Extension. Install it from the Extensions pane within VS Code, then read our <a href="https://posit-dev.github.io/air/editor-vscode.html" target="_blank" rel="noopener">VS Code guide</a>.</p> </li> <li> <p>For RStudio, Air can be set as an external formatter, but you&rsquo;ll need to install the command line tool for Air first. Read our <a href="https://posit-dev.github.io/air/editor-rstudio.html" target="_blank" rel="noopener">RStudio guide</a> to get started. 
Note that this is an experimental feature on the RStudio side, so the exact setup may change a little until it is fully stabilized.</p> </li> <li> <p>For command line users, Air binaries can be installed using our <a href="https://posit-dev.github.io/air/cli.html" target="_blank" rel="noopener">standalone installer scripts</a>.</p> </li> </ul> <p>For both Positron and VS Code, the most important thing to enable after installing the extension is format on save for R. You can do that by adding these lines to your <code>settings.json</code> file:</p> <div class="highlight"><pre class="chroma"><code class="language-json" data-lang="json"><span class="p">{</span> <span class="nt">&#34;[r]&#34;</span><span class="p">:</span> <span class="p">{</span> <span class="nt">&#34;editor.formatOnSave&#34;</span><span class="p">:</span> <span class="kc">true</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div><p>To open your <code>settings.json</code> file, run one of the following from the Command Palette:</p> <ul> <li> <p>Run <code>Preferences: Open User Settings (JSON)</code> to modify global user settings.</p> </li> <li> <p>Run <code>Preferences: Open Workspace Settings (JSON)</code> to modify project specific settings. You may want to use this instead of setting the user level setting if you drop in on multiple projects, but not all of them use Air. 
If you work on a project with collaborators, we recommend that you check in these project specific settings to your repository to ensure that every collaborator is using the same formatting settings.</p> </li> </ul> <p>If your preferred editor isn&rsquo;t listed here, but does support the <a href="https://microsoft.github.io/language-server-protocol/" target="_blank" rel="noopener">Language Server Protocol</a>, then it is likely that we can add support for Air there as well.</p> <p>If you have any questions or run into issues installing or using Air, feel free to open an <a href="https://github.com/posit-dev/air/issues" target="_blank" rel="noopener">issue</a>!</p> <h2 id="whats-a-formatter">What&rsquo;s a formatter? <a href="#whats-a-formatter"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A formatter is in charge of the <em>layout</em> of your R code. Formatters do not change the meaning of code; instead they ensure that whitespace, newlines, and other punctuation conform to a set of rules and standards, such as:</p> <ul> <li> <p>Making sure your code is <strong>indented</strong> with the appropriate amount of leading whitespace. By default, Air uses 2 spaces for indentation. 
You will see this indentation in pipelines:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">data</span> <span class="o">|&gt;</span>
  <span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">))</span> <span class="o">+</span>
  <span class="nf">geom_point</span><span class="p">()</span>
</code></pre></div><p>As well as in function calls:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">list</span><span class="p">(</span>
  <span class="n">foo</span> <span class="o">=</span> <span class="m">1</span><span class="p">,</span>
  <span class="n">bar</span> <span class="o">=</span> <span class="m">2</span>
<span class="p">)</span>
</code></pre></div></li> <li> <p>Preventing your code from overflowing a given <strong>line width</strong>. By default, Air uses a line width of 80 characters. It enforces this by splitting long lines of code over multiple lines. 
For instance, notice how long these expressions are; they would &ldquo;overflow&rdquo; past 80 characters:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">band_members</span> <span class="o">|&gt;</span> <span class="nf">select</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">full_join</span><span class="p">(</span><span class="n">band_instruments2</span><span class="p">,</span> <span class="n">by</span> <span class="o">=</span> <span class="nf">join_by</span><span class="p">(</span><span class="n">name</span> <span class="o">==</span> <span class="n">artist</span><span class="p">))</span>

<span class="n">left_join</span> <span class="o">&lt;-</span> <span class="nf">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">by</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">,</span> <span class="n">copy</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span> <span class="n">suffix</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;.x&#34;</span><span class="p">,</span> <span class="s">&#34;.y&#34;</span><span class="p">),</span> <span class="kc">...</span><span class="p">,</span> <span class="n">keep</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">)</span> <span class="p">{</span>
  <span class="nf">UseMethod</span><span class="p">(</span><span class="s">&#34;left_join&#34;</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div><p>Air reformats these expressions by switching them from a horizontal layout (called &ldquo;flat&rdquo;) to a vertical one (called &ldquo;expanded&rdquo;):</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span 
class="n">band_members</span> <span class="o">|&gt;</span>
  <span class="nf">select</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="o">|&gt;</span>
  <span class="nf">full_join</span><span class="p">(</span><span class="n">band_instruments2</span><span class="p">,</span> <span class="n">by</span> <span class="o">=</span> <span class="nf">join_by</span><span class="p">(</span><span class="n">name</span> <span class="o">==</span> <span class="n">artist</span><span class="p">))</span>

<span class="n">left_join</span> <span class="o">&lt;-</span> <span class="nf">function</span><span class="p">(</span>
  <span class="n">x</span><span class="p">,</span>
  <span class="n">y</span><span class="p">,</span>
  <span class="n">by</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">,</span>
  <span class="n">copy</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span>
  <span class="n">suffix</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;.x&#34;</span><span class="p">,</span> <span class="s">&#34;.y&#34;</span><span class="p">),</span>
  <span class="kc">...</span><span class="p">,</span>
  <span class="n">keep</span> <span class="o">=</span> <span class="kc">NULL</span>
<span class="p">)</span> <span class="p">{</span>
  <span class="nf">UseMethod</span><span class="p">(</span><span class="s">&#34;left_join&#34;</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></li> <li> <p>Standardizing the whitespace around code elements. 
Have you ever had difficulties deciphering very dense code?</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="m">1+2</span><span class="o">:</span><span class="m">3</span><span class="o">*</span><span class="p">(</span><span class="m">4</span><span class="o">/</span><span class="m">5</span><span class="p">)</span> </code></pre></div><p>Air reformats this expression to:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="m">1</span> <span class="o">+</span> <span class="m">2</span><span class="o">:</span><span class="m">3</span> <span class="o">*</span> <span class="p">(</span><span class="m">4</span> <span class="o">/</span> <span class="m">5</span><span class="p">)</span> </code></pre></div></li> </ul> <h2 id="how-does-a-formatter-improve-your-workflow">How does a formatter improve your workflow? <a href="#how-does-a-formatter-improve-your-workflow"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>By using a formatter it might seem like you&rsquo;re giving up control over the layout of your code. And indeed you are! However, putting Air in charge of styling your code has substantial advantages.</p> <p>First, it automatically forces you to write legible code that is neither too wide nor too narrow, with proper breathing room around syntactic elements. Having a formatter as a companion significantly improves the process of writing code as you no longer have to think about style - the formatter does that for you!</p> <p>Second, it reduces friction when working in a team. 
By agreeing to use a formatter in a project, collaborators no longer have to discuss styling and layout issues. Code sent to you by a colleague will adhere to the standards that you&rsquo;re used to. Code review no longer has to be about style nitpicks and can focus on the substance of the changes instead.</p> <h2 id="how-does-air-decide-how-to-format-your-code">How does Air decide how to format your code? <a href="#how-does-air-decide-how-to-format-your-code"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Air tries to strike a balance between enforcing rigid rules and allowing authors some control over the layout. Our main source of styling rules is the <a href="https://style.tidyverse.org" target="_blank" rel="noopener">Tidyverse style guide</a>, but we occasionally deviate from these.</p> <p>There is a trend among modern formatters of being <em>opinionated</em>. Air certainly fits this trend and provides very few <a href="https://posit-dev.github.io/air/configuration.html" target="_blank" rel="noopener">configuration options</a>, mostly: the indent style (spaces versus tabs), the indent width, and the line width. However, Air also puts code authors in charge of certain aspects of the layout through the notion of <strong>persistent line breaks</strong>.</p> <p>In general, Air is in control of deciding where to put vertical space (line breaks) in your code. 
For instance, if you write:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">dictionary</span> <span class="o">&lt;-</span> <span class="nf">list</span><span class="p">(</span><span class="n">bob</span> <span class="o">=</span> <span class="s">&#34;apple&#34;</span><span class="p">,</span>
                   <span class="n">jill</span> <span class="o">=</span> <span class="s">&#34;juice&#34;</span><span class="p">)</span>
</code></pre></div><p>Air will figure out that this expression fits on a single line without exceeding the line width. It will discard the line break and reformat to:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">dictionary</span> <span class="o">&lt;-</span> <span class="nf">list</span><span class="p">(</span><span class="n">bob</span> <span class="o">=</span> <span class="s">&#34;apple&#34;</span><span class="p">,</span> <span class="n">jill</span> <span class="o">=</span> <span class="s">&#34;juice&#34;</span><span class="p">)</span>
</code></pre></div><p>However, there are very specific places at which you can insert a line break that Air perceives as persistent:</p> <ul> <li> <p>Before the very first argument in a function call. 
This:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># Persistent line break after `(` and before `bob`</span>
<span class="n">dictionary</span> <span class="o">&lt;-</span> <span class="nf">list</span><span class="p">(</span>
  <span class="n">bob</span> <span class="o">=</span> <span class="s">&#34;apple&#34;</span><span class="p">,</span> <span class="n">jill</span> <span class="o">=</span> <span class="s">&#34;juice&#34;</span><span class="p">)</span>
</code></pre></div><p>gets formatted as:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">dictionary</span> <span class="o">&lt;-</span> <span class="nf">list</span><span class="p">(</span>
  <span class="n">bob</span> <span class="o">=</span> <span class="s">&#34;apple&#34;</span><span class="p">,</span>
  <span class="n">jill</span> <span class="o">=</span> <span class="s">&#34;juice&#34;</span>
<span class="p">)</span>
</code></pre></div></li> <li> <p>Before the very first right-hand side expression in a pipeline. 
This:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># Persistent line break after `|&gt;` and before `select`</span>
<span class="n">data</span> <span class="o">|&gt;</span>
  <span class="nf">select</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">filter</span><span class="p">(</span><span class="o">!</span><span class="n">bar</span><span class="p">)</span>
</code></pre></div><p>gets formatted as:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">data</span> <span class="o">|&gt;</span>
  <span class="nf">select</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span> <span class="o">|&gt;</span>
  <span class="nf">filter</span><span class="p">(</span><span class="o">!</span><span class="n">bar</span><span class="p">)</span>
</code></pre></div></li> </ul> <p>A persistent line break will never be removed by Air. But you can remove it manually. 
Taking the last two examples, if you join the first lines like this:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># Removed persistent line break after `(`</span>
<span class="n">dictionary</span> <span class="o">&lt;-</span> <span class="nf">list</span><span class="p">(</span><span class="n">bob</span> <span class="o">=</span> <span class="s">&#34;apple&#34;</span><span class="p">,</span>
  <span class="n">jill</span> <span class="o">=</span> <span class="s">&#34;juice&#34;</span>
<span class="p">)</span>

<span class="c1"># Removed persistent line break after `|&gt;`</span>
<span class="n">data</span> <span class="o">|&gt;</span> <span class="nf">select</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span> <span class="o">|&gt;</span>
  <span class="nf">filter</span><span class="p">(</span><span class="o">!</span><span class="n">bar</span><span class="p">)</span>
</code></pre></div><p>Air will recognize that you&rsquo;ve removed the persistent line break, and reformat as:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">dictionary</span> <span class="o">&lt;-</span> <span class="nf">list</span><span class="p">(</span><span class="n">bob</span> <span class="o">=</span> <span class="s">&#34;apple&#34;</span><span class="p">,</span> <span class="n">jill</span> <span class="o">=</span> <span class="s">&#34;juice&#34;</span><span class="p">)</span>

<span class="n">data</span> <span class="o">|&gt;</span> <span class="nf">select</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">filter</span><span class="p">(</span><span class="o">!</span><span class="n">bar</span><span class="p">)</span>
</code></pre></div><p>The goal of this feature is to strike a balance between being opinionated and recognizing that users often know when taking up more vertical space results in 
more readable output.</p> <h2 id="how-can-i-disable-formatting">How can I disable formatting? <a href="#how-can-i-disable-formatting"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>If you need to disable formatting for a single expression, you can use a <code># fmt: skip</code> comment. This is particularly useful for manual alignment.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># This skips formatting for `list()` and its arguments, retaining the manual alignment</span>
<span class="c1"># fmt: skip</span>
<span class="nf">list</span><span class="p">(</span>
  <span class="n">dollar</span> <span class="o">=</span> <span class="s">&#34;USA&#34;</span><span class="p">,</span>
  <span class="n">yen</span>    <span class="o">=</span> <span class="s">&#34;Japan&#34;</span><span class="p">,</span>
  <span class="n">yuan</span>   <span class="o">=</span> <span class="s">&#34;China&#34;</span>
<span class="p">)</span>

<span class="c1"># This skips formatting for `tribble()` and its arguments</span>
<span class="c1"># fmt: skip</span>
<span class="nf">tribble</span><span class="p">(</span>
  <span class="o">~</span><span class="n">x</span><span class="p">,</span> <span class="o">~</span><span class="n">y</span><span class="p">,</span>
   <span class="m">1</span><span class="p">,</span>  <span class="m">2</span><span class="p">,</span>
<span class="p">)</span>
</code></pre></div><p>If there is a file that Air should skip altogether, you can use a <code># fmt: skip file</code> comment at the very top of the file.</p> <p>To learn more about these features, see 
the <a href="https://posit-dev.github.io/air/formatter.html#disabling-formatting" target="_blank" rel="noopener">documentation</a>.</p> <h2 id="how-can-i-use-air">How can I use Air? <a href="#how-can-i-use-air"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As we&rsquo;ve touched on above, Air can be integrated into your IDE to format code on every save. We expect that this will be the most common way to invoke Air, but there are a few other ways to use Air that we think are pretty cool:</p> <ul> <li> <p>In IDEs:</p> <ul> <li> <p>Format on save</p> </li> <li> <p>Format selection</p> </li> </ul> </li> <li> <p>At the command line:</p> <ul> <li> <p>Format entire projects with <code>air format .</code></p> </li> <li> <p>Set up a git precommit hook to invoke Air before committing</p> </li> </ul> </li> <li> <p>In CI:</p> <ul> <li> <p>Use a GitHub Action to check that each PR conforms to formatting standards with <code>air format . --check</code><sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p> </li> <li> <p>Use a GitHub Action to automatically format each PR by pushing the results of <code>air format</code> as a commit</p> </li> </ul> </li> </ul> <p>We don&rsquo;t have guides for all of these use cases yet, but the best place to stay up to date is the <a href="https://posit-dev.github.io/air/" target="_blank" rel="noopener">Air website</a>.</p> <h2 id="how-is-this-different-from-styler">How is this different from styler? 
<a href="#how-is-this-different-from-styler"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Air would not exist without the preexisting work and dedication poured into <a href="https://github.com/r-lib/styler" target="_blank" rel="noopener">styler</a>. Created by <a href="https://github.com/lorenzwalthert" target="_blank" rel="noopener">Lorenz Walthert</a> and <a href="https://github.com/krlmlr" target="_blank" rel="noopener">Kirill Müller</a>, styler proved that the R community does care about how their code is formatted, and has been the primary implementation of the <a href="https://style.tidyverse.org/" target="_blank" rel="noopener">tidyverse style guide</a> for many years. We&rsquo;ve spoken to Lorenz about Air, and we are all very excited about what Air can do for the future of formatting in R.</p> <p>Air is different from styler in a few key ways:</p> <ul> <li> <p>Air is much faster. So much so that it enables new ways of using a formatter that were somewhat painful before, like formatting on every save, or formatting entire projects on every pull request.</p> </li> <li> <p>Air is less configurable. As mentioned above, Air provides very few <a href="https://posit-dev.github.io/air/configuration.html" target="_blank" rel="noopener">configuration options</a>.</p> </li> <li> <p>Air respects a line width, with a default of 80 characters.</p> </li> <li> <p>Air does not require R to run. Unlike styler, which is an R package, Air is written in Rust and is distributed as a pre-compiled binary for many platforms. 
This makes Air easy to use across IDEs or on CI with very little setup required.</p> </li> </ul> <h2 id="how-fast-is-extremely-fast">How fast is &ldquo;extremely fast&rdquo;? <a href="#how-fast-is-extremely-fast"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Air is written in Rust using the formatting infrastructure provided by <a href="https://github.com/biomejs/biome" target="_blank" rel="noopener">Biome</a><sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>. This is also the same infrastructure that <a href="https://github.com/astral-sh/ruff" target="_blank" rel="noopener">Ruff</a>, the fast Python formatter, originally forked from. Both of those projects are admired for their performance, and Air is no exception.</p> <p>One goal for Air is for &ldquo;format on save&rdquo; to be imperceptibly fast, encouraging you to keep it on at all times. Benchmarking formatters is a bit hand wavy due to some having built in caching, so bear with me, but one way to proxy this performance is by formatting a large single file, for example the 800+ line <a href="https://github.com/tidyverse/dplyr/blob/main/R/join.R" target="_blank" rel="noopener">join.R</a> in dplyr. Formatting this takes<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup>:</p> <ul> <li> <p>0.01 seconds with Air</p> </li> <li> <p>1 second with styler (no cache)</p> </li> </ul> <p>So, ~100x faster for Air! 
If you make a few changes in the file after the first round of formatting and run the formatter again, then you get something like:</p> <ul> <li> <p>0.01 seconds with Air</p> </li> <li> <p>0.5 seconds with styler (with cache)</p> </li> </ul> <p>Half a second for styler might not sound that bad (and indeed, for a formatter written in R it&rsquo;s pretty good), but it&rsquo;s slow enough that you&rsquo;ll &ldquo;feel&rdquo; it if you try to invoke styler on every save. But 0.01 seconds? You&rsquo;ll never even know it&rsquo;s running!</p> <p>The differences get even more drastic if you format entire projects. Formatting the ~150 R files in dplyr takes<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup>:</p> <ul> <li> <p>0.3 seconds with Air</p> </li> <li> <p>100 seconds with styler</p> </li> </ul> <p>Over 300x faster!</p> <p>Out of curiosity, we also ran Air over all ~900 R files in base R and it finished in under 2 seconds.</p> <h2 id="wrapping-up">Wrapping up <a href="#wrapping-up"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>By contributing this formatter to the R community, our objective is threefold:</p> <ul> <li> <p>Vastly improve your enjoyment of writing well-styled R code by removing the chore of editing whitespace.</p> </li> <li> <p>Reduce friction in collaborative projects by establishing a consistent style once and for all.</p> </li> <li> <p>Improve the overall readability of R code for the community.</p> </li> </ul> <p>We hope that Air will prove to be a valuable companion in your daily workflow!</p> <section class="footnotes" role="doc-endnotes"> <hr> 
<ol> <li id="fn:1" role="doc-endnote"> <p>The Shiny team already has a <a href="https://github.com/rstudio/shiny-workflows/tree/main/format-r-code" target="_blank" rel="noopener">GitHub Action</a> to help with this. We will likely work on refining this and incorporating it more officially into an Air or r-lib repository. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>Biome is an open source project maintained by community members, please consider <a href="https://github.com/sponsors/biomejs#sponsors" target="_blank" rel="noopener">sponsoring them</a>! <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>These benchmarks were run with <code>air format R/join.R</code> and <code>styler::style_file(&quot;R/join.R&quot;)</code>. <a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:4" role="doc-endnote"> <p>With <code>air format .</code> and <a href="https://styler.r-lib.org/reference/style_pkg.html" target="_blank" rel="noopener"><code>styler::style_pkg()</code></a> <a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> Three experiments in LLM code assist with RStudio and Positron https://www.tidyverse.org/blog/2025/01/experiments-llm/ Wed, 29 Jan 2025 00:00:00 +0000 https://www.tidyverse.org/blog/2025/01/experiments-llm/ <p>The last few months, I&rsquo;ve been exploring how AI/LLMs might make my time developing R packages and doing data science more productive. 
This post will describe three experimental R packages&mdash; <a href="https://simonpcouch.github.io/pal/" target="_blank" rel="noopener">pal</a>, <a href="https://simonpcouch.github.io/ensure/" target="_blank" rel="noopener">ensure</a>, and <a href="https://simonpcouch.github.io/gander/" target="_blank" rel="noopener">gander</a>&mdash;that came out of that exploration, and the core tools underlying them. Taken together, I&rsquo;ve found that these packages allow me to automate many of the less interesting parts of my work, turning all sorts of 45-second tasks into 5-second ones. Excitement from folks in the community has been very encouraging so far, and I&rsquo;m looking forward to getting each of these packages buttoned up and sent off to CRAN in the coming weeks!</p> <h2 id="background">Background <a href="#background"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Twice a year, the tidyverse team sets a week aside for &ldquo;spring cleaning,&rdquo; bringing all of our R packages up to snuff with the most current tooling and standardizing various bits of our development process. Some of these updates can happen by calling a single function, while others are much more involved. One of those more involved updates is updating erroring code, transitioning away from base R (e.g.  <a href="https://rdrr.io/r/base/stop.html" target="_blank" rel="noopener"><code>stop()</code></a>), rlang (e.g.  
<a href="https://rlang.r-lib.org/reference/abort.html" target="_blank" rel="noopener"><code>rlang::abort()</code></a>), <a href="https://glue.tidyverse.org/" target="_blank" rel="noopener">glue</a>, and homegrown combinations of them in favor of the cli package. cli&rsquo;s new syntax is easier to work with as a developer and more visually pleasing as a user.</p> <p>In some cases, transitioning is almost as simple as Finding + Replacing <a href="https://rlang.r-lib.org/reference/abort.html" target="_blank" rel="noopener"><code>rlang::abort()</code></a> with <a href="https://cli.r-lib.org/reference/cli_abort.html" target="_blank" rel="noopener"><code>cli::cli_abort()</code></a>:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># before:</span> <span class="n">rlang</span><span class="o">::</span><span class="nf">abort</span><span class="p">(</span><span class="s">&#34;`save_pred` can only be used if the initial results saved predictions.&#34;</span><span class="p">)</span> <span class="c1"># after: </span> <span class="n">cli</span><span class="o">::</span><span class="nf">cli_abort</span><span class="p">(</span><span class="s">&#34;{.arg save_pred} can only be used if the initial results saved predictions.&#34;</span><span class="p">)</span> </code></pre></div><p>In others, there&rsquo;s a mess of ad-hoc pluralization, <a href="https://rdrr.io/r/base/paste.html" target="_blank" rel="noopener"><code>paste0()</code></a>s, glue interpolations, and other assorted nonsense to sort through:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># before:</span> <span class="n">extra_grid_params</span> <span class="o">&lt;-</span> <span class="n">glue</span><span class="o">::</span><span class="nf">single_quote</span><span class="p">(</span><span class="n">extra_grid_params</span><span class="p">)</span> <span class="n">extra_grid_params</span> <span class="o">&lt;-</span> <span 
class="n">glue</span><span class="o">::</span><span class="nf">glue_collapse</span><span class="p">(</span><span class="n">extra_grid_params</span><span class="p">,</span> <span class="n">sep</span> <span class="o">=</span> <span class="s">&#34;, &#34;</span><span class="p">)</span> <span class="n">msg</span> <span class="o">&lt;-</span> <span class="n">glue</span><span class="o">::</span><span class="nf">glue</span><span class="p">(</span> <span class="s">&#34;The provided `grid` has the following parameter columns that have &#34;</span><span class="p">,</span> <span class="s">&#34;not been marked for tuning by `tune()`: {extra_grid_params}.&#34;</span> <span class="p">)</span> <span class="n">rlang</span><span class="o">::</span><span class="nf">abort</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span> <span class="c1"># after:</span> <span class="n">cli</span><span class="o">::</span><span class="nf">cli_abort</span><span class="p">(</span> <span class="s">&#34;The provided {.arg grid} has parameter columns that have not been </span><span class="s"> marked for tuning by {.fn tune}: {.val {extra_grid_params}}.&#34;</span> <span class="p">)</span> </code></pre></div><p>Total pain, especially with thousands upon thousands of error messages thrown across the tidyverse, r-lib, and tidymodels organizations.</p> <p>The week before our most recent spring cleaning, I participated in an internal Posit LLM hackathon, where a small group of employees would familiarize themselves with interfacing with LLMs via APIs and then set aside a day or two to build something to make their work easier. Heading into our spring cleaning and dreading the task of updating thousands of these calls, I decided to look into how effectively LLMs could help me convert this code. 
Thus was born <a href="https://github.com/simonpcouch/clipal" target="_blank" rel="noopener">clipal</a><sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>, a (now-superseded) R package that allows users to select erroring code, press a keyboard shortcut, wait a moment, and watch the updated code be inlined into the selection.</p> <div class="highlight"> <p><img src="figs/clipal.gif" alt="A screencast of an RStudio session with an R file open in the source editor. 9 lines of ad-hoc erroring code are selected and, after a brief pause, replaced with one call to cli::cli_abort()." width="700px" style="display: block; margin: auto;" /></p> </div> <p>clipal was a <em>huge</em> boost for us in the most recent spring cleaning. Depending on the code being updated, these erroring calls each used to take 30 seconds to a few minutes to convert. With clipal, though, the model could usually get the updated code 80% or 90% of the way there in a couple of seconds. Up to this point, irritated by autocomplete and frustrated by the friction of copying and pasting code and typing out the same bits of context into chats again and again, I had been relatively skeptical that LLMs could make me more productive. After using clipal for a week, though, I began to understand how seamlessly LLMs could automate the cumbersome and uninteresting parts of my work.</p> <p>clipal itself is now superseded by pal, a more general solution to the problem that clipal solved. I&rsquo;ve also written two additional packages, ensure and gander, that solve two other classes of pal-like problems using similar tools. 
In this post, I&rsquo;ll write a bit about how I&rsquo;ve used a pair of tools in three experiments that have made me much more productive as an R developer.</p> <h2 id="prerequisites-ellmer-and-the-rstudio-api">Prerequisites: ellmer and the RStudio API <a href="#prerequisites-ellmer-and-the-rstudio-api"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>While clipal is now superseded, the package that supersedes it and clipal&rsquo;s other two descendants all make use of the same two tools: <a href="https://github.com/tidyverse/ellmer" target="_blank" rel="noopener">ellmer</a> and the <a href="https://rstudio.github.io/rstudioapi/" target="_blank" rel="noopener">RStudio API</a>.</p> <p>Last year, Hadley Wickham and Joe Cheng began work on ellmer, a package that aims to make it easy to use large language models in R. For folks who have tried to use LLM APIs through HTTP requests, or interfaced with existing tools that wrap them like langchain, ellmer is pretty incredible. 
R users can initialize a Chat object using a predictably named function:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">ellmer</span><span class="p">)</span> <span class="c1"># to use a model like GPT-4o or GPT-4o-mini from OpenAI:</span> <span class="n">ch</span> <span class="o">&lt;-</span> <span class="nf">chat_openai</span><span class="p">()</span> <span class="c1"># ...or a locally hosted ollama model:</span> <span class="n">ch</span> <span class="o">&lt;-</span> <span class="nf">chat_ollama</span><span class="p">()</span> <span class="c1"># ...or Claude&#39;s Sonnet model:</span> <span class="n">ch</span> <span class="o">&lt;-</span> <span class="nf">chat_claude</span><span class="p">()</span> </code></pre></div><p>Then calling the output&rsquo;s <code>$chat()</code> method returns a character response:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">ch</span><span class="o">$</span><span class="nf">chat</span><span class="p">(</span><span class="s">&#34;When was R created? Be brief.&#34;</span><span class="p">)</span> <span class="c1">#&gt; R was created in 1993 by Ross Ihaka and Robert Gentleman at </span> <span class="c1">#&gt; the University of Auckland, New Zealand.</span> </code></pre></div><p>There&rsquo;s a whole lot more to ellmer, but this functionality alone was enough to make clipal happen. I could allow users to choose a Chat from whatever provider they prefer to power the addin and ellmer would take care of all of the details underneath the hood.</p> <p>The other puzzle piece here was how to get that character vector directly into the file so that the user didn&rsquo;t have to copy and paste code from a chat interface into their document. The RStudio IDE supplies an API to interface with various bits of the RStudio UI through R code via the rstudioapi package. 
Notably, through R code, the package can read what&rsquo;s inside of the user&rsquo;s active selection and also write character vectors into that range. clipal could thus:</p> <ul> <li>When triggered, read what&rsquo;s inside of the selection using rstudioapi.</li> <li>Using ellmer, pass the selection&rsquo;s contents to an LLM along with a system prompt describing how to convert R erroring code to use cli. (If you&rsquo;re curious, the current draft of that prompt is <a href="https://github.com/simonpcouch/pal/blob/1cd81736ee11cfaea1fd2466025dffcbdb845c3c/inst/prompts/cli-replace.md" target="_blank" rel="noopener">here</a>.)</li> <li>When the response is returned, replace the contents of the selection with the response using rstudioapi.</li> </ul> <p>This approach of using ellmer and the rstudioapi has its ups and downs. As for the advantages:</p> <ul> <li>Our <a href="https://positron.posit.co/" target="_blank" rel="noopener">Positron IDE</a> has &ldquo;shims&rdquo; of the RStudio API, so whatever works in RStudio will also work in Positron. This means that the same shortcuts can be mapped to the same tool in either IDE and it will just work without me, as the developer, having to do anything.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup></li> <li>Since these packages are written in R, they have access to your R environment. This is quite the differentiator compared to the more language-agnostic tools out there&mdash;these packages can see the data frames you have loaded, the columns and column types in them, etc. 
When working with other tools for LLM code-assist that don&rsquo;t have this information, the friction of printing out variable information from my R environment and pasting it into whatever interface is so high that I don&rsquo;t even ask LLMs for help with tasks they&rsquo;re otherwise totally capable of.</li> <li>Using ellmer under the hood means that, once R users have set up model connections with ellmer, they can use the same configuration with any of these packages with minimal additional effort. So, clipal and the packages that followed it support whatever model providers their users want to use&mdash;OpenAI, Claude, local ollama models, and so on. If you can use it with ellmer, you can use it with these packages.</li> </ul> <p>As for the disadvantages, there are all sorts of UI bummers about this approach. Above all, these packages write directly to your files. This is great in that it removes the need to copy and paste, and when the model&rsquo;s response is spot on, it&rsquo;s awesome. At the same time, if the model starts rambling in an <code>.R</code> file or you want to confirm some difference between your previous code and the new code, the fact that these packages just write right into your files can be a bit annoying. Many other inline LLM code-assist tools out there are based on diffs&mdash;they show you proposed changes and some UI element that allows you to accept them, reject them, or ask for revisions. 
This requires one more step between asking for an LLM to do something and the thing actually being done, but saves the pain of lots of undoing or manually retrieving what code used to look like to verify the model&rsquo;s work.</p> <h2 id="pal">pal <a href="#pal"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><img src="https://github.com/simonpcouch/pal/blob/main/inst/figs/logo.png?raw=true" align="right" height="240" alt="The package hex, a yellow blob happily holding a checklist amid a purple background."/> <p>After using clipal during our spring cleaning, I approached another spring cleaning task for the week: updating testing code. testthat 3.0.0 was released in 2020, bringing with it numerous changes that were both huge quality of life improvements for package developers and also highly breaking changes. While some of the task of converting legacy unit testing code to testthat 3e is relatively straightforward, other components can be quite tedious. Could I do the same thing for updating to testthat 3e that I did for transitioning to cli? I sloppily threw together a sister package to clipal that would convert tests for errors to snapshot tests, disentangle nested expectations, and transition from deprecated functions like <code>expect_known_*()</code>. (If you&rsquo;re interested, the current prompt for that functionality is <a href="https://github.com/simonpcouch/pal/blob/1cd81736ee11cfaea1fd2466025dffcbdb845c3c/inst/prompts/testthat-replace.md" target="_blank" rel="noopener">here</a>.) 
That sister package was also a huge boost for me, but the package reused as-is almost every piece of code from clipal other than the prompt. Thus, I realized that the proper solution would provide all of this scaffolding to attach a prompt to a keyboard shortcut, but allow for an arbitrary set of prompts to help automate these wonky, cumbersome tasks.</p> <p>The next week, <a href="https://simonpcouch.github.io/pal/" target="_blank" rel="noopener">pal</a> was born. The pal package ships with three prompts centered on package development: the cli pal and testthat pal mentioned previously, as well as the roxygen pal, which drafts minimal roxygen documentation based on a function definition. Here&rsquo;s what pal&rsquo;s interface looks like now:</p> <div class="highlight"> <p><img src="figs/pal.gif" alt="Another RStudio screencast. This time, a 12-line function definition is iteratively revised as the user selects lines of code and selects an entry in a dropdown menu, after which a model streams new code in place. In addition to converting erroring code, the model also drafts roxygen documentation for a function." 
width="100%" style="display: block; margin: auto;" /></p> </div> <p>Users can add custom prompts for whatever tasks they please and they&rsquo;ll be included in the searchable dropdown shown above.</p> <p>I&rsquo;ve been super appreciative of all of the love the package has received already, and I&rsquo;ll be sending the package out to CRAN in the coming weeks.</p> <h2 id="ensure">ensure <a href="#ensure"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>While deciding on the initial set of prompts that pal would include, I really wanted to include some sort of &ldquo;write unit tests for this function&rdquo; pal. To really address this problem, though, requires violating two of pal&rsquo;s core assumptions:</p> <ul> <li><em>All of the context that you need is in the selection and the prompt.</em> In the case of writing unit tests, it&rsquo;s actually pretty important to have other pieces of context. If a package provides some object type <code>potato</code>, in order to write tests for some function that takes <code>potato</code> as input, it&rsquo;s likely very important to know how potatoes are created and the kinds of properties they have. 
pal&rsquo;s sister package for writing unit tests, ensure, can thus &ldquo;see&rdquo; the rest of the file that you&rsquo;re working on, as well as context from neighboring files like other <code>.R</code> source files, the corresponding test file, and package vignettes, to learn about how to interface with the function arguments being tested.</li> <li><em>The LLM&rsquo;s response can prefix, replace, or suffix the active selection in the same file.</em> In the case of writing unit tests for R, the place that tests actually ought to go is in a corresponding test file in <code>tests/testthat/</code>. Via the RStudio API, ensure can open up the corresponding test file and write to it rather than the source file where it was triggered from.<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup></li> </ul> <div class="highlight"> <p><img src="figs/ensure.gif" alt="Another RStudio screencast. This time, the user selects around 20 lines of code situated in an R package and, after pressing a key command, the addin opens a corresponding test file and begins streaming unit testing code into the file. After the model completes streaming, the user runs the testing code and all tests pass." 
width="100%" style="display: block; margin: auto;" /></p> </div> <p>So far, I haven&rsquo;t spent as much time with ensure as I have with pal or gander, but I&rsquo;ll be revisiting the package and sending it off to CRAN in the coming weeks.</p> <h2 id="gander">gander <a href="#gander"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p><a href="https://simonpcouch.github.io/gander/"><img src="https://github.com/simonpcouch/gander/blob/main/inst/figs/gander.png?raw=true" align="right" height="240" alt="The package hex, a goose hanging out amid a green background." /></a></p> <p>pal really excels at things you do all the time. Providing custom prompts with lots of details about code syntax and your taste means that models will often provide code that&rsquo;s almost exactly what you&rsquo;d write yourself. On its own, though, pal is incomplete as a toolkit for LLM code-assist. What about one-off requests that are specific to the environment that I&rsquo;m working in or things I only do every once in a long while? It&rsquo;s nice to have a much more general tool that functions much more like a chat interface.</p> <p>At the same time, working with usual chat interfaces is quite high-friction, so much so that you&rsquo;ll likely spend more time pasting in context from your files and R environment than you would if you had just written the code yourself. There are all sorts of language-agnostic (or language-specific, but not for R or RStudio) tools out there that tackle this problem. 
You type some request with your cursor near some code, and then, in the backend, the tool assembles a bunch of context that will help the model respond more effectively. This is super helpful for many software engineering contexts, where almost all of the context you need can be found in the contents of files. Data science differs a bit from software engineering here, though, in that the state of your R environment is just as important as (or more important than) the contents of your files. For example, the lines of your files may show that you reference some data frame called <code>stackoverflow</code>, but what will <em>really</em> help a model write R code to interface with that data frame is &ldquo;seeing&rdquo; it: what columns are in it, and what are their types and distributions? gander is a chat interface that allows models to see the data you&rsquo;re working with.</p> <div class="highlight"> <p><img src="figs/gander.gif" alt="Another RStudio screencast. A script called example.R is open in the editor with lines library(ggplot2), data(stackoverflow), and stackoverflow. After highlighting the last line, the user triggers the addin and asks to plot the data in plain language, at which point code to plot the data using ggplot2 is streamed into the source file that uses the correct column names and a minimal style. The user iteratively calls the addin to refine the output." width="100%" style="display: block; margin: auto;" /></p> </div> <p>Behind the scenes, gander combines your selection (or lack thereof), inputted request, file type and contents, and R environment to dynamically assemble prompts to best enable models to tailor their responses to your R session. I use gander several times every day to turn 45-second tasks into 5-second ones and have been super stoked with how well-received it&rsquo;s been among R folks so far. 
Compared to pal and ensure, this package feels like a much more substantial lift for data scientists specifically (rather than package developers). In the coming weeks, I&rsquo;ll sand down some of its rough edges and send it off to CRAN.</p> <h2 id="whats-next">What&rsquo;s next? <a href="#whats-next"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>For now, all of these packages only live on my GitHub profile. In the coming weeks, I plan to revisit each of them, squash a bunch of bugs, and send them off to CRAN.</p> <p>That said, these packages are very much experimental. The user interface of writing directly to users&rsquo; files very much limits how useful these tools can be, and I think that the kinds of improvements to interface I&rsquo;m hoping for may only be possible via some backend other than the RStudio API. I&rsquo;m looking forward to seeing what that could look like.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>Pronounced &ldquo;c-l-i pal.&rdquo; <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>In reality, there are bugs and differences here and there, but the development effort to get these packages working in Positron was relatively minimal. <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>This is one gap between the RStudio API and Positron&rsquo;s shims for it. 
The Positron shims currently don&rsquo;t allow for toggling between files, so ensure isn&rsquo;t available in Positron. <a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> nanoparquet 0.4.0 https://www.tidyverse.org/blog/2025/01/nanoparquet-0-4-0/ Tue, 28 Jan 2025 00:00:00 +0000 https://www.tidyverse.org/blog/2025/01/nanoparquet-0-4-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re thrilled to announce the release of <a href="https://nanoparquet.r-lib.org/" target="_blank" rel="noopener">nanoparquet</a> 0.4.0. 
nanoparquet is an R package that reads and writes Parquet files.</p> <p>You can install it from CRAN with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;nanoparquet&#34;</span><span class="p">)</span> </code></pre></div><p>This blog post will show the most important new features of nanoparquet 0.4.0. You can see a full list of changes in the <a href="https://nanoparquet.r-lib.org/news/index.html#nanoparquet-040" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="brand-new-read_parquet">Brand new <code>read_parquet()</code> <a href="#brand-new-read_parquet"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>nanoparquet 0.4.0 comes with a completely rewritten Parquet reader. The new version has an architecture that is easier to embed into R, and facilitates fantastic new features, like <a href="https://nanoparquet.r-lib.org/reference/append_parquet.html" target="_blank" rel="noopener"><code>append_parquet()</code></a> and the new <code>col_select</code> argument. (More to come!)
The new reader is also much faster; see the &ldquo;Benchmarks&rdquo; section.</p> <h2 id="read-a-subset-of-columns">Read a subset of columns <a href="#read-a-subset-of-columns"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://nanoparquet.r-lib.org/reference/read_parquet.html" target="_blank" rel="noopener"><code>read_parquet()</code></a> now has a new argument called <code>col_select</code> that lets you read a subset of the columns from the Parquet file. Unlike with row-oriented file formats like CSV, this means that the reader never needs to touch the columns that are not needed. The time required for reading a subset of columns is independent of how many more columns the Parquet file might have!</p> <p>You can either use column indices or column names to specify the columns to read.
Here is an example.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/r-lib/nanoparquet'>nanoparquet</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://pillar.r-lib.org/'>pillar</a></span><span class='o'>)</span></span></code></pre> </div> <p>This is the <a href="https://rdrr.io/pkg/nycflights13/man/flights.html" target="_blank" rel="noopener"><code>nycflights13::flights</code></a> data set:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://nanoparquet.r-lib.org/reference/read_parquet.html'>read_parquet</a></span><span class='o'>(</span></span> <span> <span class='s'>"flights.parquet"</span>,</span> <span> col_select <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"dep_time"</span>, <span class='s'>"arr_time"</span>, <span class='s'>"carrier"</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 336,776 × 3</span></span></span> <span><span class='c'>#&gt; dep_time arr_time carrier</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 517 830 UA </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 533 850 UA </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 
3</span> 542 923 AA </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 544 <span style='text-decoration: underline;'>1</span>004 B6 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 554 812 DL </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 554 740 UA </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 555 913 B6 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 557 709 EV </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 557 838 B6 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 558 753 AA </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 336,766 more rows</span></span></span> <span></span></code></pre> </div> <p>Use <a href="https://nanoparquet.r-lib.org/reference/read_parquet_schema.html" target="_blank" rel="noopener"><code>read_parquet_schema()</code></a> if you want to see the structure of the Parquet file first:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://nanoparquet.r-lib.org/reference/read_parquet_schema.html'>read_parquet_schema</a></span><span class='o'>(</span><span class='s'>"flights.parquet"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 20 × 12</span></span></span> <span><span class='c'>#&gt; file_name name r_type type type_length repetition_type converted_type</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span 
style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> flights.parquet sche… <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> flights.parquet year integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> flights.parquet month integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> flights.parquet day integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> flights.parquet dep_… integ… INT32 <span style='color: #BB0000;'>NA</span> OPTIONAL INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> flights.parquet sche… integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> flights.parquet dep_… double DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> flights.parquet arr_… integ… INT32 <span style='color: #BB0000;'>NA</span> OPTIONAL INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> flights.parquet sche… integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> flights.parquet arr_… double DOUB… <span style='color: 
#BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> flights.parquet carr… chara… BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> flights.parquet flig… integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> flights.parquet tail… chara… BYTE… <span style='color: #BB0000;'>NA</span> OPTIONAL UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>14</span> flights.parquet orig… chara… BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>15</span> flights.parquet dest chara… BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>16</span> flights.parquet air_… double DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>17</span> flights.parquet dist… double DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>18</span> flights.parquet hour double DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>19</span> flights.parquet minu… double DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>20</span> flights.parquet time… POSIX… INT64 <span style='color: #BB0000;'>NA</span> REQUIRED TIMESTAMP_MIC…</span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'># ℹ 5 more variables: logical_type &lt;I&lt;list&gt;&gt;, num_children &lt;int&gt;, scale &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># precision &lt;int&gt;, field_id &lt;int&gt;</span></span></span> <span></span></code></pre> </div> <p>The output of <a href="https://nanoparquet.r-lib.org/reference/read_parquet_schema.html" target="_blank" rel="noopener"><code>read_parquet_schema()</code></a> also shows you the R type that nanoparquet will use for each column.</p> <h2 id="appending-to-parquet-files">Appending to Parquet files <a href="#appending-to-parquet-files"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The new <a href="https://nanoparquet.r-lib.org/reference/append_parquet.html" target="_blank" rel="noopener"><code>append_parquet()</code></a> function makes it easy to append new data to a Parquet file, without first reading the whole file into memory. The schema of the file and the schema of the new data must match, of course.
Let&rsquo;s merge <a href="https://rdrr.io/pkg/nycflights13/man/flights.html" target="_blank" rel="noopener"><code>nycflights13::flights</code></a> and <a href="https://moderndive.github.io/nycflights23/reference/flights.html" target="_blank" rel="noopener"><code>nycflights23::flights</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/files.html'>file.copy</a></span><span class='o'>(</span><span class='s'>"flights.parquet"</span>, <span class='s'>"allflights.parquet"</span>, overwrite <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] TRUE</span></span> <span></span><span><span class='nf'><a href='https://nanoparquet.r-lib.org/reference/append_parquet.html'>append_parquet</a></span><span class='o'>(</span><span class='nf'>nycflights23</span><span class='nf'>::</span><span class='nv'><a href='https://moderndive.github.io/nycflights23/reference/flights.html'>flights</a></span>, <span class='s'>"allflights.parquet"</span><span class='o'>)</span></span></code></pre> </div> <p> <a href="https://nanoparquet.r-lib.org/reference/read_parquet_info.html" target="_blank" rel="noopener"><code>read_parquet_info()</code></a> returns the most basic information about a Parquet file:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://nanoparquet.r-lib.org/reference/read_parquet_info.html'>read_parquet_info</a></span><span class='o'>(</span><span class='s'>"flights.parquet"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 1 × 7</span></span></span> <span><span class='c'>#&gt; file_name num_cols num_rows num_row_groups file_size parquet_version</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; 
font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> flights.parquet 19 <span style='text-decoration: underline;'>336</span>776 1 5<span style='text-decoration: underline;'>687</span>737 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: created_by &lt;chr&gt;</span></span></span> <span></span><span><span class='nf'><a href='https://nanoparquet.r-lib.org/reference/read_parquet_info.html'>read_parquet_info</a></span><span class='o'>(</span><span class='s'>"allflights.parquet"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 1 × 7</span></span></span> <span><span class='c'>#&gt; file_name num_cols num_rows num_row_groups file_size parquet_version</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> allflights.parquet 19 <span style='text-decoration: underline;'>772</span>128 1 13<span style='text-decoration: underline;'>490</span>997 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: created_by &lt;chr&gt;</span></span></span> <span></span></code></pre> </div> <p>Note that you should probably still create a 
backup copy of the original file when using <a href="https://nanoparquet.r-lib.org/reference/append_parquet.html" target="_blank" rel="noopener"><code>append_parquet()</code></a>. If the appending process is interrupted by a power failure, then you might end up with an incomplete and invalid Parquet file.</p> <h2 id="schemas-and-type-conversions">Schemas and type conversions <a href="#schemas-and-type-conversions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In nanoparquet 0.4.0 <a href="https://nanoparquet.r-lib.org/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> takes a <code>schema</code> argument that can customize the R to Parquet type mappings. For example by default <a href="https://nanoparquet.r-lib.org/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> writes an R character vector as a <code>STRING</code> Parquet type. 
If you&rsquo;d like to write a certain character column as an <code>ENUM</code> type<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> instead, you&rsquo;ll need to specify that in <code>schema</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://nanoparquet.r-lib.org/reference/write_parquet.html'>write_parquet</a></span><span class='o'>(</span></span> <span> <span class='nf'>nycflights13</span><span class='nf'>::</span><span class='nv'><a href='https://rdrr.io/pkg/nycflights13/man/flights.html'>flights</a></span>,</span> <span> <span class='s'>"newflights.parquet"</span>,</span> <span> schema <span class='o'>=</span> <span class='nf'><a href='https://nanoparquet.r-lib.org/reference/parquet_schema.html'>parquet_schema</a></span><span class='o'>(</span>carrier <span class='o'>=</span> <span class='s'>"ENUM"</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span><span class='nf'><a href='https://nanoparquet.r-lib.org/reference/read_parquet_schema.html'>read_parquet_schema</a></span><span class='o'>(</span><span class='s'>"newflights.parquet"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 20 × 12</span></span></span> <span><span class='c'>#&gt; file_name name r_type type type_length repetition_type converted_type</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'> 1</span> newflights.par… sche… <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> newflights.par… year integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> newflights.par… month integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> newflights.par… day integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> newflights.par… dep_… integ… INT32 <span style='color: #BB0000;'>NA</span> OPTIONAL INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> newflights.par… sche… integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> newflights.par… dep_… double DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> newflights.par… arr_… integ… INT32 <span style='color: #BB0000;'>NA</span> OPTIONAL INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> newflights.par… sche… integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> newflights.par… arr_… double DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> 
newflights.par… carr… chara… BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED ENUM </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> newflights.par… flig… integ… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> newflights.par… tail… chara… BYTE… <span style='color: #BB0000;'>NA</span> OPTIONAL UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>14</span> newflights.par… orig… chara… BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>15</span> newflights.par… dest chara… BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>16</span> newflights.par… air_… double DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>17</span> newflights.par… dist… double DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>18</span> newflights.par… hour double DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>19</span> newflights.par… minu… double DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>20</span> newflights.par… time… POSIX… INT64 <span style='color: #BB0000;'>NA</span> REQUIRED TIMESTAMP_MIC…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 5 more variables: logical_type &lt;I&lt;list&gt;&gt;, num_children &lt;int&gt;, scale &lt;int&gt;,</span></span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'># precision &lt;int&gt;, field_id &lt;int&gt;</span></span></span> <span></span></code></pre> </div> <p>Here we wrote the <code>carrier</code> column as <code>ENUM</code>, and left the other columns to use the default type mappings.</p> <p>See the <a href="https://nanoparquet.r-lib.org/reference/nanoparquet-types.html#r-s-data-types" target="_blank" rel="noopener"><code>?nanoparquet-types</code></a> manual page for the possible type mappings (lots of new ones!) and also for the default ones.</p> <h2 id="encodings">Encodings <a href="#encodings"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>It is now also possible to customize the encoding of each column in <a href="https://nanoparquet.r-lib.org/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a>, via the <code>encoding</code> argument. By default <a href="https://nanoparquet.r-lib.org/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> tries to choose a good encoding based on the type and the values of a column. E.g. it checks a small sample for repeated values to decide if it is worth using a dictionary encoding (<code>RLE_DICTIONARY</code>).</p> <p>If <a href="https://nanoparquet.r-lib.org/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> gets it wrong, use the <code>encoding</code> argument to force an encoding. The following forces the <code>PLAIN</code> encoding for all columns. This encoding is very fast to write, but creates a larger file.
You can also specify different encodings for different columns, see the <a href="https://nanoparquet.r-lib.org/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code> manual page</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://nanoparquet.r-lib.org/reference/write_parquet.html'>write_parquet</a></span><span class='o'>(</span></span> <span> <span class='nf'>nycflights13</span><span class='nf'>::</span><span class='nv'><a href='https://rdrr.io/pkg/nycflights13/man/flights.html'>flights</a></span>,</span> <span> <span class='s'>"plainflights.parquet"</span>,</span> <span> encoding <span class='o'>=</span> <span class='s'>"PLAIN"</span></span> <span><span class='o'>)</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/file.info.html'>file.size</a></span><span class='o'>(</span><span class='s'>"flights.parquet"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 5687737</span></span> <span></span><span><span class='nf'><a href='https://rdrr.io/r/base/file.info.html'>file.size</a></span><span class='o'>(</span><span class='s'>"plainflights.parquet"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 11954574</span></span> <span></span></code></pre> </div> <p>See more about the implemented encodings and how the defaults are selected in the <a href="https://nanoparquet.r-lib.org/reference/parquet-encodings.html" target="_blank" rel="noopener"><code>parquet-encodings</code> manual page</a>.</p> <h2 id="api-changes">API changes <a href="#api-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 
3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Some nanoparquet functions have new, better names in nanoparquet 0.4.0. In particular, all functions that read from a Parquet file have a <code>read_parquet</code> prefix now. The old functions still work, with a warning.</p> <p>Also, the <a href="https://nanoparquet.r-lib.org/reference/parquet_schema.html" target="_blank" rel="noopener"><code>parquet_schema()</code></a> function is now for creating a new Parquet schema from scratch, and not for inferring a schema from a data frame (use <a href="https://nanoparquet.r-lib.org/reference/infer_parquet_schema.html" target="_blank" rel="noopener"><code>infer_parquet_schema()</code></a>) or for reading the schema from a Parquet file (use <a href="https://nanoparquet.r-lib.org/reference/read_parquet_schema.html" target="_blank" rel="noopener"><code>read_parquet_schema()</code></a>). <a href="https://nanoparquet.r-lib.org/reference/parquet_schema.html" target="_blank" rel="noopener"><code>parquet_schema()</code></a> falls back to the old behaviour when called with a file name, with a warning, so this is not a breaking change (yet), and old code still works.</p> <p>See the complete list of API changes in the <a href="https://nanoparquet.r-lib.org/news/index.html" target="_blank" rel="noopener">Changelog</a>.</p> <h2 id="benchmarks">Benchmarks <a href="#benchmarks"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We are very excited about the performance of the new Parquet reader, and the Parquet writer was always quite speedy, so we ran a simple benchmark.</p> <p>We compared 
nanoparquet to the Parquet implementations in Apache Arrow and DuckDB, and also to CSV readers and writers, on a real data set, for samples of 330k, 6.7 million and 67.4 million rows (40MB, 800MB and 8GB in memory). For these data nanoparquet is indeed very competitive with both Arrow and DuckDB.</p> <p>You can see the full results <a href="https://nanoparquet.r-lib.org/articles/benchmarks.html" target="_blank" rel="noopener">on the website</a>.</p> <h2 id="other-changes">Other changes <a href="#other-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Other important changes in nanoparquet 0.4.0 include:</p> <ul> <li> <p> <a href="https://nanoparquet.r-lib.org/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> can now write multiple row groups. By default it puts at most 10 million rows in a single row group. (This is subject to change.) See the <a href="https://nanoparquet.r-lib.org/reference/parquet_options.html" target="_blank" rel="noopener"><code>parquet_options()</code> manual page</a> on how to change the default.</p> </li> <li> <p> <a href="https://nanoparquet.r-lib.org/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> now writes minimum and maximum statistics (by default) for most Parquet types. 
See the <a href="https://nanoparquet.r-lib.org/reference/parquet_options.html" target="_blank" rel="noopener"><code>parquet_options()</code> manual page</a> on how to turn this off, which will probably make the writer faster.</p> </li> <li> <p> <a href="https://nanoparquet.r-lib.org/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> can now write version 2 data pages. The default is still version 1, but it might change in the future.</p> </li> <li> <p>New <code>compression_level</code> option to select the compression level manually.</p> </li> <li> <p> <a href="https://nanoparquet.r-lib.org/reference/read_parquet.html" target="_blank" rel="noopener"><code>read_parquet()</code></a> can now read from an R connection.</p> </li> </ul> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://github.com/alvarocombo" target="_blank" rel="noopener">@alvarocombo</a>, <a href="https://github.com/D3SL" target="_blank" rel="noopener">@D3SL</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, and <a href="https://github.com/RealTYPICAL" target="_blank" rel="noopener">@RealTYPICAL</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>A Parquet <code>ENUM</code> type is very similar to a factor in R. 
<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> httr2 1.1.0 https://www.tidyverse.org/blog/2025/01/httr2-1-1-0/ Mon, 20 Jan 2025 00:00:00 +0000 https://www.tidyverse.org/blog/2025/01/httr2-1-1-0/ <p>We&rsquo;re chuffed to announce the release of <a href="https://httr2.r-lib.org" target="_blank" rel="noopener">httr2 1.1.0</a>. httr2 (pronounced &ldquo;hitter2&rdquo;) is a comprehensive HTTP client that provides a modern, pipeable API for working with web APIs. 
It builds on top of <a href="https://jeroen.r-universe.dev/curl" target="_blank" rel="noopener">{curl}</a> to provide features like explicit request objects, built-in rate limiting &amp; retry tooling, comprehensive OAuth support, and secure handling of secrets and credentials.</p> <p>In this blog post, we&rsquo;ll dive into the new streaming interface built around <a href="https://httr2.r-lib.org/reference/req_perform_connection.html" target="_blank" rel="noopener"><code>req_perform_connection()</code></a>, explore the new suite of URL manipulation tools, highlight a few of the other biggest changes (including better support for AWS and enhancements to the caching system), and update you on the lifecycle changes.</p> <p>This blog post includes the most important enhancements in versions 1.0.1 through 1.0.7, where we&rsquo;ve been iterating on various features and fixing <em>numerous</em> bugs. For a complete list of changes, you can check the <a href="https://github.com/r-lib/httr2/releases" target="_blank" rel="noopener">GitHub release notes</a> or the <a href="https://httr2.r-lib.org/news/index.html" target="_blank" rel="noopener">NEWS file</a>.</p> <h2 id="installation">Installation <a href="#installation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Install httr2 from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"httr2"</span><span class='o'>)</span></span></code></pre> </div> <h2 
id="streaming-data">Streaming data <a href="#streaming-data"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The headline feature of this release is a better API for streaming responses, where the body is not available immediately but is streamed back over time. This is particularly important for interacting with LLMs, where it&rsquo;s needed to make chat responses feel snappy. You can try it out in <a href="https://ellmer.tidyverse.org" target="_blank" rel="noopener">ellmer</a>, our new package for chatting with LLMs from a variety of providers.</p> <p>The most important new function is <a href="https://httr2.r-lib.org/reference/req_perform_connection.html" target="_blank" rel="noopener"><code>req_perform_connection()</code></a>, which supersedes the older callback-based <a href="https://httr2.r-lib.org/reference/req_perform_stream.html" target="_blank" rel="noopener"><code>req_perform_stream()</code></a>. 
Unlike its predecessor, <a href="https://httr2.r-lib.org/reference/req_perform_connection.html" target="_blank" rel="noopener"><code>req_perform_connection()</code></a> returns a regular response object with a connection object for the body:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://httr2.r-lib.org'>httr2</a></span><span class='o'>)</span></span> <span></span> <span><span class='nv'>req</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/request.html'>request</a></span><span class='o'>(</span><span class='nf'><a href='https://httr2.r-lib.org/reference/example_url.html'>example_url</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_template.html'>req_template</a></span><span class='o'>(</span><span class='s'>"/stream-bytes/:n"</span>, n <span class='o'>=</span> <span class='m'>10240</span><span class='o'>)</span></span> <span><span class='nv'>resp</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_perform_connection.html'>req_perform_connection</a></span><span class='o'>(</span><span class='nv'>req</span><span class='o'>)</span></span> <span><span class='nv'>resp</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>&lt;httr2_response&gt;</span></span></span> <span></span><span><span class='c'>#&gt; <span style='font-weight: bold;'>GET</span> http://127.0.0.1:49283/stream-bytes/10240</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>Status</span>: 200 OK</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>Content-Type</span>: application/octet-stream</span></span> 
<span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>Body</span>: Streaming connection</span></span> <span></span></code></pre> </div> <p>Once you have a streaming connection you can repeatedly call a <code>resp_stream_*()</code> function to pull down data in chunks, using <a href="https://httr2.r-lib.org/reference/resp_stream_raw.html" target="_blank" rel="noopener"><code>resp_stream_is_complete()</code></a> to figure out when to stop.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'>while</span> <span class='o'>(</span><span class='o'>!</span><span class='nf'><a href='https://httr2.r-lib.org/reference/resp_stream_raw.html'>resp_stream_is_complete</a></span><span class='o'>(</span><span class='nv'>resp</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>bytes</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/resp_stream_raw.html'>resp_stream_raw</a></span><span class='o'>(</span><span class='nv'>resp</span>, kb <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/cat.html'>cat</a></span><span class='o'>(</span><span class='s'>"Downloaded "</span>, <span class='nf'><a href='https://rdrr.io/r/base/length.html'>length</a></span><span class='o'>(</span><span class='nv'>bytes</span><span class='o'>)</span>, <span class='s'>" bytes\n"</span>, sep <span class='o'>=</span> <span class='s'>""</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span> <span><span class='c'>#&gt; Downloaded 2048 bytes</span></span> <span><span class='c'>#&gt; Downloaded 2048 bytes</span></span> <span><span class='c'>#&gt; Downloaded 2048 bytes</span></span> <span><span class='c'>#&gt; Downloaded 2048 bytes</span></span> <span><span class='c'>#&gt; Downloaded 2048 bytes</span></span> <span><span 
class='c'>#&gt; Downloaded 0 bytes</span></span> <span></span></code></pre> </div> <p>As well as <a href="https://httr2.r-lib.org/reference/resp_stream_raw.html" target="_blank" rel="noopener"><code>resp_stream_raw()</code></a>, which returns a raw vector, you can use <a href="https://httr2.r-lib.org/reference/resp_stream_raw.html" target="_blank" rel="noopener"><code>resp_stream_lines()</code></a> to stream lines and <a href="https://httr2.r-lib.org/reference/resp_stream_raw.html" target="_blank" rel="noopener"><code>resp_stream_sse()</code></a> to stream <a href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events" target="_blank" rel="noopener">server-sent events</a>.</p> <h2 id="url-manipulation-tools">URL manipulation tools <a href="#url-manipulation-tools"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Working with URLs got easier with three new functions: <a href="https://httr2.r-lib.org/reference/url_modify.html" target="_blank" rel="noopener"><code>url_modify()</code></a>, <a href="https://httr2.r-lib.org/reference/url_modify.html" target="_blank" rel="noopener"><code>url_modify_query()</code></a>, and <a href="https://httr2.r-lib.org/reference/url_modify.html" target="_blank" rel="noopener"><code>url_modify_relative()</code></a>. 
You can see how they work in the examples below:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># url_modify() modifies components of a URL</span></span> <span><span class='nf'><a href='https://httr2.r-lib.org/reference/url_modify.html'>url_modify</a></span><span class='o'>(</span><span class='s'>"https://example.com"</span>, hostname <span class='o'>=</span> <span class='s'>"github.com"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "https://github.com/"</span></span> <span></span><span><span class='nf'><a href='https://httr2.r-lib.org/reference/url_modify.html'>url_modify</a></span><span class='o'>(</span><span class='s'>"https://example.com"</span>, scheme <span class='o'>=</span> <span class='s'>"http"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "http://example.com/"</span></span> <span></span><span><span class='nf'><a href='https://httr2.r-lib.org/reference/url_modify.html'>url_modify</a></span><span class='o'>(</span><span class='s'>"https://example.com"</span>, path <span class='o'>=</span> <span class='s'>"abc"</span>, query <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>foo <span class='o'>=</span> <span class='s'>"bar"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "https://example.com/abc?foo=bar"</span></span> <span></span><span></span> <span><span class='c'># url_modify_query() lets you modify individual query parameters</span></span> <span><span class='c'># modifying an existing parameter:</span></span> <span><span class='nf'><a href='https://httr2.r-lib.org/reference/url_modify.html'>url_modify_query</a></span><span class='o'>(</span><span class='s'>"http://example.com?a=1&amp;b=2"</span>, a <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 
"http://example.com/?b=2&amp;a=10"</span></span> <span></span><span><span class='c'># delete a parameter:</span></span> <span><span class='nf'><a href='https://httr2.r-lib.org/reference/url_modify.html'>url_modify_query</a></span><span class='o'>(</span><span class='s'>"http://example.com?a=1&amp;b=2"</span>, b <span class='o'>=</span> <span class='kc'>NULL</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "http://example.com/?a=1"</span></span> <span></span><span><span class='c'># add a new parameter:</span></span> <span><span class='nf'><a href='https://httr2.r-lib.org/reference/url_modify.html'>url_modify_query</a></span><span class='o'>(</span><span class='s'>"http://example.com?a=1&amp;b=2"</span>, c <span class='o'>=</span> <span class='m'>3</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "http://example.com/?a=1&amp;b=2&amp;c=3"</span></span> <span></span><span></span> <span><span class='c'># url_modify_relative() navigates to a relative URL</span></span> <span><span class='nf'><a href='https://httr2.r-lib.org/reference/url_modify.html'>url_modify_relative</a></span><span class='o'>(</span><span class='s'>"https://example.com/a/b/c.html"</span>, <span class='s'>"/d/e/f.html"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "https://example.com/d/e/f.html"</span></span> <span></span><span><span class='nf'><a href='https://httr2.r-lib.org/reference/url_modify.html'>url_modify_relative</a></span><span class='o'>(</span><span class='s'>"https://example.com/a/b/c.html"</span>, <span class='s'>"C.html"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "https://example.com/a/b/C.html"</span></span> <span></span><span><span class='nf'><a href='https://httr2.r-lib.org/reference/url_modify.html'>url_modify_relative</a></span><span class='o'>(</span><span class='s'>"https://example.com/a/b/c.html"</span>, <span class='s'>"../B.html"</span><span class='o'>)</span></span> <span><span 
class='c'>#&gt; [1] "https://example.com/a/B.html"</span></span> <span></span></code></pre> </div> <p>We also added <a href="https://httr2.r-lib.org/reference/req_url.html" target="_blank" rel="noopener"><code>req_url_relative()</code></a> to make it easier to navigate to a relative URL for an existing request.</p> <h2 id="other-improvements">Other improvements <a href="#other-improvements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>There are a handful of other improvements that are worth highlighting:</p> <ul> <li> <p>We&rsquo;ve made it easier to talk to AWS web services with <a href="https://httr2.r-lib.org/reference/req_auth_aws_v4.html" target="_blank" rel="noopener"><code>req_auth_aws_v4()</code></a> for signing requests and <a href="https://httr2.r-lib.org/reference/resp_stream_raw.html" target="_blank" rel="noopener"><code>resp_stream_aws()</code></a> for streaming responses. Special thanks go to the <a href="https://github.com/lifion/lifion-aws-event-stream/" target="_blank" rel="noopener">lifion-aws-event-stream</a> project for providing a clear reference implementation.</p> </li> <li> <p>We&rsquo;ve run down a long list of bugs that made <a href="https://httr2.r-lib.org/reference/req_cache.html" target="_blank" rel="noopener"><code>req_cache()</code></a> unreliable. This includes improving the handling of header-only changes, better cache pruning, and new debugging options. If you&rsquo;re working with a web API that supports caching, we highly recommend that you try it out. 
The next release of {<a href="https://github.com/r-lib/gh" target="_blank" rel="noopener">gh</a>} will use a cache by default, and my use of the dev version suggests that it gives a pretty nice performance improvement.</p> </li> <li> <p> <a href="https://httr2.r-lib.org/reference/is_online.html" target="_blank" rel="noopener"><code>is_online()</code></a> provides an easy way to check internet connectivity.</p> </li> <li> <p> <a href="https://httr2.r-lib.org/reference/req_perform_promise.html" target="_blank" rel="noopener"><code>req_perform_promise()</code></a> allows you to execute requests in the background (thanks to <a href="https://github.com/gergness" target="_blank" rel="noopener">@gergness</a>) using an efficient approach that waits on curl socket activity (thanks to <a href="https://github.com/shikokuchuo" target="_blank" rel="noopener">@shikokuchuo</a>).</p> </li> </ul> <h2 id="breaking-changes">Breaking changes <a href="#breaking-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As httr2 continues to mature, we&rsquo;ve made some lifecycle changes:</p> <ul> <li> <a href="https://httr2.r-lib.org/reference/req_perform_iterative.html" target="_blank" rel="noopener"><code>req_perform_iterative()</code></a> is now stable and no longer experimental.</li> <li> <a href="https://httr2.r-lib.org/reference/req_perform_stream.html" target="_blank" rel="noopener"><code>req_perform_stream()</code></a> is superseded by <a href="https://httr2.r-lib.org/reference/req_perform_connection.html" target="_blank" rel="noopener"><code>req_perform_connection()</code></a>, as mentioned above.</li> 
<li> <a href="https://httr2.r-lib.org/reference/with_mocked_responses.html" target="_blank" rel="noopener"><code>with_mock()</code></a> and <a href="https://httr2.r-lib.org/reference/with_mocked_responses.html" target="_blank" rel="noopener"><code>local_mock()</code></a> are defunct and will be removed in the next release. Use <a href="https://httr2.r-lib.org/reference/with_mocked_responses.html" target="_blank" rel="noopener"><code>with_mocked_responses()</code></a> and <a href="https://httr2.r-lib.org/reference/with_mocked_responses.html" target="_blank" rel="noopener"><code>local_mocked_responses()</code></a> instead.</li> </ul> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all 76 folks who filed issues, created PRs and generally helped to make httr2 better! 
<a href="https://github.com/Aariq" target="_blank" rel="noopener">@Aariq</a>, <a href="https://github.com/AGeographer" target="_blank" rel="noopener">@AGeographer</a>, <a href="https://github.com/amael-ls" target="_blank" rel="noopener">@amael-ls</a>, <a href="https://github.com/anishjoni" target="_blank" rel="noopener">@anishjoni</a>, <a href="https://github.com/asadow" target="_blank" rel="noopener">@asadow</a>, <a href="https://github.com/atheriel" target="_blank" rel="noopener">@atheriel</a>, <a href="https://github.com/awpsoras" target="_blank" rel="noopener">@awpsoras</a>, <a href="https://github.com/billsanto" target="_blank" rel="noopener">@billsanto</a>, <a href="https://github.com/bonushenricus" target="_blank" rel="noopener">@bonushenricus</a>, <a href="https://github.com/botan" target="_blank" rel="noopener">@botan</a>, <a href="https://github.com/burgerga" target="_blank" rel="noopener">@burgerga</a>, <a href="https://github.com/CareCT" target="_blank" rel="noopener">@CareCT</a>, <a href="https://github.com/cderv" target="_blank" rel="noopener">@cderv</a>, <a href="https://github.com/cole-brokamp" target="_blank" rel="noopener">@cole-brokamp</a>, <a href="https://github.com/covid19ec" target="_blank" rel="noopener">@covid19ec</a>, <a href="https://github.com/datapumpernickel" target="_blank" rel="noopener">@datapumpernickel</a>, <a href="https://github.com/denskh" target="_blank" rel="noopener">@denskh</a>, <a href="https://github.com/deschen1" target="_blank" rel="noopener">@deschen1</a>, <a href="https://github.com/DyfanJones" target="_blank" rel="noopener">@DyfanJones</a>, <a href="https://github.com/erydit" target="_blank" rel="noopener">@erydit</a>, <a href="https://github.com/exetico" target="_blank" rel="noopener">@exetico</a>, <a href="https://github.com/fh-mthomson" target="_blank" rel="noopener">@fh-mthomson</a>, <a href="https://github.com/frzambra" target="_blank" rel="noopener">@frzambra</a>, <a href="https://github.com/gergness" 
target="_blank" rel="noopener">@gergness</a>, <a href="https://github.com/GreenGrassBlueOcean" target="_blank" rel="noopener">@GreenGrassBlueOcean</a>, <a href="https://github.com/guslipkin" target="_blank" rel="noopener">@guslipkin</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/i2z1" target="_blank" rel="noopener">@i2z1</a>, <a href="https://github.com/isachng93" target="_blank" rel="noopener">@isachng93</a>, <a href="https://github.com/IshuaWang" target="_blank" rel="noopener">@IshuaWang</a>, <a href="https://github.com/JamesHWade" target="_blank" rel="noopener">@JamesHWade</a>, <a href="https://github.com/jameslairdsmith" target="_blank" rel="noopener">@jameslairdsmith</a>, <a href="https://github.com/JBGruber" target="_blank" rel="noopener">@JBGruber</a>, <a href="https://github.com/jcheng5" target="_blank" rel="noopener">@jcheng5</a>, <a href="https://github.com/jeroen" target="_blank" rel="noopener">@jeroen</a>, <a href="https://github.com/jimbrig" target="_blank" rel="noopener">@jimbrig</a>, <a href="https://github.com/jjesusfilho" target="_blank" rel="noopener">@jjesusfilho</a>, <a href="https://github.com/jl5000" target="_blank" rel="noopener">@jl5000</a>, <a href="https://github.com/jmuhlenkamp" target="_blank" rel="noopener">@jmuhlenkamp</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, <a href="https://github.com/JosiahParry" target="_blank" rel="noopener">@JosiahParry</a>, <a href="https://github.com/jwimberl" target="_blank" rel="noopener">@jwimberl</a>, <a href="https://github.com/krjaworski" target="_blank" rel="noopener">@krjaworski</a>, <a href="https://github.com/m-muecke" target="_blank" rel="noopener">@m-muecke</a>, <a href="https://github.com/maarten-vermeyen" target="_blank" rel="noopener">@maarten-vermeyen</a>, <a href="https://github.com/MarekGierlinski" target="_blank" rel="noopener">@MarekGierlinski</a>, <a 
href="https://github.com/maxsutton" target="_blank" rel="noopener">@maxsutton</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/mkoohafkan" target="_blank" rel="noopener">@mkoohafkan</a>, <a href="https://github.com/MSHelm" target="_blank" rel="noopener">@MSHelm</a>, <a href="https://github.com/mstei4176" target="_blank" rel="noopener">@mstei4176</a>, <a href="https://github.com/mthomas-ketchbrook" target="_blank" rel="noopener">@mthomas-ketchbrook</a>, <a href="https://github.com/NateNohling" target="_blank" rel="noopener">@NateNohling</a>, <a href="https://github.com/nick-youngblut" target="_blank" rel="noopener">@nick-youngblut</a>, <a href="https://github.com/pbulsink" target="_blank" rel="noopener">@pbulsink</a>, <a href="https://github.com/PietrH" target="_blank" rel="noopener">@PietrH</a>, <a href="https://github.com/pkautio" target="_blank" rel="noopener">@pkautio</a>, <a href="https://github.com/plietar" target="_blank" rel="noopener">@plietar</a>, <a href="https://github.com/pmlefeuvre-met" target="_blank" rel="noopener">@pmlefeuvre-met</a>, <a href="https://github.com/rkrug" target="_blank" rel="noopener">@rkrug</a>, <a href="https://github.com/romainfrancois" target="_blank" rel="noopener">@romainfrancois</a>, <a href="https://github.com/salim-b" target="_blank" rel="noopener">@salim-b</a>, <a href="https://github.com/shikokuchuo" target="_blank" rel="noopener">@shikokuchuo</a>, <a href="https://github.com/simplyalexander" target="_blank" rel="noopener">@simplyalexander</a>, <a href="https://github.com/sluga" target="_blank" rel="noopener">@sluga</a>, <a href="https://github.com/stefanedwards" target="_blank" rel="noopener">@stefanedwards</a>, <a href="https://github.com/steveputman" target="_blank" rel="noopener">@steveputman</a>, <a href="https://github.com/tebancr" target="_blank" 
rel="noopener">@tebancr</a>, <a href="https://github.com/thohan88" target="_blank" rel="noopener">@thohan88</a>, <a href="https://github.com/tony2015116" target="_blank" rel="noopener">@tony2015116</a>, <a href="https://github.com/toobiwankenobi" target="_blank" rel="noopener">@toobiwankenobi</a>, <a href="https://github.com/verhovsky" target="_blank" rel="noopener">@verhovsky</a>, <a href="https://github.com/walinchus" target="_blank" rel="noopener">@walinchus</a>, <a href="https://github.com/werkstattcodes" target="_blank" rel="noopener">@werkstattcodes</a>, and <a href="https://github.com/zacdav-db" target="_blank" rel="noopener">@zacdav-db</a>.</p> Updates to Text Rendering in R Graphics https://www.tidyverse.org/blog/2025/01/text-rendering-updates/ Fri, 17 Jan 2025 00:00:00 +0000 https://www.tidyverse.org/blog/2025/01/text-rendering-updates/ 
<blockquote class="bluesky-embed" data-bluesky-uri="at://did:plc:6cf4jj62ofgoswlxcqdvtcg5/app.bsky.feed.post/3lfevs4qe5k2l" data-bluesky-cid="bafyreigvpe72rfvuw47nr6qrec2je2lntsowdnzer5vz2imjbejree6ebm"><p lang="en">text rendering is one of those disciplines where, if you think you finally got it right, you can be 100% certain that you didn&#x27;t</p>&mdash; Thomas Lin Pedersen (<a href="https://bsky.app/profile/did:plc:6cf4jj62ofgoswlxcqdvtcg5?ref_src=embed">@thomasp85.com</a>) <a href="https://bsky.app/profile/did:plc:6cf4jj62ofgoswlxcqdvtcg5/post/3lfevs4qe5k2l?ref_src=embed">January 10, 2025 at 10:44 AM</a></blockquote><script async src="https://embed.bsky.app/static/embed.js" charset="utf-8"></script> <p>No reason to hide the fact: Text rendering is complicated! When I set out to improve the support for modern text rendering features in R all those years ago, I don&rsquo;t think I truly appreciated that fact. And probably for the better, since I&rsquo;m not sure I would have taken on the task had I known.</p> <p>Taking the quote above as a universal truth (it comes from a reputable source after all), I&rsquo;m sure I&rsquo;ll never be fully done, but recent work on the whole stack at least makes me worry less about the correctness. 
This post will go through the changes that span the <a href="https://github.com/r-lib/systemfonts" target="_blank" rel="noopener">systemfonts</a>, <a href="https://github.com/r-lib/textshaping" target="_blank" rel="noopener">textshaping</a>, and <a href="https://marquee.r-lib.org" target="_blank" rel="noopener">marquee</a> packages and let you know how you, as a user or developer, should take advantage of them.</p> <h2 id="working-with-non-installed-fonts">Working with non-installed fonts <a href="#working-with-non-installed-fonts"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The genesis of the systemfonts package was a need to be able to tap into the operating system&rsquo;s font library, so that whatever was installed on the system would be available to graphics devices (assuming those devices used systemfonts). The scope of the package has gradually increased, and one of the needs that has become obvious over time is a way to work with fonts that aren&rsquo;t installed on the system (e.g. if you want to bundle a font with a package, or if you are deploying a Shiny app that uses a specific font for the graphics).</p> <p>Until now, the <code>register_font()</code> and <code>register_variant()</code> functions have been the only option for letting systemfonts know about fonts other than those installed on the system. However, both of these functions were designed to circumvent limitations in the R graphics system when it comes to font selection (e.g. 
no way to use a &ldquo;thin&rdquo; font variant, as the only weight option in the graphics system is bold yes/no), and as such were clunky to use for introducing new fonts.</p> <p>With the new version of systemfonts we get a dedicated way to tell systemfonts &ldquo;please consider these font files as equal to the installed ones&rdquo;. The function is called <code>add_fonts()</code> and all you need to do is to pass in a vector of paths to font files and these will then be reachable by systemfonts.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># Add fonts from specific files</span> <span class="n">systemfonts</span><span class="o">::</span><span class="nf">add_fonts</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;path/to/font1.ttf&#34;</span><span class="p">,</span> <span class="s">&#34;path/to/font2.ttf&#34;</span><span class="p">))</span> </code></pre></div><p>In addition to this function, systemfonts also comes with <code>scan_local_fonts()</code> that looks in <code>./fonts</code> and <code>~/fonts</code> and adds any fonts located there. The function is called when systemfonts is loaded, meaning that you can immediately use fonts saved in these directories. This is great for deploying projects because all you need to do is to include a <code>fonts</code> folder at the root of your project and these fonts will then always be available wherever you deploy your code.</p> <p>While it is nice to have good access to the font files on your computer, the files have to come from somewhere. Nowadays that <em>somewhere</em> is usually <a href="https://fonts.google.com" target="_blank" rel="noopener">Google Fonts</a> or some other online font repository. 
systemfonts is now aware of a few of these repositories (Google Fonts and <a href="https://www.fontsquirrel.com" target="_blank" rel="noopener">Font Squirrel</a> for now), and can search and download from these (using <code>search_web_fonts()</code>, <code>get_from_google_fonts()</code>, and <code>get_from_font_squirrel()</code>). The downloaded fonts are automatically added using <code>add_fonts()</code> so they are immediately available, and by default they are placed in <code>~/fonts</code> so that they persist across R sessions and projects.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># Search and download fonts</span> <span class="n">systemfonts</span><span class="o">::</span><span class="nf">get_from_font_squirrel</span><span class="p">(</span><span class="s">&#34;Quicksand&#34;</span><span class="p">)</span> <span class="n">systemfonts</span><span class="o">::</span><span class="nf">get_from_google_fonts</span><span class="p">(</span><span class="s">&#34;Rubik Moonrocks&#34;</span><span class="p">)</span> </code></pre></div><p>But what if you don&rsquo;t want to think too much about all these details and just want to ensure that some specific font is available when a piece of code is running? In that case <code>require_font()</code> has you covered. This function allows you to state a dependency on a font in a script. The function scans the available fonts on the system and, if it doesn&rsquo;t find anything, proceeds to look for the font in the online repositories, downloading it if it finds it. 
If that also fails, the function will either throw an error or map the required font to a fallback of your choosing:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">systemfonts</span><span class="p">)</span> <span class="nf">require_font</span><span class="p">(</span><span class="s">&#34;Rubik Moonrocks&#34;</span><span class="p">)</span> <span class="nf">plot.new</span><span class="p">()</span> <span class="nf">text</span><span class="p">(</span><span class="m">0.5</span><span class="p">,</span> <span class="m">0.5</span><span class="p">,</span> <span class="s">&#34;Fancy Font&#34;</span><span class="p">,</span> <span class="n">family</span> <span class="o">=</span> <span class="s">&#34;Rubik Moonrocks&#34;</span><span class="p">,</span> <span class="n">cex</span> <span class="o">=</span> <span class="m">6</span><span class="p">)</span> </code></pre></div><p><img src="figure/unnamed-chunk-3-1.png" alt="plot of chunk unnamed-chunk-3"></p> <p>Remember that all of these niceties only go into effect if you use a graphics device that uses systemfonts. For now, that more or less means that you should use ragg (you should use ragg anyway so that is not a terrible requirement).</p> <h2 id="getting-to-the-glyphs">Getting to the Glyphs <a href="#getting-to-the-glyphs"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Most fonts these days are based on a vector outline. That means that they can be scaled smoothly to any size and don&rsquo;t take up a lot of storage space. 
It also means that there are polygons inside the font files and that these can be extracted! This is now possible with systemfonts and the new <code>glyph_outline()</code> and <code>glyph_raster()</code> functions.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># Get the location of the glyph inside the font</span> <span class="n">moonrocks</span> <span class="o">&lt;-</span> <span class="nf">font_info</span><span class="p">(</span><span class="s">&#34;Rubik Moonrocks&#34;</span><span class="p">)</span> <span class="n">G</span> <span class="o">&lt;-</span> <span class="nf">glyph_info</span><span class="p">(</span><span class="s">&#34;G&#34;</span><span class="p">,</span> <span class="n">path</span> <span class="o">=</span> <span class="n">moonrocks</span><span class="o">$</span><span class="n">path</span><span class="p">,</span> <span class="n">index</span> <span class="o">=</span> <span class="n">moonrocks</span><span class="o">$</span><span class="n">index</span><span class="p">)</span> <span class="c1"># Extract the outline of the glyph and plot it</span> <span class="n">outline</span> <span class="o">&lt;-</span> <span class="nf">glyph_outline</span><span class="p">(</span><span class="n">G</span><span class="o">$</span><span class="n">index</span><span class="p">,</span> <span class="n">moonrocks</span><span class="o">$</span><span class="n">path</span><span class="p">,</span> <span class="n">moonrocks</span><span class="o">$</span><span class="n">index</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="m">400</span><span class="p">)</span> <span class="n">grid</span><span class="o">::</span><span class="nf">grid.path</span><span class="p">(</span> <span class="n">x</span> <span class="o">=</span> <span class="n">outline</span><span class="o">$</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span 
class="n">outline</span><span class="o">$</span><span class="n">y</span> <span class="o">+</span> <span class="m">20</span><span class="p">,</span> <span class="c1"># To raise the baseline a bit</span> <span class="n">id</span> <span class="o">=</span> <span class="n">outline</span><span class="o">$</span><span class="n">contour</span><span class="p">,</span> <span class="n">default.units</span> <span class="o">=</span> <span class="s">&#34;bigpts&#34;</span><span class="p">,</span> <span class="n">gp</span> <span class="o">=</span> <span class="n">grid</span><span class="o">::</span><span class="nf">gpar</span><span class="p">(</span><span class="n">fill</span> <span class="o">=</span> <span class="s">&#34;grey&#34;</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="s">&#34;black&#34;</span><span class="p">,</span> <span class="n">lwd</span> <span class="o">=</span> <span class="m">4</span><span class="p">)</span> <span class="p">)</span> </code></pre></div><p><img src="figure/unnamed-chunk-4-1.png" alt="plot of chunk unnamed-chunk-4"></p> <p>Extracting them as polygons means that we can do all sorts of weird stuff with them if we so please:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># Skew the glyph making it italic</span> <span class="n">grid</span><span class="o">::</span><span class="nf">grid.path</span><span class="p">(</span> <span class="n">x</span> <span class="o">=</span> <span class="n">outline</span><span class="o">$</span><span class="n">x</span> <span class="o">+</span> <span class="n">outline</span><span class="o">$</span><span class="n">y</span> <span class="o">*</span> <span class="m">0.4</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">outline</span><span class="o">$</span><span class="n">y</span> <span class="o">+</span> <span class="m">20</span><span class="p">,</span> <span class="c1"># To 
raise the baseline a bit</span> <span class="n">id</span> <span class="o">=</span> <span class="n">outline</span><span class="o">$</span><span class="n">contour</span><span class="p">,</span> <span class="n">default.units</span> <span class="o">=</span> <span class="s">&#34;bigpts&#34;</span><span class="p">,</span> <span class="n">gp</span> <span class="o">=</span> <span class="n">grid</span><span class="o">::</span><span class="nf">gpar</span><span class="p">(</span><span class="n">fill</span> <span class="o">=</span> <span class="s">&#34;grey&#34;</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="s">&#34;black&#34;</span><span class="p">,</span> <span class="n">lwd</span> <span class="o">=</span> <span class="m">4</span><span class="p">)</span> <span class="p">)</span> </code></pre></div><p><img src="figure/unnamed-chunk-5-1.png" alt="plot of chunk unnamed-chunk-5"></p> <p>(Real italic glyphs are designed to look good slanted; they are not just skewed versions of the regular glyphs.)</p> <p>Remember how I said &ldquo;most fonts&rdquo; at the beginning of this section? There are still fonts that do not provide an outline, the prime example being most emoji fonts. The glyphs in such fonts are encoded as multiple bitmaps at fixed sizes (Microsoft&rsquo;s emoji font goes a different way by encoding them as SVGs). 
Since we can&rsquo;t get to the data as outlines we can instead extract it as a raster:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">emoji</span> <span class="o">&lt;-</span> <span class="nf">font_info</span><span class="p">(</span><span class="s">&#34;emoji&#34;</span><span class="p">)</span> <span class="n">dancer</span> <span class="o">&lt;-</span> <span class="nf">glyph_info</span><span class="p">(</span><span class="s">&#34;💃&#34;</span><span class="p">,</span> <span class="n">path</span> <span class="o">=</span> <span class="n">emoji</span><span class="o">$</span><span class="n">path</span><span class="p">,</span> <span class="n">index</span> <span class="o">=</span> <span class="n">emoji</span><span class="o">$</span><span class="n">index</span><span class="p">)</span> <span class="n">raster</span> <span class="o">&lt;-</span> <span class="nf">glyph_raster</span><span class="p">(</span><span class="n">dancer</span><span class="o">$</span><span class="n">index</span><span class="p">,</span> <span class="n">emoji</span><span class="o">$</span><span class="n">path</span><span class="p">,</span> <span class="n">emoji</span><span class="o">$</span><span class="n">index</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="m">400</span><span class="p">)</span> <span class="n">grid</span><span class="o">::</span><span class="nf">grid.draw</span><span class="p">(</span><span class="nf">glyph_raster_grob</span><span class="p">(</span><span class="n">raster[[1]]</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">50</span><span class="p">))</span> </code></pre></div><p><img src="figure/unnamed-chunk-6-1.png" alt="plot of chunk unnamed-chunk-6"></p> <p>In the above we used the <code>glyph_raster_grob()</code> helper function to create a raster grob with the correct scaling of the resulting raster.</p> <p>Raster extraction is 
not only for bitmap-encoded fonts, since it is easy to go from an outline to a raster (but not the other way around). FreeType (which systemfonts uses) includes a very efficient scanline rasterizer (the same as used in ragg) and we can thus get a raster version of any font:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">raster2</span> <span class="o">&lt;-</span> <span class="nf">glyph_raster</span><span class="p">(</span><span class="n">G</span><span class="o">$</span><span class="n">index</span><span class="p">,</span> <span class="n">moonrocks</span><span class="o">$</span><span class="n">path</span><span class="p">,</span> <span class="n">moonrocks</span><span class="o">$</span><span class="n">index</span><span class="p">,</span> <span class="n">size</span> <span class="o">=</span> <span class="m">400</span><span class="p">)</span> <span class="n">grid</span><span class="o">::</span><span class="nf">grid.draw</span><span class="p">(</span><span class="nf">glyph_raster_grob</span><span class="p">(</span><span class="n">raster2[[1]]</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">20</span><span class="p">))</span> </code></pre></div><p><img src="figure/unnamed-chunk-7-1.png" alt="plot of chunk unnamed-chunk-7"></p> <h2 id="the-way-the-text-flows">The Way the Text Flows <a href="#the-way-the-text-flows"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The thing that provoked me to write the quote at the beginning of this blog post was my work on the textshaping package. 
This package is largely invisible to the user but together with systemfonts it is responsible for laying out strings of text correctly. It figures out the location of every glyph and finds alternative fonts if the selected one doesn&rsquo;t contain the needed glyph. textshaping powers ragg as well as marquee, doing the heavy lifting of translating a string of text into glyphs and locations.</p> <p>Part of converting a string into glyphs and coordinates (a process known as text shaping) is to figure out which way the text flows and act accordingly. For many people left-to-right flow is the natural text direction, but this is merely a cultural bias and many scripts with a different flow exist (Arabic and Hebrew being the two most dominant right-to-left flowing scripts). So, part of shaping requires figuring out what script a specific character belongs to and what direction it flows. This is all fairly simple when a string internally agrees on the direction of flow, but can get much more complicated when scripts are embedded within other scripts that don&rsquo;t have the same flow (not to mention scripts embedded even deeper). Combine all of this with soft wrapping of text inside an embedded script and you have the recipe for a headache. textshaping (through me) already made the claim that it fully supported bi-directional text, but it turned out that I severely misjudged the complexity. Because of this, the shaping engine has been rewritten almost from scratch. Based on the starting quote I can&rsquo;t quite claim that it now works 100% correctly, but it does pass all 91,707 test cases for bidirectional text provided by the Unicode Consortium, so there&rsquo;s that.</p> <p>Again, it is unlikely that you will come into contact with textshaping directly, so you will mostly experience these improvements in the way text just appears more correct (to the extent that this was ever an issue for you). 
The place you are most likely to stumble upon these changes is marquee, which uses textshaping under the hood. Styling in marquee has been expanded to include a <code>text_direction</code> setting. It defaults to <code>&quot;auto&quot;</code>, which means &ldquo;deduce it from the text you get&rdquo;, but you can also set it to <code>&quot;ltr&quot;</code> or <code>&quot;rtl&quot;</code> to set the direction explicitly. Be aware that this setting doesn&rsquo;t change how single glyphs flow, so you cannot use it to e.g. write Arabic in left-to-right flow. Instead it governs the paragraph-level direction and thus how bi-directional text should be assembled. It also governs to which side indentation happens and the placement of bullets in bullet lists. Often, leaving it on the default value will work fine. There are also two new values for the <code>align</code> setting. <code>&quot;auto&quot;</code> picks either <code>&quot;left&quot;</code> or <code>&quot;right&quot;</code> depending on the text direction, while <code>&quot;justified&quot;</code> picks either <code>&quot;justified-left&quot;</code> or <code>&quot;justified-right&quot;</code>. This makes it much easier to work natively with right-to-left text as everything just looks as it should. To top it off, <code>classic_style()</code> gains an <code>ltr</code> argument that controls whether the styling in general should cater to left-to-right or right-to-left text. 
It controls things such as the position of the grey bar in quotation blocks and the indentation of nested lists.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">marquee</span><span class="p">)</span> <span class="c1"># Create a style specific for rtl text</span> <span class="n">rtl_style</span> <span class="o">&lt;-</span> <span class="nf">classic_style</span><span class="p">(</span> <span class="n">text_direction</span> <span class="o">=</span> <span class="s">&#34;rtl&#34;</span><span class="p">,</span> <span class="c1"># Forces bidi text to be assembled from right to left</span> <span class="n">align</span> <span class="o">=</span> <span class="s">&#34;auto&#34;</span><span class="p">,</span> <span class="c1"># Will convert itself to &#34;right&#34;</span> <span class="n">ltr</span> <span class="o">=</span> <span class="kc">FALSE</span> <span class="c1"># Will move bullet padding and bar along quote blocks to the right</span> <span class="p">)</span> </code></pre></div> <h2 id="a-marquee-for-everyone">A marquee for Everyone <a href="#a-marquee-for-everyone"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Speaking of marquee, the biggest obstacle it has put in front of its users is that it is built on very new features in R. The ability to write text by placing glyphs one at a time was only added in R 4.2 and not every graphics device supports it yet (worse still, the implementation in the default macOS quartz device caused the session to crash). 
Again, ragg is your friend, but the Cairo devices also have excellent support.</p> <p>Text rendering, however, should always work. It is quite frustrating for text to not show up when you expect it to. Because of this it has been a clear plan to expand the support for marquee somehow. With the new version of marquee this is finally a reality. How does it work? Well, remember when we talked about extracting glyph outlines and rasters? If marquee encounters a graphics device that doesn&rsquo;t provide the necessary features it will take matters into its own hands, by extracting all the necessary polygons and bitmaps and plotting them. It is certainly not faster than relying on the optimized routines of the graphics device and it can also lead to visual degradation at smaller font sizes. But it works - everywhere.</p> <p>To show it off, here is an SVG created with svglite, which doesn&rsquo;t have the required new features:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">text</span> <span class="o">&lt;-</span> <span class="s">&#34;_Fancy_ {.red Font}📝&#34;</span> <span class="n">m_grob</span> <span class="o">&lt;-</span> <span class="nf">marquee_grob</span><span class="p">(</span> <span class="n">text</span><span class="p">,</span> <span class="nf">classic_style</span><span class="p">(</span> <span class="n">body_font</span> <span class="o">=</span> <span class="s">&#34;Rubik Moonrocks&#34;</span><span class="p">,</span> <span class="n">base_size</span> <span class="o">=</span> <span class="m">72</span> <span class="p">)</span> <span class="p">)</span> <span class="n">s</span> <span class="o">&lt;-</span> <span class="n">svglite</span><span class="o">::</span><span class="nf">svgstring</span><span class="p">(</span><span class="n">width</span> <span class="o">=</span> <span class="m">7</span><span class="p">,</span> <span class="n">height</span> <span class="o">=</span> <span class="m">1.5</span><span class="p">)</span> 
<span class="n">grid</span><span class="o">::</span><span class="nf">grid.draw</span><span class="p">(</span><span class="n">m_grob</span><span class="p">)</span> <span class="nf">invisible</span><span class="p">(</span><span class="nf">dev.off</span><span class="p">())</span> <span class="nf">s</span><span class="p">()</span> </code></pre></div><?xml version='1.0' encoding='UTF-8' ?> <svg xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' width='504.00pt' height='108.00pt' viewBox='0 0 504.00 108.00'> <g class='svglite'> <defs> <style type='text/css'><![CDATA[ .svglite line, .svglite polyline, .svglite polygon, .svglite path, .svglite rect, .svglite circle { fill: none; stroke: #000000; stroke-linecap: round; stroke-linejoin: round; stroke-miterlimit: 10.00; } .svglite text { white-space: pre; } ]]></style> </defs> <rect width='100%' height='100%' style='stroke: none; fill: #FFFFFF;'/> <defs> <clipPath id='cpMC4wMHw1MDQuMDB8MC4wMHwxMDguMDA='> <rect x='0.00' y='0.00' width='504.00' height='108.00' /> </clipPath> </defs> <g clip-path='url(#cpMC4wMHw1MDQuMDB8MC4wMHwxMDguMDA=)'> <path d='M 5.16 71.17 L 4.53 70.74 L 4.10 70.11 L 3.95 69.38 L 3.95 22.86 L 4.10 22.12 L 4.53 21.49 L 5.16 21.05 L 5.91 20.91 L 42.19 20.91 L 42.93 21.05 L 43.56 21.49 L 44.00 22.12 L 44.14 22.86 L 44.14 33.80 L 44.00 34.54 L 43.56 35.17 L 42.93 35.61 L 42.19 35.75 L 21.59 35.75 L 21.59 41.08 L 40.75 41.08 L 41.49 41.22 L 42.12 41.66 L 42.38 41.95 L 42.56 42.28 L 42.67 42.63 L 42.70 43.02 L 42.70 53.89 L 42.56 54.63 L 42.12 55.27 L 41.49 55.69 L 40.75 55.83 L 21.59 55.83 L 21.59 69.38 L 21.45 70.11 L 21.02 70.74 L 20.72 70.99 L 20.39 71.17 L 20.04 71.28 L 19.66 71.31 L 5.91 71.31 ZM 24.59 24.25 L 24.07 24.25 L 23.62 24.29 L 23.25 24.38 L 22.93 24.54 L 22.62 24.82 L 22.32 25.22 L 22.03 25.74 L 21.80 26.28 L 21.67 26.77 L 21.65 27.19 L 21.73 27.55 L 21.92 27.88 L 22.19 28.24 L 22.54 28.63 L 22.97 29.05 L 23.45 29.48 L 23.88 29.84 L 24.27 30.13 L 24.61 30.34 L 24.97 
30.46 L 25.41 30.45 L 25.92 30.32 L 26.50 30.06 L 27.08 29.70 L 27.51 29.34 L 27.79 28.99 L 27.94 28.63 L 27.98 28.22 L 27.97 27.72 L 27.91 27.13 L 27.80 26.45 L 27.67 25.90 L 27.49 25.44 L 27.27 25.09 L 27.00 24.84 L 26.67 24.66 L 26.26 24.51 L 25.77 24.39 L 25.20 24.30 ZM 8.28 25.87 L 8.28 25.53 L 8.16 25.27 L 7.78 25.09 L 7.34 25.01 L 6.95 25.07 L 6.62 25.26 L 6.34 25.59 L 6.04 26.05 L 5.89 26.42 L 5.93 26.79 L 6.19 27.25 L 6.57 27.60 L 6.98 27.81 L 7.43 27.85 L 7.92 27.75 L 8.28 27.50 L 8.34 27.17 L 8.30 26.78 L 8.28 26.31 ZM 39.02 26.01 L 38.88 26.17 L 38.66 26.67 L 38.38 27.25 L 38.34 27.49 L 38.52 27.75 L 38.79 27.95 L 39.02 27.94 L 39.29 27.82 L 39.67 27.69 L 40.09 27.24 L 40.17 26.75 L 40.07 26.43 L 39.89 26.19 L 39.64 26.03 L 39.31 25.95 ZM 33.61 26.60 L 33.49 26.38 L 33.34 26.24 L 33.16 26.17 L 32.68 26.10 L 32.05 26.03 L 31.54 26.01 L 31.17 26.09 L 30.89 26.34 L 30.59 26.81 L 30.48 27.27 L 30.53 27.61 L 30.75 27.93 L 31.11 28.33 L 31.62 28.72 L 32.00 29.02 L 32.18 29.10 L 32.41 29.11 L 32.67 29.05 L 32.97 28.91 L 33.50 28.48 L 33.77 28.08 L 33.82 27.59 L 33.70 26.89 ZM 12.07 26.99 L 11.77 26.81 L 11.49 26.80 L 11.16 27.03 L 10.80 27.38 L 10.58 27.69 L 10.55 28.04 L 10.73 28.55 L 10.97 28.92 L 11.27 29.05 L 11.64 29.03 L 12.09 28.97 L 12.52 28.90 L 12.81 28.80 L 13.01 28.59 L 13.17 28.19 L 13.18 27.83 L 13.03 27.61 L 12.78 27.45 L 12.45 27.25 ZM 38.45 29.99 L 37.94 30.20 L 37.66 30.92 L 38.52 31.06 L 38.81 30.42 ZM 18.53 30.09 L 18.08 30.20 L 17.72 30.51 L 17.36 31.00 L 17.01 31.64 L 16.84 32.11 L 16.92 32.55 L 17.28 33.08 L 17.71 33.51 L 18.14 33.63 L 18.65 33.54 L 19.30 33.38 L 19.86 33.09 L 20.27 32.83 L 20.40 32.68 L 20.49 32.46 L 20.53 32.19 L 20.52 31.86 L 20.40 31.18 L 20.19 30.75 L 19.80 30.45 L 19.16 30.20 ZM 6.77 34.30 L 6.48 34.88 L 7.48 34.59 L 7.20 34.09 ZM 13.53 34.30 L 13.03 34.88 L 13.39 35.45 L 13.97 35.45 L 14.41 34.67 ZM 12.43 40.32 L 12.38 39.86 L 12.29 39.48 L 12.16 39.17 L 11.97 38.92 L 11.68 38.69 L 11.29 38.50 L 10.80 38.34 L 
10.25 38.19 L 9.79 38.12 L 9.39 38.12 L 9.06 38.19 L 8.78 38.35 L 8.48 38.61 L 8.17 38.97 L 7.84 39.42 L 7.53 39.87 L 7.31 40.28 L 7.19 40.64 L 7.16 40.97 L 7.22 41.30 L 7.35 41.66 L 7.56 42.07 L 7.84 42.52 L 8.19 42.89 L 8.53 43.16 L 8.84 43.32 L 9.14 43.38 L 9.83 43.29 L 10.73 43.02 L 11.57 42.71 L 12.09 42.34 L 12.25 42.10 L 12.36 41.77 L 12.43 41.36 L 12.45 40.86 ZM 16.98 42.52 L 16.70 43.24 L 17.50 43.45 L 18.14 42.88 L 17.56 42.30 ZM 24.37 43.95 L 24.01 43.85 L 23.64 43.88 L 23.25 44.03 L 22.68 44.35 L 22.25 44.58 L 22.10 44.70 L 22.00 44.89 L 21.95 45.14 L 21.95 45.47 L 22.05 45.79 L 22.18 46.02 L 22.34 46.16 L 22.53 46.22 L 23.04 46.26 L 23.69 46.33 L 24.26 46.38 L 24.66 46.38 L 24.97 46.20 L 25.27 45.75 L 25.44 45.25 L 25.38 44.89 L 25.12 44.56 L 24.70 44.17 ZM 23.99 49.12 L 24.05 48.64 L 24.04 48.44 L 23.94 48.26 L 23.75 48.11 L 23.47 47.99 L 22.82 47.85 L 22.31 47.88 L 21.89 48.12 L 21.45 48.63 L 21.29 48.90 L 21.21 49.14 L 21.22 49.35 L 21.31 49.53 L 21.63 49.92 L 22.03 50.44 L 22.36 50.78 L 22.64 50.97 L 22.96 51.02 L 23.41 50.94 L 23.79 50.76 L 23.94 50.52 L 23.96 50.17 L 23.97 49.72 ZM 16.25 48.91 L 15.77 48.86 L 15.29 49.05 L 14.69 49.50 L 14.07 50.03 L 13.64 50.47 L 13.52 50.70 L 13.47 50.98 L 13.50 51.33 L 13.61 51.74 L 13.80 52.15 L 14.01 52.47 L 14.25 52.68 L 14.50 52.78 L 15.14 52.84 L 15.98 52.81 L 16.65 52.68 L 17.06 52.42 L 17.34 51.98 L 17.56 51.30 L 17.70 50.59 L 17.67 50.08 L 17.41 49.64 L 16.84 49.20 ZM 8.42 50.50 L 7.78 51.02 L 7.92 51.80 L 8.72 51.80 L 8.78 51.16 ZM 28.86 50.53 L 28.61 50.66 L 28.41 50.90 L 28.16 51.22 L 27.99 51.48 L 27.92 51.75 L 27.95 52.02 L 28.08 52.30 L 28.34 52.70 L 28.52 53.03 L 28.75 53.21 L 29.16 53.17 L 29.61 53.00 L 29.80 52.75 L 29.86 52.38 L 29.88 51.88 L 29.86 51.38 L 29.80 51.05 L 29.62 50.81 L 29.23 50.58 ZM 35.30 51.49 L 35.06 51.30 L 34.79 51.23 L 34.42 51.30 L 33.91 51.50 L 33.52 51.66 L 33.37 51.76 L 33.27 51.93 L 33.21 52.16 L 33.19 52.45 L 33.31 53.06 L 33.52 53.42 L 33.88 53.65 L 34.48 53.81 L 
35.07 53.96 L 35.53 53.97 L 35.91 53.77 L 36.28 53.31 L 36.42 53.08 L 36.47 52.88 L 36.43 52.70 L 36.31 52.56 L 35.97 52.24 L 35.56 51.80 ZM 12.41 55.72 L 12.13 55.47 L 11.75 55.36 L 11.23 55.33 L 10.84 55.38 L 10.50 55.54 L 10.23 55.81 L 10.02 56.19 L 9.84 56.70 L 9.75 57.09 L 9.85 57.45 L 10.22 57.84 L 10.68 58.11 L 11.05 58.17 L 11.44 58.06 L 11.95 57.78 L 12.37 57.44 L 12.62 57.14 L 12.71 56.76 L 12.59 56.19 ZM 14.59 62.12 L 14.39 61.24 L 14.22 60.90 L 13.91 60.62 L 13.47 60.39 L 12.89 60.22 L 12.14 60.08 L 11.49 60.01 L 10.93 60.01 L 10.47 60.08 L 10.06 60.26 L 9.65 60.59 L 9.25 61.06 L 8.86 61.67 L 8.52 62.38 L 8.29 63.02 L 8.17 63.57 L 8.17 64.05 L 8.29 64.50 L 8.52 65.00 L 8.88 65.54 L 9.36 66.13 L 9.85 66.59 L 10.32 66.89 L 10.75 67.04 L 11.16 67.03 L 11.59 66.91 L 12.09 66.75 L 12.67 66.53 L 13.31 66.27 L 13.83 66.01 L 14.23 65.74 L 14.52 65.46 L 14.69 65.16 L 14.77 64.81 L 14.82 64.40 L 14.81 63.90 L 14.77 63.33 Z' style='fill-rule: nonzero; fill: #000000; stroke-width: 0.75; stroke: none;' /> <path d='M 58.70 71.94 L 56.83 71.66 L 55.07 71.19 L 53.42 70.53 L 51.93 69.71 L 50.60 68.75 L 49.46 67.65 L 48.49 66.42 L 47.71 65.09 L 47.16 63.67 L 46.83 62.16 L 46.72 60.58 L 46.79 59.30 L 46.99 58.10 L 47.32 56.96 L 47.79 55.90 L 48.39 54.90 L 49.13 53.98 L 50.00 53.13 L 51.00 52.34 L 52.12 51.63 L 53.33 50.96 L 54.63 50.35 L 56.02 49.80 L 57.50 49.30 L 59.07 48.85 L 60.74 48.46 L 62.49 48.13 L 70.13 46.83 L 70.11 45.96 L 70.05 45.21 L 69.94 44.59 L 69.80 44.09 L 69.56 43.72 L 69.18 43.45 L 68.65 43.29 L 67.97 43.24 L 67.49 43.26 L 67.06 43.34 L 66.68 43.46 L 66.35 43.64 L 65.71 44.11 L 65.02 44.75 L 64.59 45.09 L 64.12 45.34 L 63.59 45.48 L 63.00 45.53 L 50.75 45.53 L 50.10 45.43 L 49.56 45.11 L 49.36 44.87 L 49.23 44.60 L 49.17 44.30 L 49.17 43.95 L 49.28 43.24 L 49.50 42.47 L 49.85 41.64 L 50.33 40.75 L 50.94 39.84 L 51.71 38.94 L 52.62 38.06 L 53.67 37.19 L 54.89 36.36 L 56.26 35.60 L 57.80 34.92 L 59.50 34.31 L 61.38 33.81 L 63.44 33.45 L 65.68 33.23 L 
68.11 33.16 L 70.48 33.22 L 72.69 33.41 L 74.75 33.73 L 76.65 34.17 L 78.39 34.74 L 79.97 35.44 L 81.39 36.27 L 82.66 37.22 L 83.77 38.29 L 84.74 39.46 L 85.55 40.74 L 86.22 42.12 L 86.74 43.61 L 87.11 45.20 L 87.33 46.90 L 87.41 48.70 L 87.41 69.38 L 87.26 70.11 L 86.83 70.74 L 86.20 71.17 L 85.46 71.31 L 72.86 71.31 L 72.12 71.17 L 71.49 70.74 L 71.06 70.11 L 70.92 69.38 L 70.92 67.36 L 70.06 68.38 L 69.07 69.29 L 67.95 70.09 L 66.71 70.78 L 65.35 71.33 L 63.89 71.72 L 62.34 71.95 L 60.69 72.03 ZM 65.88 35.03 L 65.52 35.45 L 65.80 36.11 L 66.53 35.81 L 66.60 35.03 ZM 62.99 36.20 L 62.52 36.06 L 62.14 36.11 L 61.77 36.47 L 61.59 36.77 L 61.48 37.04 L 61.43 37.27 L 61.46 37.47 L 61.64 37.88 L 61.99 38.41 L 62.34 38.77 L 62.67 38.88 L 63.06 38.80 L 63.58 38.63 L 63.92 38.48 L 64.17 38.25 L 64.31 37.94 L 64.36 37.55 L 64.31 37.12 L 64.17 36.79 L 63.92 36.55 L 63.58 36.39 ZM 75.50 36.69 L 75.26 36.51 L 74.96 36.40 L 74.60 36.39 L 74.24 36.47 L 73.96 36.63 L 73.77 36.87 L 73.66 37.19 L 73.60 37.64 L 73.58 37.99 L 73.69 38.27 L 74.02 38.55 L 74.46 38.70 L 74.77 38.70 L 75.06 38.56 L 75.46 38.27 L 75.69 38.00 L 75.80 37.69 L 75.80 37.35 L 75.67 36.97 ZM 70.95 38.76 L 70.63 38.92 L 70.38 39.19 L 70.21 39.56 L 70.12 39.94 L 70.19 40.19 L 70.40 40.39 L 70.71 40.64 L 70.96 40.81 L 71.23 40.88 L 71.50 40.85 L 71.78 40.72 L 71.97 40.55 L 72.07 40.32 L 72.11 40.04 L 72.06 39.70 L 71.94 39.31 L 71.85 38.99 L 71.69 38.77 L 71.35 38.70 ZM 55.63 38.81 L 55.33 38.84 L 55.03 39.04 L 54.64 39.34 L 54.33 39.67 L 54.10 39.92 L 54.00 40.21 L 54.06 40.64 L 54.29 41.07 L 54.53 41.36 L 54.87 41.51 L 55.36 41.50 L 55.85 41.40 L 56.16 41.22 L 56.34 40.91 L 56.44 40.42 L 56.55 39.92 L 56.55 39.56 L 56.40 39.27 L 56.02 38.99 ZM 80.21 41.72 L 79.63 42.16 L 80.28 42.38 L 80.92 42.30 L 80.86 41.50 ZM 77.44 42.56 L 77.14 42.34 L 76.83 42.27 L 76.39 42.44 L 76.09 42.74 L 76.06 43.06 L 76.18 43.42 L 76.32 43.88 L 76.44 44.35 L 76.53 44.75 L 76.72 45.02 L 77.11 45.11 L 77.66 45.08 L 78.03 44.97 L 
78.33 44.71 L 78.63 44.24 L 78.67 43.76 L 78.52 43.45 L 78.22 43.20 L 77.83 42.88 ZM 57.10 42.38 L 56.38 42.94 L 56.60 43.88 L 57.38 43.67 L 57.89 43.02 ZM 72.86 43.59 L 72.22 44.39 L 72.28 45.11 L 73.08 45.33 L 73.52 44.45 ZM 83.84 46.54 L 83.52 46.33 L 83.18 46.29 L 82.72 46.47 L 82.26 46.77 L 82.03 47.10 L 81.98 47.47 L 82.00 47.99 L 82.17 48.45 L 82.39 48.67 L 82.74 48.77 L 83.24 48.84 L 83.67 48.89 L 83.99 48.86 L 84.23 48.67 L 84.46 48.27 L 84.61 47.89 L 84.62 47.54 L 84.50 47.21 L 84.24 46.91 ZM 76.03 48.84 L 75.38 49.50 L 75.89 50.28 L 76.67 50.00 L 76.53 49.36 ZM 62.66 49.98 L 62.35 49.86 L 62.01 49.84 L 61.63 49.92 L 61.21 50.09 L 60.94 50.27 L 60.81 50.52 L 60.77 50.94 L 60.82 51.35 L 60.99 51.59 L 61.28 51.74 L 61.71 51.88 L 62.16 52.06 L 62.53 52.17 L 62.85 52.09 L 63.14 51.74 L 63.39 51.34 L 63.39 51.02 L 63.22 50.67 L 62.92 50.22 ZM 73.22 50.36 L 72.86 50.80 L 73.30 51.38 L 73.88 50.94 L 73.88 50.28 ZM 57.10 51.10 L 56.77 51.20 L 56.51 51.46 L 56.24 51.88 L 55.94 52.27 L 55.77 52.63 L 55.78 52.98 L 56.02 53.39 L 56.36 53.70 L 56.66 53.78 L 57.03 53.70 L 57.53 53.53 L 57.88 53.36 L 58.12 53.13 L 58.27 52.85 L 58.31 52.52 L 58.30 52.01 L 58.24 51.63 L 58.05 51.35 L 57.60 51.16 ZM 83.27 53.43 L 82.98 52.91 L 82.64 52.55 L 82.25 52.34 L 81.79 52.23 L 81.24 52.14 L 80.59 52.07 L 79.85 52.02 L 79.11 52.00 L 78.46 52.02 L 77.92 52.09 L 77.47 52.20 L 77.09 52.40 L 76.75 52.75 L 76.44 53.25 L 76.17 53.89 L 75.89 54.71 L 75.67 55.44 L 75.53 56.09 L 75.46 56.66 L 75.52 57.18 L 75.78 57.71 L 76.23 58.25 L 76.89 58.78 L 77.54 59.21 L 78.12 59.43 L 78.62 59.43 L 79.05 59.22 L 79.49 58.87 L 80.01 58.48 L 80.61 58.04 L 81.28 57.56 L 81.89 57.15 L 82.43 56.77 L 82.89 56.43 L 83.27 56.13 L 83.54 55.78 L 83.67 55.33 L 83.67 54.78 L 83.52 54.11 ZM 52.06 54.13 L 51.77 54.19 L 51.55 54.34 L 51.33 54.69 L 51.13 55.21 L 50.97 55.66 L 50.99 56.04 L 51.33 56.41 L 51.86 56.80 L 52.30 56.99 L 52.76 56.93 L 53.35 56.63 L 53.84 56.26 L 54.00 55.88 L 53.95 55.39 L 53.78 54.75 L 
53.64 54.37 L 53.38 54.22 L 53.00 54.17 L 52.49 54.11 ZM 61.22 55.30 L 60.92 55.29 L 60.65 55.38 L 60.41 55.55 L 60.24 55.78 L 60.17 56.05 L 60.20 56.36 L 60.33 56.70 L 60.52 56.95 L 60.74 57.11 L 60.99 57.20 L 61.27 57.20 L 61.70 57.12 L 61.99 57.02 L 62.17 56.81 L 62.28 56.41 L 62.28 56.04 L 62.16 55.80 L 61.92 55.61 L 61.57 55.41 ZM 66.66 61.29 L 67.55 61.04 L 68.35 60.61 L 69.05 60.02 L 69.62 59.24 L 70.02 58.27 L 70.27 57.11 L 70.35 55.77 L 66.10 56.70 L 65.23 56.93 L 64.53 57.20 L 63.97 57.50 L 63.57 57.84 L 63.08 58.58 L 62.92 59.36 L 62.97 59.75 L 63.10 60.11 L 63.31 60.45 L 63.61 60.77 L 64.00 61.03 L 64.47 61.22 L 65.02 61.34 L 65.66 61.38 ZM 52.05 58.60 L 51.55 58.55 L 51.12 58.56 L 50.75 58.64 L 50.43 58.80 L 50.11 59.08 L 49.78 59.45 L 49.46 59.94 L 49.25 60.38 L 49.19 60.75 L 49.27 61.06 L 49.50 61.31 L 50.14 61.83 L 50.83 62.53 L 51.30 63.01 L 51.72 63.30 L 52.21 63.35 L 52.85 63.17 L 53.83 62.76 L 54.61 62.50 L 54.91 62.34 L 55.11 62.06 L 55.21 61.64 L 55.22 61.09 L 55.12 60.50 L 54.97 60.02 L 54.76 59.66 L 54.50 59.41 L 54.17 59.21 L 53.74 59.03 L 53.23 58.87 L 52.63 58.72 ZM 59.91 60.03 L 59.55 60.02 L 59.17 60.09 L 58.92 60.19 L 58.80 60.38 L 58.75 60.74 L 58.82 61.01 L 59.00 61.13 L 59.28 61.18 L 59.61 61.30 L 59.99 61.33 L 60.27 61.09 L 60.41 60.74 L 60.27 60.38 ZM 76.39 60.16 L 75.89 60.74 L 76.03 61.38 L 76.75 61.38 L 77.11 60.66 ZM 73.49 62.50 L 73.22 62.39 L 72.88 62.41 L 72.42 62.45 L 72.14 62.56 L 71.91 62.72 L 71.74 62.96 L 71.64 63.25 L 71.65 63.53 L 71.75 63.77 L 71.94 64.00 L 72.22 64.19 L 72.66 64.24 L 73.14 63.97 L 73.48 63.66 L 73.72 63.44 L 73.83 63.19 L 73.72 62.81 ZM 82.80 63.35 L 82.65 63.14 L 82.44 62.99 L 82.14 62.89 L 81.82 62.83 L 81.56 62.81 L 81.36 62.90 L 81.14 63.17 L 80.98 63.52 L 80.94 63.84 L 81.02 64.14 L 81.22 64.41 L 81.49 64.58 L 81.72 64.55 L 82.30 64.25 L 82.72 64.00 L 82.84 63.85 L 82.88 63.61 ZM 62.53 64.41 L 62.13 64.08 L 61.60 63.94 L 60.85 63.89 L 60.51 63.91 L 60.24 63.98 L 60.03 64.09 L 59.89 64.25 L 
59.69 64.70 L 59.47 65.34 L 59.27 66.11 L 59.11 66.67 L 59.10 66.91 L 59.19 67.15 L 59.38 67.39 L 59.69 67.64 L 59.99 67.81 L 60.26 67.87 L 60.49 67.83 L 60.69 67.69 L 61.12 67.26 L 61.71 66.78 L 62.28 66.33 L 62.74 66.02 L 62.90 65.86 L 62.97 65.65 L 62.96 65.38 L 62.86 65.05 ZM 77.50 64.19 L 77.14 64.08 L 76.79 64.15 L 76.39 64.41 L 76.07 64.75 L 75.89 65.12 L 75.85 65.54 L 75.96 65.99 L 76.19 66.37 L 76.46 66.53 L 76.82 66.56 L 77.33 66.56 L 77.66 66.48 L 77.92 66.33 L 78.13 66.09 L 78.27 65.77 L 78.31 65.32 L 78.33 64.99 L 78.23 64.71 L 77.91 64.47 ZM 65.08 66.56 L 64.52 67.06 L 64.30 67.86 L 65.24 67.78 L 65.88 67.14 ZM 55.68 66.71 L 55.46 66.69 L 55.29 66.74 L 55.17 66.86 L 54.96 67.18 L 54.64 67.56 L 54.39 67.89 L 54.21 68.14 L 54.15 68.40 L 54.28 68.72 L 54.50 69.00 L 54.77 69.17 L 55.11 69.25 L 55.52 69.22 L 56.01 69.09 L 56.36 68.97 L 56.60 68.74 L 56.74 68.28 L 56.75 67.72 L 56.66 67.33 L 56.41 67.03 L 55.94 66.78 ZM 75.74 68.00 L 75.02 68.72 L 75.46 69.52 L 76.10 69.16 L 76.17 68.72 ZM 81.86 68.80 L 81.50 69.22 L 82.00 69.58 L 82.52 69.22 L 82.52 68.58 Z' style='fill-rule: nonzero; fill: #000000; stroke-width: 0.75; stroke: none;' /> <path d='M 94.73 71.17 L 94.10 70.74 L 93.67 70.11 L 93.52 69.38 L 93.52 35.81 L 93.67 35.08 L 94.10 34.45 L 94.73 34.02 L 95.47 33.88 L 108.07 33.88 L 108.81 34.02 L 109.44 34.45 L 109.88 35.08 L 110.02 35.81 L 110.02 38.19 L 111.00 37.19 L 112.16 36.26 L 113.48 35.41 L 114.97 34.64 L 116.60 33.99 L 118.31 33.53 L 120.10 33.25 L 121.97 33.16 L 123.78 33.27 L 125.52 33.59 L 127.19 34.13 L 128.80 34.89 L 130.31 35.87 L 131.67 37.08 L 132.88 38.53 L 133.94 40.22 L 134.41 41.15 L 134.81 42.15 L 135.15 43.21 L 135.43 44.34 L 135.65 45.53 L 135.80 46.79 L 135.90 48.11 L 135.93 49.50 L 135.93 69.38 L 135.78 70.11 L 135.35 70.74 L 135.05 70.99 L 134.73 71.17 L 134.37 71.28 L 133.99 71.31 L 120.24 71.31 L 119.50 71.17 L 118.86 70.74 L 118.44 70.11 L 118.30 69.38 L 118.30 50.00 L 118.25 49.07 L 118.08 48.27 L 117.79 47.59 L 117.40 
47.04 L 116.89 46.60 L 116.27 46.30 L 115.54 46.11 L 114.69 46.05 L 113.85 46.11 L 113.12 46.30 L 112.50 46.60 L 112.00 47.04 L 111.60 47.59 L 111.32 48.27 L 111.16 49.07 L 111.10 50.00 L 111.10 69.38 L 110.95 70.11 L 110.52 70.74 L 109.89 71.17 L 109.15 71.31 L 95.47 71.31 ZM 98.79 35.14 L 98.38 35.20 L 98.07 35.43 L 97.77 35.89 L 97.59 36.41 L 97.60 36.80 L 97.81 37.14 L 98.21 37.55 L 98.61 37.96 L 98.96 38.20 L 99.35 38.24 L 99.86 38.05 L 100.38 37.70 L 100.65 37.38 L 100.74 36.97 L 100.72 36.39 L 100.54 35.81 L 100.29 35.50 L 99.92 35.33 L 99.36 35.17 ZM 102.60 36.69 L 102.02 37.05 L 101.94 37.97 L 102.96 37.97 L 103.32 37.05 ZM 105.41 40.72 L 104.76 41.08 L 105.05 41.72 L 105.83 41.94 L 105.99 41.14 ZM 108.51 41.14 L 108.01 41.94 L 108.36 42.58 L 109.22 42.58 L 109.30 41.66 ZM 99.87 41.43 L 99.68 41.33 L 99.44 41.31 L 99.07 41.36 L 98.78 41.45 L 98.57 41.58 L 98.42 41.74 L 98.35 41.94 L 98.35 42.21 L 98.44 42.44 L 98.60 42.64 L 98.85 42.80 L 99.29 42.85 L 99.72 42.58 L 99.96 42.33 L 100.12 42.16 L 100.16 41.96 L 100.08 41.66 ZM 117.83 41.34 L 117.46 41.28 L 117.15 41.39 L 116.85 41.72 L 116.56 42.24 L 116.41 42.63 L 116.47 43.01 L 116.79 43.52 L 117.13 43.82 L 117.46 43.84 L 117.85 43.71 L 118.36 43.52 L 118.71 43.34 L 118.96 43.11 L 119.11 42.80 L 119.16 42.44 L 119.11 42.11 L 118.95 41.84 L 118.68 41.64 L 118.30 41.50 ZM 112.03 41.70 L 111.60 41.63 L 111.22 41.74 L 110.80 42.16 L 110.52 42.61 L 110.52 42.99 L 110.72 43.36 L 111.02 43.88 L 111.31 44.29 L 111.60 44.50 L 111.95 44.54 L 112.46 44.45 L 112.95 44.29 L 113.29 44.09 L 113.47 43.77 L 113.54 43.24 L 113.49 42.68 L 113.35 42.33 L 113.08 42.09 L 112.60 41.86 ZM 127.22 43.09 L 126.71 42.74 L 126.47 42.63 L 126.19 42.62 L 125.86 42.70 L 125.49 42.88 L 125.12 43.13 L 124.86 43.38 L 124.72 43.64 L 124.69 43.89 L 124.81 44.50 L 124.99 45.33 L 125.13 46.17 L 125.27 46.83 L 125.40 47.09 L 125.62 47.30 L 125.95 47.45 L 126.36 47.55 L 126.82 47.55 L 127.20 47.49 L 127.49 47.36 L 127.69 47.16 L 128.04 46.57 L 
128.44 45.75 L 128.56 45.38 L 128.63 45.06 L 128.63 44.78 L 128.58 44.56 L 128.33 44.13 L 127.86 43.59 ZM 106.88 43.72 L 106.60 43.28 L 106.45 43.12 L 106.23 43.03 L 105.96 43.02 L 105.63 43.09 L 105.30 43.18 L 105.06 43.31 L 104.90 43.47 L 104.82 43.67 L 104.76 44.19 L 104.69 44.89 L 104.72 45.36 L 104.89 45.75 L 105.19 46.08 L 105.63 46.33 L 106.06 46.38 L 106.33 46.27 L 106.59 45.99 L 106.91 45.61 L 107.17 45.28 L 107.35 45.00 L 107.38 44.68 L 107.21 44.24 ZM 99.02 45.56 L 98.67 45.63 L 98.38 45.81 L 98.13 46.11 L 97.83 46.63 L 97.63 47.02 L 97.61 47.39 L 97.85 47.84 L 98.26 48.18 L 98.63 48.20 L 99.05 48.04 L 99.57 47.84 L 99.91 47.63 L 100.07 47.41 L 100.13 47.12 L 100.15 46.69 L 100.11 46.26 L 100.01 45.97 L 99.79 45.77 L 99.43 45.61 ZM 133.13 48.63 L 132.47 48.92 L 132.83 49.56 L 133.55 49.56 L 133.55 48.84 ZM 95.76 49.20 L 95.26 49.78 L 95.54 50.28 L 96.19 50.44 L 96.33 49.72 ZM 107.74 50.58 L 107.56 50.31 L 107.31 50.11 L 106.99 50.00 L 106.62 49.98 L 106.30 50.08 L 106.04 50.27 L 105.83 50.58 L 105.71 50.93 L 105.69 51.24 L 105.79 51.53 L 105.99 51.80 L 106.31 52.06 L 106.55 52.13 L 106.83 52.06 L 107.21 51.94 L 107.52 51.74 L 107.74 51.56 L 107.86 51.31 L 107.85 50.94 ZM 103.55 52.57 L 103.21 52.20 L 102.90 51.92 L 102.60 51.74 L 102.27 51.63 L 101.86 51.60 L 101.37 51.66 L 100.80 51.80 L 100.29 51.97 L 99.92 52.18 L 99.67 52.45 L 99.57 52.75 L 99.50 53.52 L 99.43 54.53 L 99.40 55.56 L 99.43 56.34 L 99.53 56.65 L 99.76 56.93 L 100.11 57.19 L 100.58 57.42 L 101.05 57.54 L 101.43 57.55 L 101.74 57.43 L 101.97 57.20 L 102.44 56.56 L 103.04 55.77 L 103.67 54.99 L 104.15 54.42 L 104.28 54.16 L 104.28 53.84 L 104.15 53.47 L 103.90 53.03 ZM 127.24 52.64 L 126.68 52.27 L 126.17 52.00 L 125.71 51.81 L 125.25 51.75 L 124.74 51.84 L 124.17 52.10 L 123.55 52.52 L 122.82 53.08 L 122.20 53.60 L 121.69 54.09 L 121.29 54.55 L 121.02 55.04 L 120.92 55.66 L 120.97 56.40 L 121.18 57.27 L 121.52 58.15 L 121.91 58.84 L 122.34 59.34 L 122.82 59.66 L 123.39 59.84 L 124.10 
59.97 L 124.95 60.05 L 125.93 60.08 L 126.72 60.03 L 127.35 59.88 L 127.83 59.64 L 128.16 59.30 L 128.42 58.85 L 128.68 58.29 L 128.96 57.62 L 129.24 56.84 L 129.41 56.17 L 129.49 55.59 L 129.48 55.09 L 129.38 54.69 L 129.18 54.31 L 128.86 53.92 L 128.42 53.52 L 127.86 53.09 ZM 107.28 57.83 L 106.99 58.14 L 106.58 58.72 L 106.53 58.95 L 106.71 59.22 L 107.00 59.45 L 107.33 59.54 L 107.68 59.52 L 108.07 59.36 L 108.34 59.19 L 108.43 58.97 L 108.41 58.67 L 108.36 58.28 L 108.16 57.98 L 107.71 57.78 ZM 133.63 60.02 L 133.27 60.74 L 133.49 61.38 L 134.13 61.02 L 134.57 60.22 ZM 106.99 61.09 L 106.99 61.95 L 107.49 62.39 L 108.07 61.95 L 107.85 61.30 ZM 99.15 61.59 L 98.79 62.24 L 99.21 62.67 L 100.01 62.67 L 99.79 61.95 ZM 126.36 62.81 L 125.79 63.39 L 126.15 63.89 L 126.57 63.83 L 126.79 63.39 ZM 100.49 64.54 L 100.12 64.33 L 99.72 64.29 L 99.21 64.47 L 98.79 64.83 L 98.65 65.19 L 98.68 65.62 L 98.79 66.20 L 98.99 66.77 L 99.15 67.17 L 99.41 67.43 L 99.93 67.56 L 100.56 67.57 L 100.97 67.42 L 101.29 67.10 L 101.58 66.56 L 101.75 66.05 L 101.66 65.67 L 101.37 65.32 L 100.94 64.91 ZM 107.92 65.44 L 107.55 65.32 L 107.16 65.34 L 106.77 65.49 L 106.19 65.86 L 105.72 66.13 L 105.57 66.27 L 105.49 66.47 L 105.48 66.74 L 105.55 67.06 L 105.73 67.81 L 105.97 68.33 L 106.16 68.51 L 106.43 68.64 L 106.78 68.74 L 107.21 68.80 L 107.57 68.80 L 107.86 68.74 L 108.07 68.62 L 108.21 68.44 L 108.45 67.91 L 108.72 67.20 L 108.90 66.70 L 108.86 66.34 L 108.65 66.04 L 108.29 65.70 ZM 124.11 66.25 L 123.84 66.26 L 123.61 66.36 L 123.41 66.56 L 123.28 66.79 L 123.25 67.03 L 123.31 67.29 L 123.47 67.56 L 123.70 67.84 L 123.93 68.00 L 124.17 68.06 L 124.41 68.00 L 124.63 67.89 L 124.79 67.72 L 124.88 67.47 L 124.91 67.14 L 124.88 66.85 L 124.79 66.61 L 124.63 66.45 L 124.41 66.34 ZM 133.37 66.56 L 132.99 66.56 L 132.67 66.76 L 132.41 66.92 L 132.26 67.14 L 132.26 67.50 L 132.38 67.79 L 132.58 67.94 L 133.19 68.08 L 133.53 68.08 L 133.81 68.02 L 134.03 67.90 L 134.21 67.72 L 134.28 67.42 L 
134.21 67.22 L 134.03 67.05 L 133.77 66.84 ZM 96.47 67.36 L 95.83 67.92 L 96.26 68.86 L 97.27 68.44 L 97.41 67.42 Z' style='fill-rule: nonzero; fill: #000000; stroke-width: 0.75; stroke: none;' /> <path d='M 158.46 71.91 L 155.91 71.56 L 153.49 70.96 L 151.19 70.13 L 150.10 69.62 L 149.07 69.06 L 148.09 68.45 L 147.17 67.78 L 146.30 67.05 L 145.48 66.27 L 144.71 65.44 L 144.00 64.55 L 143.36 63.60 L 142.79 62.61 L 142.30 61.56 L 141.88 60.46 L 141.54 59.31 L 141.27 58.11 L 141.08 56.85 L 140.97 55.55 L 140.92 54.25 L 140.91 52.67 L 140.92 51.07 L 140.97 49.72 L 141.08 48.41 L 141.27 47.16 L 141.53 45.95 L 141.87 44.80 L 142.28 43.69 L 142.76 42.64 L 143.32 41.64 L 143.96 40.69 L 144.66 39.79 L 145.42 38.94 L 146.24 38.15 L 147.11 37.42 L 148.04 36.75 L 149.02 36.13 L 150.06 35.57 L 151.16 35.06 L 153.47 34.23 L 155.90 33.63 L 158.45 33.28 L 161.13 33.16 L 162.53 33.18 L 163.87 33.26 L 165.16 33.39 L 166.40 33.56 L 167.59 33.79 L 168.72 34.07 L 169.79 34.40 L 170.82 34.78 L 172.72 35.63 L 174.44 36.58 L 175.96 37.61 L 177.30 38.74 L 178.46 39.91 L 179.46 41.07 L 180.30 42.23 L 180.97 43.39 L 181.50 44.49 L 181.88 45.47 L 182.13 46.35 L 182.24 47.13 L 182.23 47.50 L 182.13 47.86 L 181.95 48.18 L 181.69 48.49 L 181.03 48.92 L 180.28 49.06 L 166.25 49.06 L 165.54 48.97 L 164.99 48.67 L 164.55 48.21 L 164.16 47.63 L 163.65 46.72 L 163.11 46.05 L 162.81 45.80 L 162.44 45.61 L 162.01 45.51 L 161.50 45.47 L 160.74 45.55 L 160.12 45.77 L 159.63 46.16 L 159.27 46.69 L 159.01 47.36 L 158.82 48.15 L 158.68 49.05 L 158.61 50.08 L 158.59 51.56 L 158.58 52.90 L 158.59 54.11 L 158.61 55.19 L 158.71 56.26 L 158.85 57.19 L 159.05 57.97 L 159.30 58.61 L 159.64 59.10 L 160.12 59.44 L 160.74 59.65 L 161.50 59.72 L 162.09 59.68 L 162.58 59.58 L 162.97 59.39 L 163.25 59.14 L 163.71 58.46 L 164.16 57.56 L 164.51 56.93 L 164.97 56.49 L 165.55 56.22 L 166.25 56.13 L 180.28 56.13 L 181.03 56.27 L 181.69 56.70 L 181.95 57.00 L 182.13 57.33 L 182.23 57.68 L 182.24 58.06 L 182.16 58.58 L 182.01 
59.19 L 181.78 59.90 L 181.47 60.70 L 181.08 61.56 L 180.59 62.45 L 180.00 63.38 L 179.32 64.33 L 178.52 65.29 L 177.60 66.22 L 176.56 67.14 L 175.39 68.03 L 174.10 68.88 L 172.67 69.64 L 171.11 70.32 L 169.41 70.92 L 167.57 71.41 L 165.58 71.76 L 163.43 71.96 L 161.13 72.03 ZM 164.42 34.61 L 163.97 34.64 L 163.52 34.88 L 162.94 35.24 L 162.62 35.59 L 162.53 35.92 L 162.62 36.31 L 162.78 36.83 L 162.95 37.26 L 163.14 37.55 L 163.44 37.71 L 163.88 37.77 L 164.51 37.75 L 164.96 37.69 L 165.28 37.47 L 165.53 36.97 L 165.68 36.27 L 165.71 35.75 L 165.50 35.31 L 164.96 34.88 ZM 159.73 35.10 L 159.39 34.75 L 159.23 34.64 L 159.02 34.60 L 158.77 34.63 L 158.47 34.74 L 158.19 34.86 L 157.99 35.00 L 157.87 35.17 L 157.83 35.36 L 157.83 35.83 L 157.83 36.47 L 157.89 36.89 L 158.08 37.16 L 158.39 37.31 L 158.83 37.41 L 159.40 37.59 L 159.83 37.72 L 160.18 37.68 L 160.49 37.33 L 160.72 36.82 L 160.71 36.44 L 160.49 36.06 L 160.13 35.59 ZM 149.39 38.36 L 149.08 38.21 L 148.74 38.18 L 148.39 38.27 L 148.12 38.49 L 148.03 38.74 L 148.05 39.03 L 148.10 39.42 L 148.16 39.79 L 148.21 40.06 L 148.35 40.26 L 148.67 40.42 L 149.08 40.44 L 149.42 40.37 L 149.70 40.19 L 149.91 39.92 L 150.03 39.60 L 150.03 39.27 L 149.92 38.95 L 149.69 38.63 ZM 153.31 38.36 L 153.03 38.38 L 152.76 38.53 L 152.43 38.77 L 152.22 38.97 L 152.12 39.20 L 152.11 39.47 L 152.21 39.78 L 152.34 40.15 L 152.46 40.39 L 152.65 40.52 L 153.00 40.56 L 153.44 40.54 L 153.75 40.47 L 153.98 40.26 L 154.16 39.84 L 154.23 39.47 L 154.17 39.13 L 153.97 38.82 L 153.64 38.55 ZM 174.45 39.65 L 174.21 39.28 L 173.84 39.04 L 173.24 38.91 L 172.98 38.90 L 172.78 38.96 L 172.65 39.07 L 172.58 39.24 L 172.46 39.67 L 172.22 40.20 L 172.13 40.52 L 172.14 40.80 L 172.24 41.06 L 172.44 41.28 L 172.67 41.45 L 172.93 41.50 L 173.21 41.45 L 173.52 41.28 L 173.98 41.04 L 174.38 40.89 L 174.53 40.81 L 174.63 40.67 L 174.68 40.46 L 174.67 40.20 ZM 157.75 39.27 L 157.83 40.20 L 158.61 40.56 L 158.91 39.84 L 158.69 39.06 ZM 168.66 40.66 L 
168.47 40.86 L 168.37 41.18 L 168.19 41.58 L 168.14 42.00 L 168.41 42.38 L 168.72 42.71 L 168.94 42.99 L 169.19 43.12 L 169.57 43.02 L 169.85 42.74 L 169.88 42.45 L 169.71 41.72 L 169.59 41.28 L 169.52 40.97 L 169.39 40.76 L 169.05 40.64 ZM 161.78 41.66 L 161.86 42.58 L 162.58 43.24 L 163.08 42.38 L 162.72 41.80 ZM 145.58 41.77 L 145.41 41.91 L 145.29 42.13 L 145.16 42.44 L 145.02 42.76 L 144.92 43.02 L 144.95 43.25 L 145.16 43.52 L 145.49 43.76 L 145.77 43.89 L 146.05 43.88 L 146.38 43.74 L 146.79 43.46 L 147.02 43.20 L 147.10 42.88 L 147.02 42.44 L 146.90 42.05 L 146.66 41.88 L 146.32 41.80 L 145.88 41.72 ZM 151.40 47.73 L 151.28 47.05 L 151.18 46.44 L 151.13 45.91 L 150.99 45.45 L 150.66 45.09 L 150.12 44.83 L 149.39 44.67 L 148.56 44.65 L 147.87 44.73 L 147.30 44.91 L 146.88 45.19 L 146.52 45.57 L 146.16 46.08 L 145.80 46.72 L 145.44 47.49 L 145.14 48.23 L 144.96 48.89 L 144.89 49.46 L 144.94 49.94 L 145.12 50.38 L 145.42 50.85 L 145.87 51.35 L 146.44 51.88 L 146.97 52.33 L 147.47 52.63 L 147.93 52.76 L 148.36 52.74 L 148.81 52.59 L 149.32 52.38 L 149.90 52.09 L 150.55 51.74 L 151.03 51.41 L 151.39 51.08 L 151.62 50.76 L 151.74 50.44 L 151.77 50.07 L 151.75 49.62 L 151.68 49.09 L 151.57 48.49 ZM 166.03 45.75 L 165.38 45.83 L 165.53 46.55 L 166.25 46.77 L 166.61 46.05 ZM 174.24 46.19 L 173.80 46.91 L 174.60 47.19 L 175.32 46.83 L 174.89 46.19 ZM 169.35 46.19 L 169.41 46.91 L 169.85 47.27 L 170.71 46.97 L 170.13 46.33 ZM 153.44 56.08 L 153.21 56.05 L 152.94 56.12 L 152.57 56.19 L 151.96 56.42 L 151.77 56.56 L 151.71 56.84 L 151.76 57.18 L 151.91 57.34 L 152.17 57.44 L 152.50 57.56 L 152.87 57.70 L 153.14 57.81 L 153.38 57.81 L 153.64 57.56 L 153.84 57.27 L 153.92 56.97 L 153.88 56.66 L 153.72 56.34 ZM 146.16 56.13 L 145.88 56.70 L 146.44 56.99 L 147.10 56.63 L 146.66 56.13 ZM 144.60 57.31 L 144.50 57.06 L 144.35 56.90 L 144.07 56.84 L 143.77 56.89 L 143.53 57.02 L 143.34 57.25 L 143.21 57.56 L 143.14 57.87 L 143.21 58.06 L 143.39 58.22 L 143.64 58.42 L 143.92 
58.60 L 144.17 58.68 L 144.39 58.67 L 144.58 58.56 L 144.78 58.37 L 144.82 58.19 L 144.78 57.95 L 144.72 57.63 ZM 167.31 58.78 L 167.10 58.71 L 166.93 58.71 L 166.78 58.78 L 166.47 59.04 L 166.03 59.36 L 165.65 59.63 L 165.38 59.84 L 165.26 60.11 L 165.32 60.58 L 165.49 60.97 L 165.74 61.13 L 166.10 61.15 L 166.61 61.16 L 166.94 61.12 L 167.21 61.00 L 167.41 60.80 L 167.55 60.52 L 167.73 60.00 L 167.86 59.61 L 167.84 59.27 L 167.55 58.92 ZM 157.96 59.54 L 157.67 59.28 L 157.37 59.17 L 157.07 59.19 L 156.34 59.39 L 155.38 59.66 L 154.33 59.83 L 153.50 59.94 L 153.18 60.04 L 152.93 60.28 L 152.75 60.65 L 152.64 61.16 L 152.60 61.83 L 152.63 62.40 L 152.71 62.87 L 152.86 63.25 L 153.10 63.56 L 153.46 63.86 L 153.93 64.14 L 154.52 64.41 L 155.12 64.64 L 155.66 64.78 L 156.12 64.82 L 156.52 64.77 L 156.89 64.60 L 157.28 64.33 L 157.69 63.96 L 158.11 63.47 L 158.49 62.98 L 158.76 62.54 L 158.92 62.14 L 158.97 61.78 L 158.93 61.41 L 158.79 60.98 L 158.57 60.49 L 158.25 59.94 ZM 174.10 59.43 L 173.75 59.31 L 173.41 59.32 L 173.08 59.44 L 172.71 59.71 L 172.47 60.03 L 172.35 60.39 L 172.36 60.80 L 172.50 61.44 L 172.61 61.92 L 172.71 62.10 L 172.87 62.24 L 173.12 62.33 L 173.44 62.39 L 173.77 62.40 L 174.04 62.35 L 174.24 62.25 L 174.38 62.09 L 174.62 61.67 L 174.89 61.09 L 175.06 60.64 L 174.99 60.30 L 174.77 60.00 L 174.46 59.66 ZM 148.46 59.66 L 148.10 60.16 L 148.32 60.80 L 149.11 60.66 L 148.89 60.08 ZM 170.23 61.74 L 169.99 61.75 L 169.73 61.85 L 169.41 62.03 L 169.23 62.16 L 169.13 62.33 L 169.09 62.55 L 169.13 62.81 L 169.30 63.39 L 169.47 63.56 L 169.77 63.61 L 170.07 63.60 L 170.33 63.51 L 170.54 63.31 L 170.71 63.03 L 170.80 62.69 L 170.80 62.38 L 170.70 62.11 L 170.50 61.88 ZM 147.80 62.92 L 147.49 62.78 L 147.12 62.81 L 146.66 62.95 L 146.32 63.13 L 146.13 63.33 L 146.05 63.61 L 146.02 64.05 L 146.06 64.34 L 146.20 64.58 L 146.42 64.77 L 146.74 64.91 L 147.04 64.98 L 147.24 64.91 L 147.40 64.73 L 147.60 64.47 L 147.88 64.10 L 148.14 63.83 L 148.26 63.58 L 
148.10 63.25 ZM 151.71 64.19 L 151.21 64.91 L 151.71 65.55 L 152.35 65.27 L 152.57 64.47 ZM 164.38 65.84 L 163.80 66.63 L 164.66 67.14 L 165.38 66.78 L 165.10 66.13 ZM 168.59 66.54 L 168.28 66.49 L 167.98 66.54 L 167.69 66.70 L 167.34 67.03 L 167.14 67.28 L 167.10 67.57 L 167.19 68.00 L 167.40 68.43 L 167.61 68.72 L 167.92 68.88 L 168.41 68.94 L 168.82 68.87 L 169.05 68.66 L 169.20 68.31 L 169.35 67.86 L 169.40 67.54 L 169.35 67.24 L 169.18 66.96 L 168.91 66.70 ZM 158.99 67.28 L 158.75 66.92 L 158.62 66.80 L 158.44 66.72 L 158.19 66.69 L 157.89 66.70 L 157.64 66.76 L 157.46 66.85 L 157.35 66.98 L 157.32 67.14 L 157.28 67.56 L 157.18 68.08 L 157.09 68.53 L 156.99 68.88 L 157.02 69.16 L 157.32 69.44 L 157.76 69.63 L 158.08 69.63 L 158.37 69.45 L 158.75 69.16 L 159.12 68.83 L 159.33 68.55 L 159.39 68.22 L 159.27 67.78 ZM 165.67 68.80 L 165.24 69.02 L 165.17 69.66 L 165.82 69.58 L 166.25 69.16 Z' style='fill-rule: nonzero; fill: #000000; stroke-width: 0.75; stroke: none;' /> <path d='M 192.15 84.89 L 191.64 84.56 L 191.28 84.08 L 191.17 83.49 L 191.17 83.16 L 191.23 82.77 L 197.00 68.94 L 183.10 36.11 L 183.03 35.59 L 183.25 34.93 L 183.64 34.38 L 184.17 34.00 L 184.82 33.88 L 197.00 33.88 L 197.50 33.91 L 197.93 34.02 L 198.29 34.20 L 198.57 34.45 L 199.00 35.03 L 199.29 35.59 L 205.20 52.02 L 211.32 35.59 L 211.67 34.98 L 212.14 34.42 L 212.44 34.18 L 212.82 34.01 L 213.26 33.91 L 213.78 33.88 L 225.79 33.88 L 226.41 33.99 L 226.95 34.34 L 227.32 34.84 L 227.45 35.39 L 227.43 35.78 L 227.37 36.11 L 207.07 83.27 L 206.72 83.88 L 206.21 84.45 L 205.89 84.69 L 205.51 84.86 L 205.07 84.97 L 204.56 85.00 L 192.75 85.00 ZM 190.28 38.08 L 190.01 38.04 L 189.79 38.07 L 189.60 38.16 L 189.26 38.47 L 188.85 38.91 L 188.51 39.38 L 188.32 39.78 L 188.33 40.20 L 188.57 40.72 L 188.93 41.15 L 189.29 41.33 L 189.72 41.30 L 190.29 41.14 L 190.85 40.98 L 191.23 40.78 L 191.47 40.44 L 191.59 39.84 L 191.59 39.22 L 191.45 38.77 L 191.12 38.44 L 190.59 38.19 ZM 196.34 39.70 L 196.28 
40.36 L 197.06 40.56 L 197.06 39.84 L 196.85 39.20 ZM 220.26 40.47 L 219.85 40.03 L 219.63 39.90 L 219.34 39.82 L 219.00 39.80 L 218.59 39.84 L 217.60 40.08 L 216.93 40.36 L 216.70 40.55 L 216.50 40.84 L 216.34 41.23 L 216.21 41.72 L 216.18 42.22 L 216.21 42.65 L 216.30 42.99 L 216.46 43.27 L 217.04 43.74 L 217.95 44.24 L 218.92 44.67 L 219.67 44.94 L 220.01 44.96 L 220.37 44.85 L 220.77 44.61 L 221.18 44.24 L 221.51 43.82 L 221.68 43.45 L 221.70 43.10 L 221.57 42.80 L 221.16 42.09 L 220.68 41.14 ZM 193.18 40.92 L 192.53 41.22 L 192.82 41.94 L 193.46 41.80 L 193.75 41.22 ZM 198.20 42.22 L 197.80 42.27 L 197.45 42.44 L 197.14 42.74 L 196.98 43.11 L 197.06 43.38 L 197.30 43.63 L 197.57 43.95 L 197.82 44.34 L 198.00 44.64 L 198.24 44.79 L 198.65 44.75 L 198.96 44.56 L 199.04 44.31 L 199.02 43.97 L 199.01 43.52 L 199.02 43.07 L 199.04 42.74 L 198.96 42.48 L 198.65 42.30 ZM 188.28 43.24 L 188.43 43.88 L 188.93 44.31 L 189.21 43.74 L 189.07 43.09 ZM 192.73 43.45 L 192.34 43.20 L 192.18 43.12 L 192.01 43.13 L 191.84 43.25 L 191.67 43.45 L 191.51 43.84 L 191.65 44.14 L 191.95 44.45 L 192.25 44.89 L 192.50 45.27 L 192.70 45.55 L 192.95 45.67 L 193.32 45.61 L 193.71 45.43 L 193.89 45.19 L 193.94 44.84 L 193.96 44.39 L 193.92 44.09 L 193.78 43.89 L 193.56 43.76 L 193.25 43.67 ZM 211.04 46.08 L 210.55 46.06 L 210.15 46.11 L 209.84 46.25 L 209.28 46.77 L 208.65 47.63 L 208.46 48.00 L 208.38 48.33 L 208.42 48.62 L 208.57 48.86 L 209.07 49.36 L 209.67 50.00 L 210.13 50.49 L 210.53 50.80 L 210.73 50.88 L 210.98 50.89 L 211.27 50.84 L 211.60 50.72 L 212.37 50.38 L 212.90 50.08 L 213.09 49.89 L 213.21 49.62 L 213.27 49.27 L 213.26 48.84 L 213.15 47.84 L 212.93 47.10 L 212.76 46.80 L 212.48 46.56 L 212.09 46.35 L 211.60 46.19 ZM 196.31 48.06 L 196.11 47.86 L 195.84 47.74 L 195.48 47.70 L 195.03 47.70 L 194.68 47.70 L 194.42 47.83 L 194.18 48.20 L 194.08 48.62 L 194.11 49.01 L 194.29 49.38 L 194.62 49.72 L 195.13 50.06 L 195.51 50.25 L 195.90 50.24 L 196.42 50.00 L 196.76 49.64 L 
196.78 49.28 L 196.62 48.87 L 196.42 48.34 ZM 191.38 47.71 L 191.09 47.81 L 190.85 48.07 L 190.59 48.49 L 190.27 48.90 L 190.04 49.25 L 190.00 49.60 L 190.23 50.00 L 190.61 50.35 L 191.02 50.56 L 191.47 50.60 L 191.95 50.50 L 192.48 50.26 L 192.75 49.97 L 192.85 49.56 L 192.89 49.00 L 192.83 48.51 L 192.67 48.17 L 192.35 47.94 L 191.81 47.77 ZM 217.16 49.53 L 216.94 49.61 L 216.81 49.75 L 216.75 49.94 L 216.67 50.40 L 216.51 50.94 L 216.42 51.25 L 216.43 51.53 L 216.56 51.79 L 216.79 52.02 L 217.12 52.33 L 217.40 52.56 L 217.70 52.62 L 218.09 52.45 L 218.48 52.11 L 218.65 51.81 L 218.67 51.45 L 218.59 50.94 L 218.45 50.36 L 218.31 49.92 L 218.20 49.76 L 218.02 49.64 L 217.77 49.55 L 217.45 49.50 ZM 212.02 53.86 L 211.67 53.89 L 211.37 54.09 L 211.03 54.39 L 210.76 54.74 L 210.70 55.05 L 210.80 55.37 L 211.03 55.77 L 211.23 56.13 L 211.42 56.38 L 211.69 56.49 L 212.12 56.49 L 212.53 56.33 L 212.76 56.13 L 212.89 55.84 L 212.98 55.41 L 213.02 54.90 L 213.01 54.55 L 212.86 54.27 L 212.48 54.03 ZM 201.46 55.54 L 200.76 55.45 L 200.15 55.41 L 199.62 55.44 L 199.15 55.58 L 198.69 55.90 L 198.26 56.39 L 197.85 57.06 L 197.37 57.95 L 196.98 58.74 L 196.70 59.44 L 196.53 60.05 L 196.51 60.63 L 196.72 61.28 L 197.14 61.98 L 197.79 62.75 L 198.46 63.38 L 199.09 63.78 L 199.69 63.94 L 200.26 63.86 L 200.87 63.64 L 201.55 63.35 L 202.33 63.00 L 203.18 62.59 L 203.98 62.21 L 204.65 61.84 L 205.19 61.48 L 205.59 61.13 L 205.88 60.71 L 206.05 60.14 L 206.11 59.43 L 206.06 58.56 L 205.91 57.79 L 205.68 57.18 L 205.36 56.73 L 204.95 56.45 L 204.44 56.25 L 203.82 56.06 L 203.09 55.87 L 202.25 55.69 ZM 211.90 60.88 L 211.82 61.38 L 212.32 61.74 L 212.90 61.24 L 212.40 60.80 ZM 208.89 62.45 L 208.65 62.28 L 208.41 62.24 L 208.07 62.31 L 207.64 62.51 L 207.32 62.67 L 207.14 62.92 L 207.07 63.39 L 207.12 63.95 L 207.25 64.33 L 207.55 64.58 L 208.07 64.77 L 208.56 64.82 L 208.87 64.69 L 209.10 64.40 L 209.37 63.97 L 209.57 63.59 L 209.56 63.33 L 209.41 63.08 L 209.17 62.75 ZM 210.98 67.32 
L 210.67 67.31 L 210.35 67.45 L 209.95 67.64 L 209.73 67.81 L 209.59 68.04 L 209.52 68.32 L 209.53 68.66 L 209.62 68.91 L 209.78 69.12 L 209.98 69.27 L 210.25 69.38 L 210.54 69.37 L 210.80 69.28 L 211.01 69.11 L 211.18 68.86 L 211.37 68.48 L 211.53 68.19 L 211.55 67.90 L 211.32 67.56 ZM 206.61 69.29 L 206.35 69.34 L 206.17 69.45 L 206.06 69.63 L 205.86 70.06 L 205.56 70.59 L 205.41 70.95 L 205.38 71.29 L 205.48 71.63 L 205.70 71.95 L 206.00 72.19 L 206.33 72.32 L 206.69 72.34 L 207.07 72.25 L 207.52 71.98 L 207.85 71.77 L 207.98 71.65 L 208.06 71.48 L 208.09 71.25 L 208.07 70.95 L 207.98 70.32 L 207.82 69.84 L 207.70 69.66 L 207.51 69.51 L 207.26 69.39 L 206.93 69.30 ZM 202.69 73.15 L 202.43 72.83 L 202.08 72.67 L 201.53 72.61 L 200.99 72.73 L 200.67 72.94 L 200.47 73.29 L 200.31 73.83 L 200.17 74.34 L 200.20 74.74 L 200.41 75.07 L 200.81 75.42 L 201.24 75.64 L 201.67 75.69 L 202.11 75.60 L 202.54 75.34 L 202.86 74.97 L 203.04 74.57 L 203.07 74.14 L 202.96 73.69 ZM 206.45 75.62 L 206.19 75.34 L 205.84 75.18 L 205.42 75.13 L 205.03 75.21 L 204.87 75.45 L 204.81 75.81 L 204.70 76.20 L 204.70 76.66 L 204.98 77.00 L 205.42 77.14 L 205.85 77.00 L 206.19 76.75 L 206.50 76.56 L 206.68 76.35 L 206.64 76.00 ZM 198.67 77.61 L 198.29 77.20 L 198.08 77.09 L 197.82 77.05 L 197.51 77.06 L 197.14 77.14 L 196.26 77.38 L 195.62 77.64 L 195.40 77.83 L 195.23 78.11 L 195.11 78.48 L 195.04 78.94 L 195.04 79.44 L 195.10 79.85 L 195.22 80.17 L 195.40 80.39 L 195.98 80.77 L 196.85 81.17 L 197.75 81.52 L 198.43 81.67 L 198.73 81.65 L 199.05 81.52 L 199.38 81.26 L 199.73 80.89 L 199.99 80.51 L 200.12 80.17 L 200.13 79.86 L 200.01 79.59 L 199.60 79.00 L 199.07 78.22 Z' style='fill-rule: nonzero; fill: #000000; stroke-width: 0.75; stroke: none;' /> <path d='M 246.44 71.17 L 245.80 70.74 L 245.37 70.11 L 245.22 69.38 L 245.22 22.86 L 245.37 22.12 L 245.80 21.49 L 246.44 21.05 L 247.18 20.91 L 283.46 20.91 L 284.20 21.05 L 284.83 21.49 L 285.27 22.12 L 285.41 22.86 L 285.41 33.80 L 285.27 
34.54 L 284.83 35.17 L 284.20 35.61 L 283.46 35.75 L 262.87 35.75 L 262.87 41.08 L 282.02 41.08 L 282.76 41.22 L 283.40 41.66 L 283.65 41.95 L 283.83 42.28 L 283.94 42.63 L 283.97 43.02 L 283.97 53.89 L 283.83 54.63 L 283.40 55.27 L 282.76 55.69 L 282.02 55.83 L 262.87 55.83 L 262.87 69.38 L 262.72 70.11 L 262.29 70.74 L 261.99 70.99 L 261.67 71.17 L 261.31 71.28 L 260.93 71.31 L 247.18 71.31 ZM 265.87 24.25 L 265.34 24.25 L 264.89 24.29 L 264.52 24.38 L 264.20 24.54 L 263.90 24.82 L 263.60 25.22 L 263.30 25.74 L 263.07 26.28 L 262.94 26.77 L 262.92 27.19 L 263.01 27.55 L 263.19 27.88 L 263.46 28.24 L 263.81 28.63 L 264.24 29.05 L 264.72 29.48 L 265.15 29.84 L 265.54 30.13 L 265.88 30.34 L 266.24 30.46 L 266.68 30.45 L 267.19 30.32 L 267.77 30.06 L 268.35 29.70 L 268.78 29.34 L 269.07 28.99 L 269.21 28.63 L 269.25 28.22 L 269.24 27.72 L 269.18 27.13 L 269.07 26.45 L 268.94 25.90 L 268.76 25.44 L 268.54 25.09 L 268.27 24.84 L 267.94 24.66 L 267.53 24.51 L 267.04 24.39 L 266.47 24.30 ZM 249.55 25.87 L 249.55 25.53 L 249.43 25.27 L 249.05 25.09 L 248.61 25.01 L 248.22 25.07 L 247.89 25.26 L 247.61 25.59 L 247.31 26.05 L 247.16 26.42 L 247.20 26.79 L 247.46 27.25 L 247.84 27.60 L 248.25 27.81 L 248.70 27.85 L 249.19 27.75 L 249.55 27.50 L 249.62 27.17 L 249.57 26.78 L 249.55 26.31 ZM 280.29 26.01 L 280.15 26.17 L 279.93 26.67 L 279.65 27.25 L 279.61 27.49 L 279.79 27.75 L 280.06 27.95 L 280.29 27.94 L 280.56 27.82 L 280.94 27.69 L 281.36 27.24 L 281.44 26.75 L 281.34 26.43 L 281.16 26.19 L 280.91 26.03 L 280.58 25.95 ZM 274.88 26.60 L 274.76 26.38 L 274.61 26.24 L 274.43 26.17 L 273.95 26.10 L 273.32 26.03 L 272.81 26.01 L 272.44 26.09 L 272.16 26.34 L 271.87 26.81 L 271.75 27.27 L 271.80 27.61 L 272.02 27.93 L 272.38 28.33 L 272.89 28.72 L 273.27 29.02 L 273.45 29.10 L 273.68 29.11 L 273.94 29.05 L 274.24 28.91 L 274.77 28.48 L 275.04 28.08 L 275.09 27.59 L 274.97 26.89 ZM 253.34 26.99 L 253.04 26.81 L 252.76 26.80 L 252.43 27.03 L 252.07 27.38 L 251.85 27.69 L 251.82 
28.04 L 252.01 28.55 L 252.24 28.92 L 252.54 29.05 L 252.91 29.03 L 253.36 28.97 L 253.79 28.90 L 254.08 28.80 L 254.28 28.59 L 254.44 28.19 L 254.45 27.83 L 254.30 27.61 L 254.05 27.45 L 253.72 27.25 ZM 279.72 29.99 L 279.21 30.20 L 278.93 30.92 L 279.79 31.06 L 280.08 30.42 ZM 259.80 30.09 L 259.35 30.20 L 258.99 30.51 L 258.63 31.00 L 258.28 31.64 L 258.12 32.11 L 258.19 32.55 L 258.55 33.08 L 258.99 33.51 L 259.41 33.63 L 259.92 33.54 L 260.57 33.38 L 261.13 33.09 L 261.54 32.83 L 261.68 32.68 L 261.76 32.46 L 261.80 32.19 L 261.79 31.86 L 261.67 31.18 L 261.46 30.75 L 261.08 30.45 L 260.43 30.20 ZM 248.04 34.30 L 247.76 34.88 L 248.76 34.59 L 248.47 34.09 ZM 254.80 34.30 L 254.30 34.88 L 254.66 35.45 L 255.24 35.45 L 255.68 34.67 ZM 253.71 40.32 L 253.65 39.86 L 253.56 39.48 L 253.43 39.17 L 253.24 38.92 L 252.95 38.69 L 252.56 38.50 L 252.07 38.34 L 251.53 38.19 L 251.06 38.12 L 250.66 38.12 L 250.33 38.19 L 250.05 38.35 L 249.75 38.61 L 249.44 38.97 L 249.11 39.42 L 248.80 39.87 L 248.58 40.28 L 248.46 40.64 L 248.43 40.97 L 248.49 41.30 L 248.62 41.66 L 248.83 42.07 L 249.11 42.52 L 249.47 42.89 L 249.80 43.16 L 250.11 43.32 L 250.41 43.38 L 251.10 43.29 L 252.01 43.02 L 252.85 42.71 L 253.36 42.34 L 253.52 42.10 L 253.63 41.77 L 253.70 41.36 L 253.72 40.86 ZM 258.26 42.52 L 257.97 43.24 L 258.77 43.45 L 259.41 42.88 L 258.83 42.30 ZM 265.64 43.95 L 265.28 43.85 L 264.91 43.88 L 264.52 44.03 L 263.95 44.35 L 263.52 44.58 L 263.37 44.70 L 263.27 44.89 L 263.22 45.14 L 263.22 45.47 L 263.32 45.79 L 263.45 46.02 L 263.61 46.16 L 263.80 46.22 L 264.31 46.26 L 264.96 46.33 L 265.53 46.38 L 265.93 46.38 L 266.24 46.20 L 266.54 45.75 L 266.71 45.25 L 266.65 44.89 L 266.39 44.56 L 265.97 44.17 ZM 265.26 49.12 L 265.32 48.64 L 265.31 48.44 L 265.21 48.26 L 265.02 48.11 L 264.74 47.99 L 264.09 47.85 L 263.58 47.88 L 263.16 48.12 L 262.72 48.63 L 262.56 48.90 L 262.48 49.14 L 262.49 49.35 L 262.58 49.53 L 262.90 49.92 L 263.30 50.44 L 263.63 50.78 L 263.91 50.97 L 
264.23 51.02 L 264.68 50.94 L 265.06 50.76 L 265.21 50.52 L 265.23 50.17 L 265.24 49.72 ZM 257.53 48.91 L 257.04 48.86 L 256.56 49.05 L 255.96 49.50 L 255.34 50.03 L 254.91 50.47 L 254.79 50.70 L 254.74 50.98 L 254.77 51.33 L 254.88 51.74 L 255.07 52.15 L 255.28 52.47 L 255.52 52.68 L 255.77 52.78 L 256.42 52.84 L 257.26 52.81 L 257.92 52.68 L 258.33 52.42 L 258.61 51.98 L 258.83 51.30 L 258.97 50.59 L 258.94 50.08 L 258.68 49.64 L 258.12 49.20 ZM 249.69 50.50 L 249.05 51.02 L 249.19 51.80 L 249.99 51.80 L 250.05 51.16 ZM 270.13 50.53 L 269.88 50.66 L 269.68 50.90 L 269.43 51.22 L 269.26 51.48 L 269.19 51.75 L 269.22 52.02 L 269.35 52.30 L 269.61 52.70 L 269.79 53.03 L 270.02 53.21 L 270.43 53.17 L 270.88 53.00 L 271.07 52.75 L 271.13 52.38 L 271.15 51.88 L 271.13 51.38 L 271.07 51.05 L 270.90 50.81 L 270.51 50.58 ZM 276.57 51.49 L 276.33 51.30 L 276.06 51.23 L 275.69 51.30 L 275.18 51.50 L 274.79 51.66 L 274.64 51.76 L 274.54 51.93 L 274.48 52.16 L 274.46 52.45 L 274.58 53.06 L 274.79 53.42 L 275.15 53.65 L 275.76 53.81 L 276.35 53.96 L 276.80 53.97 L 277.19 53.77 L 277.55 53.31 L 277.69 53.08 L 277.74 52.88 L 277.70 52.70 L 277.58 52.56 L 277.24 52.24 L 276.83 51.80 ZM 253.68 55.72 L 253.40 55.47 L 253.02 55.36 L 252.51 55.33 L 252.11 55.38 L 251.77 55.54 L 251.50 55.81 L 251.29 56.19 L 251.11 56.70 L 251.02 57.09 L 251.12 57.45 L 251.49 57.84 L 251.95 58.11 L 252.32 58.17 L 252.71 58.06 L 253.22 57.78 L 253.64 57.44 L 253.90 57.14 L 253.98 56.76 L 253.86 56.19 ZM 255.86 62.12 L 255.66 61.24 L 255.49 60.90 L 255.19 60.62 L 254.74 60.39 L 254.16 60.22 L 253.41 60.08 L 252.76 60.01 L 252.20 60.01 L 251.74 60.08 L 251.33 60.26 L 250.92 60.59 L 250.52 61.06 L 250.13 61.67 L 249.79 62.38 L 249.56 63.02 L 249.45 63.57 L 249.44 64.05 L 249.56 64.50 L 249.79 65.00 L 250.15 65.54 L 250.63 66.13 L 251.12 66.59 L 251.59 66.89 L 252.02 67.04 L 252.43 67.03 L 252.86 66.91 L 253.36 66.75 L 253.94 66.53 L 254.58 66.27 L 255.10 66.01 L 255.51 65.74 L 255.79 65.46 L 255.96 65.16 
L 256.05 64.81 L 256.09 64.40 L 256.08 63.90 L 256.04 63.33 Z' style='fill-rule: nonzero; fill: #FF0000; stroke-width: 0.75; stroke: none;' /> <path d='M 307.91 71.97 L 305.66 71.78 L 303.55 71.47 L 301.59 71.03 L 299.77 70.46 L 298.10 69.77 L 296.58 68.96 L 295.20 68.02 L 293.97 66.95 L 292.88 65.76 L 291.94 64.44 L 291.15 63.00 L 290.50 61.43 L 290.00 59.73 L 289.65 57.92 L 289.44 55.97 L 289.33 52.59 L 289.44 49.20 L 289.66 47.28 L 290.04 45.48 L 290.56 43.81 L 291.24 42.25 L 292.07 40.81 L 293.06 39.49 L 294.20 38.30 L 295.49 37.22 L 296.92 36.27 L 298.47 35.44 L 300.14 34.74 L 301.93 34.17 L 303.84 33.73 L 305.88 33.41 L 308.04 33.22 L 310.32 33.16 L 312.60 33.22 L 314.75 33.41 L 316.79 33.73 L 318.71 34.17 L 320.51 34.74 L 322.18 35.44 L 323.74 36.27 L 325.18 37.22 L 326.48 38.30 L 327.62 39.49 L 328.61 40.81 L 329.44 42.25 L 330.11 43.81 L 330.63 45.48 L 330.99 47.28 L 331.19 49.20 L 331.30 52.59 L 331.19 55.97 L 330.98 57.92 L 330.63 59.73 L 330.13 61.43 L 329.48 63.00 L 328.69 64.44 L 327.75 65.76 L 326.67 66.95 L 325.43 68.02 L 324.06 68.96 L 322.53 69.77 L 320.86 70.46 L 319.04 71.03 L 317.08 71.47 L 314.97 71.78 L 312.72 71.97 L 310.32 72.03 ZM 317.48 35.62 L 317.16 35.61 L 316.75 35.72 L 316.22 35.75 L 315.84 35.77 L 315.57 35.83 L 315.39 35.97 L 315.29 36.25 L 315.09 36.77 L 314.96 37.19 L 315.00 37.53 L 315.36 37.83 L 315.83 38.08 L 316.22 38.13 L 316.62 37.96 L 317.08 37.61 L 317.51 37.22 L 317.77 36.91 L 317.86 36.52 L 317.74 35.95 ZM 304.76 36.36 L 304.38 36.25 L 303.91 36.33 L 303.27 36.39 L 302.68 36.47 L 302.21 36.55 L 302.03 36.62 L 301.88 36.76 L 301.76 36.97 L 301.68 37.25 L 301.61 37.94 L 301.71 38.41 L 302.01 38.77 L 302.55 39.13 L 303.24 39.47 L 303.75 39.64 L 303.99 39.63 L 304.25 39.54 L 304.53 39.38 L 304.85 39.13 L 305.33 38.58 L 305.49 38.08 L 305.40 37.53 L 305.13 36.83 ZM 308.74 38.05 L 308.16 38.55 L 308.38 39.27 L 308.96 38.99 L 309.32 38.49 ZM 316.34 39.74 L 316.07 39.56 L 315.71 39.52 L 315.22 39.56 L 314.88 39.68 L 314.74 
39.89 L 314.70 40.18 L 314.64 40.56 L 314.56 41.03 L 314.46 41.41 L 314.48 41.70 L 314.79 41.94 L 315.31 42.15 L 315.72 42.22 L 316.10 42.11 L 316.50 41.80 L 316.82 41.39 L 316.89 41.05 L 316.80 40.66 L 316.58 40.14 ZM 306.90 40.11 L 306.64 39.89 L 306.39 39.81 L 306.07 39.92 L 305.77 40.10 L 305.74 40.36 L 305.93 41.08 L 305.99 41.34 L 306.09 41.53 L 306.24 41.66 L 306.43 41.72 L 307.11 41.88 L 307.36 41.83 L 307.58 41.58 L 307.71 41.20 L 307.64 40.94 L 307.47 40.70 L 307.22 40.42 ZM 324.01 42.80 L 323.41 42.14 L 322.84 41.63 L 322.30 41.25 L 321.72 41.03 L 321.02 40.97 L 320.20 41.08 L 319.25 41.36 L 318.39 41.75 L 317.77 42.20 L 317.38 42.70 L 317.22 43.27 L 317.20 43.92 L 317.20 44.68 L 317.21 45.56 L 317.24 46.55 L 317.31 47.24 L 317.46 47.78 L 317.70 48.19 L 318.02 48.45 L 318.43 48.65 L 318.94 48.84 L 319.55 49.02 L 320.25 49.20 L 320.91 49.40 L 321.49 49.54 L 322.01 49.64 L 322.46 49.69 L 322.87 49.63 L 323.27 49.40 L 323.67 49.02 L 324.07 48.49 L 324.56 47.74 L 324.96 47.07 L 325.26 46.48 L 325.47 45.97 L 325.54 45.47 L 325.43 44.91 L 325.13 44.28 L 324.64 43.59 ZM 297.20 46.48 L 296.86 45.92 L 296.54 45.43 L 296.27 45.03 L 295.96 44.74 L 295.52 44.56 L 294.96 44.52 L 294.27 44.60 L 293.64 44.81 L 293.19 45.10 L 292.92 45.46 L 292.85 45.89 L 292.85 46.97 L 292.75 48.34 L 292.63 49.58 L 292.54 50.55 L 292.58 50.95 L 292.80 51.31 L 293.19 51.64 L 293.75 51.94 L 294.50 52.22 L 295.16 52.42 L 295.73 52.52 L 296.21 52.53 L 296.66 52.42 L 297.14 52.15 L 297.66 51.73 L 298.22 51.16 L 298.61 50.60 L 298.82 50.09 L 298.87 49.63 L 298.75 49.22 L 298.53 48.79 L 298.27 48.30 L 297.95 47.74 L 297.58 47.13 ZM 311.26 60.44 L 312.01 60.22 L 312.56 59.85 L 312.91 59.33 L 313.14 58.65 L 313.32 57.81 L 313.47 56.79 L 313.57 55.61 L 313.59 55.02 L 313.61 54.32 L 313.62 53.51 L 313.63 52.59 L 313.62 51.68 L 313.61 50.87 L 313.59 50.16 L 313.57 49.56 L 313.47 48.45 L 313.32 47.47 L 313.14 46.63 L 312.91 45.94 L 312.56 45.39 L 312.01 44.99 L 311.26 44.75 L 310.32 44.67 L 309.40 
44.75 L 308.67 44.99 L 308.12 45.39 L 307.75 45.94 L 307.51 46.63 L 307.32 47.47 L 307.18 48.45 L 307.08 49.56 L 307.05 50.16 L 307.02 50.87 L 307.01 51.68 L 307.00 52.59 L 307.01 53.51 L 307.02 54.32 L 307.05 55.02 L 307.08 55.61 L 307.18 56.79 L 307.32 57.81 L 307.51 58.65 L 307.75 59.33 L 308.12 59.85 L 308.67 60.22 L 309.40 60.44 L 310.32 60.52 ZM 305.45 45.16 L 305.16 45.03 L 304.80 45.01 L 304.35 44.95 L 303.99 44.96 L 303.70 45.03 L 303.48 45.18 L 303.33 45.39 L 303.25 45.71 L 303.30 45.94 L 303.47 46.14 L 303.77 46.41 L 304.02 46.65 L 304.22 46.83 L 304.47 46.88 L 304.85 46.77 L 305.23 46.49 L 305.49 46.27 L 305.64 45.97 L 305.64 45.53 ZM 300.46 44.95 L 300.16 45.69 L 300.74 45.97 L 301.10 45.75 L 301.25 45.17 ZM 301.49 51.42 L 301.27 51.54 L 301.09 51.74 L 300.96 52.02 L 300.87 52.35 L 300.87 52.65 L 300.97 52.89 L 301.18 53.09 L 301.46 53.26 L 301.75 53.31 L 302.04 53.26 L 302.33 53.09 L 302.60 52.87 L 302.76 52.61 L 302.81 52.33 L 302.75 52.02 L 302.64 51.67 L 302.43 51.49 L 302.13 51.40 L 301.75 51.38 ZM 328.36 54.89 L 328.00 54.45 L 327.59 54.22 L 327.13 54.19 L 326.01 54.33 L 324.58 54.47 L 323.98 54.55 L 323.47 54.67 L 323.06 54.80 L 322.74 54.97 L 322.48 55.20 L 322.26 55.54 L 322.07 55.99 L 321.91 56.55 L 321.80 57.16 L 321.75 57.68 L 321.76 58.13 L 321.83 58.50 L 322.00 58.83 L 322.28 59.16 L 322.69 59.51 L 323.21 59.86 L 323.89 60.27 L 324.49 60.61 L 325.02 60.90 L 325.47 61.13 L 325.92 61.23 L 326.42 61.15 L 326.98 60.89 L 327.60 60.44 L 328.21 59.81 L 328.68 59.23 L 329.00 58.70 L 329.18 58.20 L 329.23 57.69 L 329.16 57.07 L 328.98 56.36 L 328.68 55.55 ZM 300.74 55.11 L 300.52 55.77 L 301.10 56.27 L 301.68 55.83 L 301.39 55.25 ZM 294.09 56.95 L 293.72 56.92 L 293.35 57.08 L 292.89 57.42 L 292.60 57.73 L 292.43 58.07 L 292.38 58.45 L 292.46 58.86 L 292.68 59.42 L 292.89 59.80 L 293.22 60.01 L 293.75 60.08 L 294.01 60.05 L 294.20 59.98 L 294.31 59.86 L 294.36 59.69 L 294.43 59.24 L 294.55 58.72 L 294.74 58.20 L 294.88 57.81 L 294.85 57.50 L 
294.55 57.20 ZM 296.79 57.92 L 296.13 58.50 L 296.71 59.08 L 297.29 59.00 L 297.50 58.36 ZM 303.63 58.36 L 302.83 58.92 L 303.33 59.58 L 304.13 59.80 L 304.41 58.86 ZM 310.44 64.59 L 310.37 64.20 L 310.27 63.89 L 310.13 63.66 L 309.66 63.28 L 308.88 62.95 L 308.40 62.80 L 307.99 62.71 L 307.65 62.68 L 307.36 62.70 L 307.10 62.81 L 306.83 63.01 L 306.53 63.30 L 306.22 63.69 L 306.02 63.99 L 305.93 64.27 L 305.96 64.52 L 306.10 64.74 L 306.55 65.21 L 307.08 65.84 L 307.42 66.26 L 307.72 66.49 L 308.09 66.54 L 308.60 66.42 L 309.37 66.17 L 309.96 66.02 L 310.18 65.92 L 310.33 65.72 L 310.43 65.43 L 310.46 65.05 ZM 296.17 63.35 L 295.85 63.13 L 295.45 63.02 L 294.99 63.03 L 294.58 63.21 L 294.39 63.47 L 294.33 63.83 L 294.27 64.33 L 294.21 64.78 L 294.19 65.13 L 294.32 65.39 L 294.69 65.63 L 295.12 65.74 L 295.41 65.63 L 295.66 65.37 L 295.99 65.05 L 296.30 64.67 L 296.52 64.41 L 296.59 64.12 L 296.43 63.69 ZM 313.40 63.64 L 312.99 63.44 L 312.61 63.41 L 312.19 63.69 L 311.93 64.09 L 312.00 64.44 L 312.25 64.83 L 312.47 65.34 L 312.68 65.72 L 312.86 65.99 L 313.11 66.13 L 313.49 66.13 L 314.12 66.06 L 314.60 65.99 L 314.78 65.91 L 314.94 65.75 L 315.06 65.51 L 315.14 65.19 L 315.16 64.90 L 315.11 64.67 L 315.01 64.49 L 314.85 64.38 L 314.44 64.16 L 313.93 63.89 ZM 321.00 64.00 L 320.57 63.94 L 320.09 64.09 L 319.46 64.33 L 318.89 64.67 L 318.46 64.99 L 318.30 65.15 L 318.20 65.38 L 318.16 65.65 L 318.16 65.99 L 318.36 66.94 L 318.52 67.64 L 318.66 67.91 L 318.91 68.13 L 319.27 68.31 L 319.75 68.44 L 320.20 68.46 L 320.56 68.39 L 320.81 68.23 L 320.97 67.97 L 321.25 67.28 L 321.63 66.42 L 321.91 65.83 L 322.02 65.34 L 321.90 64.89 L 321.47 64.41 ZM 300.57 64.49 L 300.27 64.52 L 299.99 64.66 L 299.66 64.91 L 299.36 65.23 L 299.16 65.49 L 299.09 65.77 L 299.16 66.20 L 299.39 66.55 L 299.63 66.74 L 299.94 66.80 L 300.38 66.78 L 300.74 66.68 L 300.96 66.53 L 301.10 66.28 L 301.25 65.91 L 301.31 65.47 L 301.35 65.16 L 301.27 64.90 L 300.96 64.61 ZM 314.43 67.72 L 313.99 
68.66 L 314.79 69.02 L 315.43 68.80 L 315.43 67.86 Z' style='fill-rule: nonzero; fill: #FF0000; stroke-width: 0.75; stroke: none;' /> <path d='M 337.88 71.17 L 337.24 70.74 L 336.81 70.11 L 336.67 69.38 L 336.67 35.81 L 336.81 35.08 L 337.24 34.45 L 337.88 34.02 L 338.62 33.88 L 351.21 33.88 L 351.95 34.02 L 352.59 34.45 L 353.02 35.08 L 353.17 35.81 L 353.17 38.19 L 354.15 37.19 L 355.30 36.26 L 356.63 35.41 L 358.12 34.64 L 359.75 33.99 L 361.45 33.53 L 363.25 33.25 L 365.12 33.16 L 366.92 33.27 L 368.66 33.59 L 370.34 34.13 L 371.95 34.89 L 373.46 35.87 L 374.82 37.08 L 376.03 38.53 L 377.09 40.22 L 377.55 41.15 L 377.96 42.15 L 378.30 43.21 L 378.58 44.34 L 378.79 45.53 L 378.95 46.79 L 379.04 48.11 L 379.07 49.50 L 379.07 69.38 L 378.93 70.11 L 378.49 70.74 L 378.20 70.99 L 377.87 71.17 L 377.52 71.28 L 377.13 71.31 L 363.38 71.31 L 362.64 71.17 L 362.01 70.74 L 361.59 70.11 L 361.45 69.38 L 361.45 50.00 L 361.39 49.07 L 361.22 48.27 L 360.94 47.59 L 360.54 47.04 L 360.04 46.60 L 359.42 46.30 L 358.68 46.11 L 357.84 46.05 L 357.00 46.11 L 356.27 46.30 L 355.65 46.60 L 355.14 47.04 L 354.75 47.59 L 354.47 48.27 L 354.30 49.07 L 354.24 50.00 L 354.24 69.38 L 354.10 70.11 L 353.67 70.74 L 353.03 71.17 L 352.29 71.31 L 338.62 71.31 ZM 341.94 35.14 L 341.53 35.20 L 341.22 35.43 L 340.92 35.89 L 340.73 36.41 L 340.74 36.80 L 340.95 37.14 L 341.35 37.55 L 341.76 37.96 L 342.10 38.20 L 342.49 38.24 L 343.01 38.05 L 343.53 37.70 L 343.79 37.38 L 343.89 36.97 L 343.87 36.39 L 343.69 35.81 L 343.43 35.50 L 343.06 35.33 L 342.51 35.17 ZM 345.74 36.69 L 345.17 37.05 L 345.09 37.97 L 346.10 37.97 L 346.46 37.05 ZM 348.56 40.72 L 347.90 41.08 L 348.20 41.72 L 348.98 41.94 L 349.13 41.14 ZM 351.65 41.14 L 351.15 41.94 L 351.51 42.58 L 352.37 42.58 L 352.45 41.66 ZM 343.02 41.43 L 342.82 41.33 L 342.58 41.31 L 342.21 41.36 L 341.93 41.45 L 341.71 41.58 L 341.57 41.74 L 341.49 41.94 L 341.50 42.21 L 341.58 42.44 L 341.75 42.64 L 341.99 42.80 L 342.43 42.85 L 342.87 42.58 L 
343.11 42.33 L 343.26 42.16 L 343.31 41.96 L 343.23 41.66 ZM 360.98 41.34 L 360.60 41.28 L 360.29 41.39 L 359.99 41.72 L 359.70 42.24 L 359.56 42.63 L 359.62 43.01 L 359.93 43.52 L 360.28 43.82 L 360.60 43.84 L 360.99 43.71 L 361.51 43.52 L 361.86 43.34 L 362.11 43.11 L 362.26 42.80 L 362.31 42.44 L 362.25 42.11 L 362.09 41.84 L 361.82 41.64 L 361.45 41.50 ZM 355.17 41.70 L 354.74 41.63 L 354.36 41.74 L 353.95 42.16 L 353.67 42.61 L 353.67 42.99 L 353.86 43.36 L 354.17 43.88 L 354.46 44.29 L 354.74 44.50 L 355.10 44.54 L 355.60 44.45 L 356.10 44.29 L 356.43 44.09 L 356.62 43.77 L 356.68 43.24 L 356.63 42.68 L 356.49 42.33 L 356.22 42.09 L 355.74 41.86 ZM 370.36 43.09 L 369.85 42.74 L 369.62 42.63 L 369.34 42.62 L 369.01 42.70 L 368.63 42.88 L 368.26 43.13 L 368.01 43.38 L 367.86 43.64 L 367.84 43.89 L 367.95 44.50 L 368.13 45.33 L 368.28 46.17 L 368.42 46.83 L 368.54 47.09 L 368.77 47.30 L 369.09 47.45 L 369.51 47.55 L 369.97 47.55 L 370.34 47.49 L 370.63 47.36 L 370.84 47.16 L 371.19 46.57 L 371.59 45.75 L 371.71 45.38 L 371.77 45.06 L 371.78 44.78 L 371.73 44.56 L 371.48 44.13 L 371.01 43.59 ZM 350.02 43.72 L 349.74 43.28 L 349.59 43.12 L 349.38 43.03 L 349.11 43.02 L 348.77 43.09 L 348.45 43.18 L 348.20 43.31 L 348.04 43.47 L 347.96 43.67 L 347.90 44.19 L 347.84 44.89 L 347.87 45.36 L 348.03 45.75 L 348.34 46.08 L 348.77 46.33 L 349.20 46.38 L 349.48 46.27 L 349.73 45.99 L 350.06 45.61 L 350.31 45.28 L 350.49 45.00 L 350.53 44.68 L 350.35 44.24 ZM 342.17 45.56 L 341.82 45.63 L 341.52 45.81 L 341.28 46.11 L 340.97 46.63 L 340.77 47.02 L 340.76 47.39 L 340.99 47.84 L 341.40 48.18 L 341.77 48.20 L 342.20 48.04 L 342.71 47.84 L 343.06 47.63 L 343.21 47.41 L 343.27 47.12 L 343.29 46.69 L 343.26 46.26 L 343.15 45.97 L 342.94 45.77 L 342.57 45.61 ZM 376.27 48.63 L 375.62 48.92 L 375.98 49.56 L 376.70 49.56 L 376.70 48.84 ZM 338.90 49.20 L 338.40 49.78 L 338.68 50.28 L 339.34 50.44 L 339.48 49.72 ZM 350.89 50.58 L 350.71 50.31 L 350.46 50.11 L 350.13 50.00 L 349.76 
49.98 L 349.45 50.08 L 349.19 50.27 L 348.98 50.58 L 348.85 50.93 L 348.84 51.24 L 348.93 51.53 L 349.13 51.80 L 349.45 52.06 L 349.70 52.13 L 349.97 52.06 L 350.35 51.94 L 350.67 51.74 L 350.88 51.56 L 351.01 51.31 L 350.99 50.94 ZM 346.69 52.57 L 346.36 52.20 L 346.04 51.92 L 345.74 51.74 L 345.41 51.63 L 345.01 51.60 L 344.52 51.66 L 343.95 51.80 L 343.44 51.97 L 343.06 52.18 L 342.82 52.45 L 342.71 52.75 L 342.65 53.52 L 342.57 54.53 L 342.54 55.56 L 342.57 56.34 L 342.67 56.65 L 342.90 56.93 L 343.25 57.19 L 343.73 57.42 L 344.19 57.54 L 344.58 57.55 L 344.89 57.43 L 345.12 57.20 L 345.59 56.56 L 346.18 55.77 L 346.82 54.99 L 347.29 54.42 L 347.42 54.16 L 347.42 53.84 L 347.30 53.47 L 347.04 53.03 ZM 370.39 52.64 L 369.82 52.27 L 369.31 52.00 L 368.85 51.81 L 368.40 51.75 L 367.88 51.84 L 367.32 52.10 L 366.70 52.52 L 365.96 53.08 L 365.34 53.60 L 364.83 54.09 L 364.43 54.55 L 364.17 55.04 L 364.06 55.66 L 364.11 56.40 L 364.32 57.27 L 364.66 58.15 L 365.05 58.84 L 365.48 59.34 L 365.96 59.66 L 366.54 59.84 L 367.25 59.97 L 368.09 60.05 L 369.07 60.08 L 369.86 60.03 L 370.50 59.88 L 370.98 59.64 L 371.31 59.30 L 371.56 58.85 L 371.83 58.29 L 372.10 57.62 L 372.38 56.84 L 372.56 56.17 L 372.64 55.59 L 372.63 55.09 L 372.53 54.69 L 372.32 54.31 L 372.01 53.92 L 371.57 53.52 L 371.01 53.09 ZM 350.42 57.83 L 350.13 58.14 L 349.73 58.72 L 349.67 58.95 L 349.85 59.22 L 350.15 59.45 L 350.47 59.54 L 350.83 59.52 L 351.21 59.36 L 351.48 59.19 L 351.57 58.97 L 351.56 58.67 L 351.51 58.28 L 351.31 57.98 L 350.85 57.78 ZM 376.77 60.02 L 376.42 60.74 L 376.63 61.38 L 377.28 61.02 L 377.71 60.22 ZM 350.13 61.09 L 350.13 61.95 L 350.63 62.39 L 351.21 61.95 L 350.99 61.30 ZM 342.29 61.59 L 341.93 62.24 L 342.35 62.67 L 343.15 62.67 L 342.93 61.95 ZM 369.51 62.81 L 368.93 63.39 L 369.29 63.89 L 369.71 63.83 L 369.93 63.39 ZM 343.63 64.54 L 343.26 64.33 L 342.87 64.29 L 342.35 64.47 L 341.93 64.83 L 341.79 65.19 L 341.83 65.62 L 341.93 66.20 L 342.13 66.77 L 342.29 67.17 L 
342.56 67.43 L 343.07 67.56 L 343.70 67.57 L 344.12 67.42 L 344.44 67.10 L 344.73 66.56 L 344.90 66.05 L 344.81 65.67 L 344.52 65.32 L 344.09 64.91 ZM 351.07 65.44 L 350.69 65.32 L 350.31 65.34 L 349.92 65.49 L 349.33 65.86 L 348.87 66.13 L 348.71 66.27 L 348.63 66.47 L 348.63 66.74 L 348.70 67.06 L 348.87 67.81 L 349.12 68.33 L 349.31 68.51 L 349.58 68.64 L 349.92 68.74 L 350.35 68.80 L 350.71 68.80 L 351.00 68.74 L 351.21 68.62 L 351.35 68.44 L 351.59 67.91 L 351.87 67.20 L 352.04 66.70 L 352.01 66.34 L 351.79 66.04 L 351.43 65.70 ZM 367.25 66.25 L 366.99 66.26 L 366.75 66.36 L 366.56 66.56 L 366.43 66.79 L 366.39 67.03 L 366.46 67.29 L 366.62 67.56 L 366.84 67.84 L 367.07 68.00 L 367.31 68.06 L 367.56 68.00 L 367.77 67.89 L 367.93 67.72 L 368.03 67.47 L 368.06 67.14 L 368.03 66.85 L 367.93 66.61 L 367.77 66.45 L 367.56 66.34 ZM 376.51 66.56 L 376.13 66.56 L 375.81 66.76 L 375.56 66.92 L 375.41 67.14 L 375.40 67.50 L 375.52 67.79 L 375.73 67.94 L 376.34 68.08 L 376.67 68.08 L 376.95 68.02 L 377.18 67.90 L 377.35 67.72 L 377.42 67.42 L 377.35 67.22 L 377.17 67.05 L 376.92 66.84 ZM 339.62 67.36 L 338.98 67.92 L 339.40 68.86 L 340.42 68.44 L 340.56 67.42 Z' style='fill-rule: nonzero; fill: #FF0000; stroke-width: 0.75; stroke: none;' /> <path d='M 405.57 71.23 L 403.19 70.96 L 400.94 70.52 L 398.84 69.91 L 397.85 69.53 L 396.92 69.08 L 396.04 68.58 L 395.21 68.01 L 394.44 67.38 L 393.72 66.69 L 393.05 65.94 L 392.44 65.13 L 391.89 64.24 L 391.41 63.27 L 391.01 62.23 L 390.68 61.11 L 390.42 59.91 L 390.24 58.63 L 390.13 57.27 L 390.09 55.83 L 390.09 46.47 L 384.41 46.47 L 383.66 46.33 L 383.03 45.91 L 382.61 45.27 L 382.47 44.53 L 382.47 35.81 L 382.61 35.08 L 383.03 34.45 L 383.66 34.02 L 384.41 33.88 L 390.09 33.88 L 390.09 22.14 L 390.24 21.40 L 390.67 20.77 L 391.30 20.33 L 392.05 20.19 L 404.64 20.19 L 405.02 20.22 L 405.38 20.33 L 405.70 20.51 L 406.00 20.77 L 406.43 21.40 L 406.58 22.14 L 406.58 33.88 L 415.66 33.88 L 416.04 33.91 L 416.39 34.02 L 416.72 34.20 
L 417.02 34.45 L 417.45 35.08 L 417.59 35.81 L 417.59 44.53 L 417.45 45.27 L 417.02 45.91 L 416.39 46.33 L 415.66 46.47 L 406.58 46.47 L 406.58 54.39 L 406.62 55.18 L 406.76 55.88 L 407.00 56.50 L 407.33 57.03 L 407.76 57.46 L 408.30 57.76 L 408.94 57.94 L 409.69 58.00 L 416.23 58.00 L 416.62 58.04 L 416.97 58.15 L 417.30 58.33 L 417.59 58.58 L 417.85 58.88 L 418.03 59.20 L 418.14 59.55 L 418.17 59.94 L 418.17 69.38 L 418.03 70.11 L 417.59 70.74 L 417.30 70.99 L 416.97 71.17 L 416.62 71.28 L 416.23 71.31 L 408.09 71.31 ZM 401.52 22.10 L 401.25 21.99 L 401.00 21.96 L 400.78 22.00 L 400.36 22.21 L 399.89 22.56 L 399.41 23.04 L 399.12 23.44 L 399.07 23.64 L 399.07 23.89 L 399.12 24.18 L 399.23 24.52 L 399.55 25.09 L 399.89 25.38 L 400.36 25.47 L 401.05 25.45 L 401.65 25.33 L 402.01 25.13 L 402.24 24.76 L 402.41 24.16 L 402.55 23.52 L 402.55 23.05 L 402.34 22.65 L 401.83 22.28 ZM 396.44 24.30 L 395.56 24.44 L 395.86 25.24 L 396.58 25.74 L 397.16 24.94 ZM 401.47 26.67 L 401.05 27.03 L 400.83 27.69 L 401.55 27.75 L 401.69 27.25 ZM 395.68 28.29 L 395.51 28.09 L 395.28 27.96 L 394.98 27.89 L 394.66 27.85 L 394.41 27.86 L 394.20 27.97 L 393.98 28.25 L 393.89 28.57 L 393.90 28.87 L 394.00 29.15 L 394.20 29.41 L 394.47 29.60 L 394.70 29.59 L 395.28 29.34 L 395.69 28.96 L 395.78 28.55 ZM 397.41 32.60 L 396.67 32.24 L 396.02 31.93 L 395.45 31.67 L 394.92 31.56 L 394.34 31.68 L 393.71 32.05 L 393.05 32.66 L 392.48 33.32 L 392.14 33.94 L 392.03 34.52 L 392.14 35.06 L 392.40 35.63 L 392.73 36.28 L 393.11 37.01 L 393.55 37.83 L 393.97 38.60 L 394.36 39.24 L 394.73 39.77 L 395.06 40.17 L 395.46 40.46 L 396.00 40.61 L 396.69 40.65 L 397.51 40.56 L 398.33 40.39 L 398.99 40.14 L 399.48 39.82 L 399.81 39.42 L 400.04 38.93 L 400.24 38.32 L 400.40 37.59 L 400.53 36.75 L 400.63 35.98 L 400.62 35.34 L 400.50 34.82 L 400.28 34.42 L 399.95 34.08 L 399.50 33.73 L 398.93 33.37 L 398.23 33.02 ZM 388.68 37.29 L 388.47 37.08 L 388.16 36.98 L 387.72 36.89 L 387.28 36.91 L 386.97 36.97 L 386.73 
37.15 L 386.50 37.55 L 386.34 38.01 L 386.28 38.38 L 386.39 38.71 L 386.72 39.06 L 387.17 39.40 L 387.53 39.56 L 387.92 39.52 L 388.44 39.27 L 388.82 38.92 L 388.97 38.59 L 388.95 38.20 L 388.80 37.69 ZM 406.50 41.22 L 406.31 40.93 L 406.03 40.71 L 405.64 40.56 L 405.06 40.42 L 404.59 40.25 L 404.40 40.21 L 404.21 40.27 L 404.02 40.44 L 403.84 40.72 L 403.54 41.31 L 403.48 41.80 L 403.67 42.26 L 404.06 42.80 L 404.53 43.23 L 404.95 43.38 L 405.44 43.31 L 406.08 43.09 L 406.46 42.84 L 406.61 42.52 L 406.62 42.10 L 406.58 41.58 ZM 390.59 40.33 L 390.39 40.36 L 390.17 40.56 L 390.01 40.88 L 389.95 41.18 L 390.01 41.46 L 390.17 41.72 L 390.44 41.93 L 390.67 41.99 L 390.93 41.92 L 391.25 41.80 L 391.44 41.70 L 391.58 41.53 L 391.66 41.30 L 391.69 41.00 L 391.65 40.80 L 391.54 40.64 L 391.36 40.51 L 391.11 40.42 ZM 410.33 41.44 L 410.55 42.16 L 411.26 42.30 L 411.41 41.66 L 411.05 41.28 ZM 387.44 43.02 L 387.22 43.81 L 387.58 44.24 L 388.30 44.09 L 388.16 43.38 ZM 400.39 43.67 L 399.89 44.24 L 400.31 44.75 L 400.97 44.67 L 401.25 43.95 ZM 396.96 45.06 L 396.72 44.81 L 396.38 44.69 L 395.92 44.60 L 395.27 44.51 L 394.77 44.39 L 394.56 44.36 L 394.36 44.44 L 394.17 44.61 L 393.98 44.89 L 393.84 45.19 L 393.78 45.46 L 393.79 45.69 L 393.87 45.89 L 394.19 46.31 L 394.62 46.83 L 395.03 47.13 L 395.39 47.16 L 395.76 47.01 L 396.22 46.77 L 396.65 46.55 L 396.94 46.33 L 397.08 46.02 L 397.08 45.53 ZM 394.61 49.09 L 394.41 48.78 L 394.11 48.62 L 393.62 48.56 L 393.16 48.62 L 392.79 48.80 L 392.51 49.09 L 392.33 49.50 L 392.20 50.01 L 392.25 50.38 L 392.47 50.68 L 392.83 51.02 L 393.23 51.20 L 393.62 51.27 L 394.02 51.20 L 394.42 51.02 L 394.68 50.71 L 394.83 50.36 L 394.86 49.98 L 394.78 49.56 ZM 401.22 49.74 L 400.89 49.75 L 400.57 49.90 L 400.17 50.14 L 399.83 50.43 L 399.64 50.69 L 399.59 50.99 L 399.67 51.38 L 399.87 51.95 L 400.03 52.38 L 400.13 52.53 L 400.30 52.65 L 400.53 52.71 L 400.83 52.74 L 401.15 52.72 L 401.41 52.66 L 401.60 52.56 L 401.72 52.42 L 401.91 52.01 L 
402.12 51.44 L 402.24 50.93 L 402.19 50.58 L 401.97 50.27 L 401.61 49.92 ZM 401.93 55.10 L 401.25 54.96 L 400.68 55.00 L 400.22 55.22 L 399.78 55.58 L 399.29 56.04 L 398.75 56.60 L 398.16 57.27 L 397.63 57.89 L 397.21 58.46 L 396.88 58.97 L 396.64 59.44 L 396.54 59.90 L 396.59 60.44 L 396.80 61.05 L 397.16 61.74 L 397.65 62.56 L 398.12 63.26 L 398.58 63.81 L 399.02 64.22 L 399.52 64.49 L 400.16 64.63 L 400.93 64.62 L 401.83 64.47 L 402.74 64.25 L 403.45 63.95 L 403.96 63.57 L 404.28 63.11 L 404.48 62.54 L 404.66 61.84 L 404.81 61.02 L 404.92 60.08 L 405.00 59.17 L 405.00 58.38 L 404.94 57.72 L 404.81 57.17 L 404.56 56.69 L 404.13 56.24 L 403.51 55.81 L 402.70 55.41 ZM 392.76 59.08 L 392.41 59.72 L 393.41 59.58 L 393.41 59.00 ZM 413.72 60.66 L 413.20 61.09 L 413.28 61.81 L 414.08 61.74 L 414.22 61.09 ZM 411.32 62.23 L 411.00 61.84 L 410.81 61.74 L 410.57 61.68 L 410.26 61.65 L 409.89 61.67 L 409.53 61.72 L 409.25 61.81 L 409.03 61.93 L 408.89 62.09 L 408.69 62.54 L 408.53 63.17 L 408.33 63.97 L 408.17 64.59 L 408.16 64.85 L 408.26 65.10 L 408.48 65.33 L 408.81 65.55 L 409.19 65.76 L 409.53 65.88 L 409.84 65.91 L 410.11 65.84 L 410.66 65.52 L 411.33 64.97 L 411.76 64.49 L 411.91 64.05 L 411.84 63.54 L 411.62 62.89 ZM 405.60 66.22 L 405.42 66.06 L 405.14 65.97 L 404.72 65.84 L 404.45 65.82 L 404.21 65.88 L 404.01 66.03 L 403.84 66.27 L 403.68 66.58 L 403.63 66.87 L 403.68 67.13 L 403.84 67.36 L 404.06 67.54 L 404.28 67.53 L 404.86 67.28 L 405.50 67.08 L 405.70 66.91 L 405.72 66.56 ZM 411.55 68.66 L 411.76 69.44 L 412.42 69.38 L 412.92 68.86 L 412.27 68.44 Z' style='fill-rule: nonzero; fill: #FF0000; stroke-width: 0.75; stroke: none;' /> <image width='71.31' height='71.31' x='419.98' y='7.13' preserveAspectRatio='none' xlink:href=''/> <line x1='0.00' y1='78.51' x2='227.88' y2='78.51' style='stroke-width: 4.05; stroke-linecap: square;' /> </g> </g> </svg> <p>If you inspect the SVG above you&rsquo;ll see that rather than being made up of text elements it is a collection 
of path and image elements.</p> <p>Again, it is unlikely that many people will use marquee like this. It is much more likely that they will encounter it through ggplot2 in the form of <code>geom_marquee()</code> and <code>element_marquee()</code>. The takeaway, however, is the same: it is now safe to use marquee even when you don&rsquo;t know which graphics device will be used to render the text.</p> <h2 id="whats-next">What&rsquo;s Next? <a href="#whats-next"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Circling back to the starting quote: I&rsquo;m 100% certain I&rsquo;m not done yet. I believe the next big push will be proper support for vertical text in textshaping (it currently only deals with horizontal text). I also have some plans to get marquee to automatically translate the numbers in ordered lists into their proper representation in the script that is being used, so that e.g. 
&lsquo;3.&rsquo; will be shown as &lsquo;.٣&rsquo; when used with Arabic text.</p> orbital 0.3.0 https://www.tidyverse.org/blog/2025/01/orbital-0-3-0/ Mon, 13 Jan 2025 00:00:00 +0000 <p>We&rsquo;re thrilled to announce the release of <a href="https://orbital.tidymodels.org/" target="_blank" rel="noopener">orbital</a> 0.3.0. 
orbital lets you predict in databases using tidymodels workflows.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"orbital"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will cover the highlights, which are classification support and the new augment method.</p> <p>You can see a full list of changes in the <a href="https://orbital.tidymodels.org/news/index.html#orbital-030" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="classification-support">Classification support <a href="#classification-support"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The biggest improvement in this version is that <a href="https://orbital.tidymodels.org/reference/orbital.html" target="_blank" rel="noopener"><code>orbital()</code></a> now works for supported classification models. 
See the <a href="https://orbital.tidymodels.org/articles/supported-models.html#supported-models" target="_blank" rel="noopener">vignette</a> for a list of all supported models.</p> <p>Let&rsquo;s start by fitting a classification model on the <code>penguins</code> data set, using {xgboost} as the engine.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec_spec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>species</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>penguins</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_unknown</span><span class='o'>(</span><span class='nf'>all_nominal_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>all_nominal_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_impute_mean</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_zv</span><span class='o'>(</span><span class='nf'>all_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>lr_spec</span> <span class='o'>&lt;-</span> <span class='nf'>boost_tree</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>set_mode</span><span class='o'>(</span><span class='s'>"classification"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>set_engine</span><span class='o'>(</span><span 
class='s'>"xgboost"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>wf_spec</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>rec_spec</span>, <span class='nv'>lr_spec</span><span class='o'>)</span></span> <span><span class='nv'>wf_fit</span> <span class='o'>&lt;-</span> <span class='nf'>fit</span><span class='o'>(</span><span class='nv'>wf_spec</span>, data <span class='o'>=</span> <span class='nv'>penguins</span><span class='o'>)</span></span></code></pre> </div> <p>With this fitted workflow object, we can call <a href="https://orbital.tidymodels.org/reference/orbital.html" target="_blank" rel="noopener"><code>orbital()</code></a> on it to create an orbital object.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>orbital_obj</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://orbital.tidymodels.org/reference/orbital.html'>orbital</a></span><span class='o'>(</span><span class='nv'>wf_fit</span><span class='o'>)</span></span> <span><span class='nv'>orbital_obj</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>orbital Object</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────</span></span></span> <span><span class='c'>#&gt; • island = dplyr::if_else(is.na(island), "unknown", island)</span></span> <span><span class='c'>#&gt; • sex = dplyr::if_else(is.na(sex), "unknown", sex)</span></span> <span><span class='c'>#&gt; • island_Dream = as.numeric(island == "Dream")</span></span> <span><span class='c'>#&gt; • island_Torgersen = as.numeric(island == "Torgersen")</span></span> <span><span class='c'>#&gt; • sex_male = as.numeric(sex == "male")</span></span> <span><span class='c'>#&gt; • sex_unknown = as.numeric(sex == "unknown")</span></span> <span><span 
class='c'>#&gt; • bill_length_mm = dplyr::if_else(is.na(bill_length_mm), 43.92193, bill_l ...</span></span> <span><span class='c'>#&gt; • bill_depth_mm = dplyr::if_else(is.na(bill_depth_mm), 17.15117, bill_dep ...</span></span> <span><span class='c'>#&gt; • flipper_length_mm = dplyr::if_else(is.na(flipper_length_mm), 201, flipp ...</span></span> <span><span class='c'>#&gt; • body_mass_g = dplyr::if_else(is.na(body_mass_g), 4202, body_mass_g)</span></span> <span><span class='c'>#&gt; • island_Dream = dplyr::if_else(is.na(island_Dream), 0.3604651, island_Dr ...</span></span> <span><span class='c'>#&gt; • island_Torgersen = dplyr::if_else(is.na(island_Torgersen), 0.1511628, i ...</span></span> <span><span class='c'>#&gt; • sex_male = dplyr::if_else(is.na(sex_male), 0.4883721, sex_male)</span></span> <span><span class='c'>#&gt; • sex_unknown = dplyr::if_else(is.na(sex_unknown), 0.03197674, sex_unknow ...</span></span> <span><span class='c'>#&gt; • Adelie = 0 + dplyr::case_when((bill_depth_mm &lt; 15.1 | is.na(bill_depth_ ...</span></span> <span><span class='c'>#&gt; • Chinstrap = 0 + dplyr::case_when((island_Dream &lt; 0.5 | is.na(island_Dre ...</span></span> <span><span class='c'>#&gt; • Gentoo = 0 + dplyr::case_when((bill_depth_mm &lt; 15.95 | is.na(bill_depth ...</span></span> <span><span class='c'>#&gt; • .pred_class = dplyr::case_when(Adelie &gt; Chinstrap &amp; Adelie &gt; Gentoo ~ " ...</span></span> <span><span class='c'>#&gt; ────────────────────────────────────────────────────────────────────────────────</span></span> <span><span class='c'>#&gt; 18 equations in total.</span></span> <span></span></code></pre> </div> <p>This object contains all the information that is needed to produce predictions. 
We can produce them with <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>orbital_obj</span>, <span class='nv'>penguins</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 1</span></span></span> <span><span class='c'>#&gt; .pred_class</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span> <span></span></code></pre> </div> <p>The main thing to note here is that the orbital package produces character vectors instead of factors. 
This is done as a unifying approach since many databases don&rsquo;t have factor types.</p> <p>Speaking of databases, you can <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> on an orbital object using tables from databases. Below we create an ephemeral in-memory RSQLite database.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dbi.r-dbi.org'>DBI</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rsqlite.r-dbi.org'>RSQLite</a></span><span class='o'>)</span></span> <span></span> <span><span class='nv'>con_sqlite</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbi.r-dbi.org/reference/dbConnect.html'>dbConnect</a></span><span class='o'>(</span><span class='nf'><a href='https://rsqlite.r-dbi.org/reference/SQLite.html'>SQLite</a></span><span class='o'>(</span><span class='o'>)</span>, dbname <span class='o'>=</span> <span class='s'>":memory:"</span><span class='o'>)</span></span> <span><span class='nv'>penguins_sqlite</span> <span class='o'>&lt;-</span> <span class='nf'>copy_to</span><span class='o'>(</span><span class='nv'>con_sqlite</span>, <span class='nv'>penguins</span>, name <span class='o'>=</span> <span class='s'>"penguins_table"</span><span class='o'>)</span></span></code></pre> </div> <p>And we can predict with it like normal. 
All the calculations are sent to the database for execution.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>orbital_obj</span>, <span class='nv'>penguins_sqlite</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># Source: SQL [?? x 1]</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># Database: sqlite 3.47.1 []</span></span></span> <span><span class='c'>#&gt; .pred_class</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ more rows</span></span></span> <span></span></code></pre> </div> <p>This works the same with <a href="https://orbital.tidymodels.org/articles/databases.html" target="_blank" rel="noopener">many types of databases</a>.</p> <p>Classification is different from regression in part because it comes 
with multiple prediction types. The above example showed the default which is hard classification. You can set the type of prediction you want with the <code>type</code> argument to <code>orbital</code>. For classification models, possible options are <code>&quot;class&quot;</code> and <code>&quot;prob&quot;</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>orbital_obj_prob</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://orbital.tidymodels.org/reference/orbital.html'>orbital</a></span><span class='o'>(</span><span class='nv'>wf_fit</span>, type <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"class"</span>, <span class='s'>"prob"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>orbital_obj_prob</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>orbital Object</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────</span></span></span> <span><span class='c'>#&gt; • island = dplyr::if_else(is.na(island), "unknown", island)</span></span> <span><span class='c'>#&gt; • sex = dplyr::if_else(is.na(sex), "unknown", sex)</span></span> <span><span class='c'>#&gt; • island_Dream = as.numeric(island == "Dream")</span></span> <span><span class='c'>#&gt; • island_Torgersen = as.numeric(island == "Torgersen")</span></span> <span><span class='c'>#&gt; • sex_male = as.numeric(sex == "male")</span></span> <span><span class='c'>#&gt; • sex_unknown = as.numeric(sex == "unknown")</span></span> <span><span class='c'>#&gt; • bill_length_mm = dplyr::if_else(is.na(bill_length_mm), 43.92193, bill_l ...</span></span> <span><span class='c'>#&gt; • bill_depth_mm = dplyr::if_else(is.na(bill_depth_mm), 17.15117, bill_dep ...</span></span> 
<span><span class='c'>#&gt; • flipper_length_mm = dplyr::if_else(is.na(flipper_length_mm), 201, flipp ...</span></span> <span><span class='c'>#&gt; • body_mass_g = dplyr::if_else(is.na(body_mass_g), 4202, body_mass_g)</span></span> <span><span class='c'>#&gt; • island_Dream = dplyr::if_else(is.na(island_Dream), 0.3604651, island_Dr ...</span></span> <span><span class='c'>#&gt; • island_Torgersen = dplyr::if_else(is.na(island_Torgersen), 0.1511628, i ...</span></span> <span><span class='c'>#&gt; • sex_male = dplyr::if_else(is.na(sex_male), 0.4883721, sex_male)</span></span> <span><span class='c'>#&gt; • sex_unknown = dplyr::if_else(is.na(sex_unknown), 0.03197674, sex_unknow ...</span></span> <span><span class='c'>#&gt; • Adelie = 0 + dplyr::case_when((bill_depth_mm &lt; 15.1 | is.na(bill_depth_ ...</span></span> <span><span class='c'>#&gt; • Chinstrap = 0 + dplyr::case_when((island_Dream &lt; 0.5 | is.na(island_Dre ...</span></span> <span><span class='c'>#&gt; • Gentoo = 0 + dplyr::case_when((bill_depth_mm &lt; 15.95 | is.na(bill_depth ...</span></span> <span><span class='c'>#&gt; • .pred_class = dplyr::case_when(Adelie &gt; Chinstrap &amp; Adelie &gt; Gentoo ~ " ...</span></span> <span><span class='c'>#&gt; • norm = exp(Adelie) + exp(Chinstrap) + exp(Gentoo)</span></span> <span><span class='c'>#&gt; • .pred_Adelie = exp(Adelie) / norm</span></span> <span><span class='c'>#&gt; • .pred_Chinstrap = exp(Chinstrap) / norm</span></span> <span><span class='c'>#&gt; • .pred_Gentoo = exp(Gentoo) / norm</span></span> <span><span class='c'>#&gt; ────────────────────────────────────────────────────────────────────────────────</span></span> <span><span class='c'>#&gt; 22 equations in total.</span></span> <span></span></code></pre> </div> <p>Notice how we can select both <code>&quot;class&quot;</code> and <code>&quot;prob&quot;</code>. 
The predictions now include both hard and soft class predictions.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>orbital_obj_prob</span>, <span class='nv'>penguins</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 4</span></span></span> <span><span class='c'>#&gt; .pred_class .pred_Adelie .pred_Chinstrap .pred_Gentoo</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie 0.709 0.024<span style='text-decoration: underline;'>5</span> 0.267 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie 0.989 0.005<span style='text-decoration: 
underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie 0.979 0.005<span style='text-decoration: underline;'>49</span> 0.015<span style='text-decoration: underline;'>8</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie 0.980 0.005<span style='text-decoration: underline;'>59</span> 0.014<span style='text-decoration: underline;'>8</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span> <span></span></code></pre> </div> <p>That works equally well in databases.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>orbital_obj_prob</span>, <span class='nv'>penguins_sqlite</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># Source: SQL [?? 
x 4]</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># Database: sqlite 3.47.1 []</span></span></span> <span><span class='c'>#&gt; .pred_class .pred_Adelie .pred_Chinstrap .pred_Gentoo</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie 0.709 0.024<span style='text-decoration: underline;'>5</span> 0.267 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'> 8</span> Adelie 0.989 0.005<span style='text-decoration: underline;'>54</span> 0.005<span style='text-decoration: underline;'>60</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie 0.979 0.005<span style='text-decoration: underline;'>49</span> 0.015<span style='text-decoration: underline;'>8</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie 0.980 0.005<span style='text-decoration: underline;'>59</span> 0.014<span style='text-decoration: underline;'>8</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ more rows</span></span></span> <span></span></code></pre> </div> <h2 id="new-augment-method">New augment method <a href="#new-augment-method"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The users of tidymodels have found the <a href="https://generics.r-lib.org/reference/augment.html" target="_blank" rel="noopener"><code>augment()</code></a> function to be a handy tool. 
This function performs predictions and returns them alongside the original data set.</p> <p>This release adds <a href="https://generics.r-lib.org/reference/augment.html" target="_blank" rel="noopener"><code>augment()</code></a> support for orbital objects.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://generics.r-lib.org/reference/augment.html'>augment</a></span><span class='o'>(</span><span class='nv'>orbital_obj</span>, <span class='nv'>penguins</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 8</span></span></span> <span><span class='c'>#&gt; .pred_class species island bill_length_mm bill_depth_mm flipper_length_mm</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie Adelie Torgersen 39.1 18.7 181</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie Adelie Torgersen 39.5 17.4 186</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie Adelie Torgersen 40.3 18 195</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie Adelie Torgersen <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie Adelie Torgersen 36.7 19.3 193</span></span> <span><span class='c'>#&gt; <span style='color: 
#555555;'> 6</span> Adelie Adelie Torgersen 39.3 20.6 190</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie Adelie Torgersen 38.9 17.8 181</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie Adelie Torgersen 39.2 19.6 195</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie Adelie Torgersen 34.1 18.1 193</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie Adelie Torgersen 42 20.2 190</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 2 more variables: body_mass_g &lt;int&gt;, sex &lt;fct&gt;</span></span></span> <span></span></code></pre> </div> <p>The function works for most databases, but for technical reasons doesn&rsquo;t work with all of them. It has been confirmed not to work with Spark databases or Arrow tables.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://generics.r-lib.org/reference/augment.html'>augment</a></span><span class='o'>(</span><span class='nv'>orbital_obj</span>, <span class='nv'>penguins_sqlite</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># Source: SQL [?? 
x 8]</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># Database: sqlite 3.47.1 []</span></span></span> <span><span class='c'>#&gt; .pred_class species island bill_length_mm bill_depth_mm flipper_length_mm</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie Adelie Torgersen 39.1 18.7 181</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie Adelie Torgersen 39.5 17.4 186</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie Adelie Torgersen 40.3 18 195</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie Adelie Torgersen <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie Adelie Torgersen 36.7 19.3 193</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie Adelie Torgersen 39.3 20.6 190</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie Adelie Torgersen 38.9 17.8 181</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie Adelie Torgersen 39.2 19.6 195</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie Adelie Torgersen 34.1 18.1 193</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie Adelie Torgersen 42 20.2 
190</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 2 more variables: body_mass_g &lt;int&gt;, sex &lt;chr&gt;</span></span></span> <span></span></code></pre> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thank you to all the people who have contributed to orbital since the release of v0.3.0:</p> <p> <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/joscani" target="_blank" rel="noopener">@joscani</a>, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, <a href="https://github.com/npelikan" target="_blank" rel="noopener">@npelikan</a>, and <a href="https://github.com/szimmer" target="_blank" rel="noopener">@szimmer</a>.</p> Joining the ggplot2 team https://www.tidyverse.org/blog/2025/01/joining-ggplot2/ Thu, 09 Jan 2025 00:00:00 +0000 https://www.tidyverse.org/blog/2025/01/joining-ggplot2/ 
<p>Hello there! I&rsquo;ve been working on ggplot2 for a while now, and I&rsquo;d like to tell you how that came about and what it is like.</p> <h2 id="how-i-got-involved">How I got involved <a href="#how-i-got-involved"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>My journey into learning R started in 2017 during an internship at the EMBL-EBI. The main gripe about base R plotting that drove me into ggplot2&rsquo;s arms was the arcane invocations needed to get anything other than one of the pre-approved chart types. In contrast, ggplot2 absorbs a bunch of small paper cuts and is very compositional in nature while remaining highly customisable. In a bid to &ldquo;learn from the mistakes of others&rdquo; rather than (continue to copiously) make my own, I became active on Stack Overflow answering questions and solving plotting issues. For posterity: this was in the days before you could ask a large language model for personalised advice and actual humans were equally frustrated on both sides of the question.</p> <p>I was keeping track of solutions to common problems in a personal cookbook that had its own arcane invocations. To give a bit of flavour: much of the cookbook was about preparing gtables (the data structure that comes out of building a plot) for combining and aligning plots. 
<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> The cookbook eventually grew into my first ggplot2 extension package: <a href="https://teunbrand.github.io/ggh4x/" target="_blank" rel="noopener">ggh4x</a>. Perhaps that package would be best subtitled: &lsquo;Remedies to my common ggplot2 ailments&rsquo;. It contains a bunch of miscellaneous functions ranging from reorganising facets to putting minor ticks on the axes. The nature of the package was also its downside, as ggh4x lacked any sense of scope (and still does, as befits any first package).</p> <p>Around the time when I was really getting into ggplot extensions, <a href="https://github.com/EvaMaeRey" target="_blank" rel="noopener">Gina Reynolds</a> had started organising a meeting for people who build ggplot2 extensions. It is an interesting place to meet others and hear about their packages and how they face interacting with the ggplot2 extension system. I started attending with some degree of regularity and made a discussion place on GitHub. We now use this for the general exchange of ideas, but also for package-specific issues.</p> <p>Meanwhile, the questions on Stack Overflow kept directing my attention to the ggplot2 issue tracker every once in a while. After lurking in there for a bit, I started my first informal contributions to ggplot2 itself by answering the simple stuff just as I did on Stack Overflow. It may not seem like much of a contribution, but in retrospect, answering issues helps triage them: it separates those issues that need additional changes in ggplot2 from those that do not. My first &lsquo;proper contribution&rsquo; in the shape of a pull request was in 2020. 
It replaced 3 lines of code with 2 lines of code to benefit type stability (this was prior to <a href="https://vctrs.r-lib.org/" target="_blank" rel="noopener">vctrs</a>)<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>.</p> <p>In 2022, I commented &ldquo;I&rsquo;d be willing to take a stab at this&rdquo; on an issue proposing a large refactor of the guide system. I like to think it was this precise moment that Thomas, who had taken over from Hadley as project lead, took notice and later invited me to join the team<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup>. This new guide system ended up laying the foundation for <a href="https://teunbrand.github.io/legendry/" target="_blank" rel="noopener">legendry</a>, so it wasn&rsquo;t entirely for unselfish reasons that I volunteered. At any rate, this is a great opportunity to fill big shoes on a major R project, so I&rsquo;m very excited to have joined!</p> <h2 id="becoming-an-insider">Becoming an insider <a href="#becoming-an-insider"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Part of being on the team is straightforward. You triage issues. You fix bugs. You implement new features. At the point that I joined, I had already done these things as an outsider. The only thing that really changes is that you get the keys to the kingdom: you can now close issues and merge pull requests <sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup>. You&rsquo;re then trusted to wield this power wisely. 
You then hope you do.</p> <p>At the time I joined, the most active maintainers were Thomas, Claus and Hiroaki. I was surprised to learn that most communication really happens on GitHub and that it is all public discussion. Even more abstract coordination that does not neatly fit into a single issue, like preparing a new release, didn&rsquo;t occur behind closed doors. I think what made my introduction to the team more awkward than it needed to be was that GitHub issues is not really a good place for announcements where you can say &lsquo;Hi everyone, this person is on the team now and will be doing stuff in the project&rsquo;. I had interacted with the other active maintainers before, so I wasn&rsquo;t a completely alien actor, but I felt some unclarity lingered longer than it ought to have. Perhaps I should have introduced myself more assertively <sup id="fnref:5"><a href="#fn:5" class="footnote-ref" role="doc-noteref">5</a></sup>.</p> <p>However, by the time posit::conf(2024) was over, I had met 6 out of the 9 other authors in person. I have more thoughts about conf and my first time in the United States, but it has been amazing to meet all these people in person whose work you&rsquo;ve been admiring for a while!</p> <h2 id="maintaining-ggplot2">Maintaining ggplot2 <a href="#maintaining-ggplot2"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The ggplot2 package has both the blessing and the curse of being a popular package. On the one hand, it is a blessing that people care about the project, post issues that they find and make intermittent contributions. 
The curse is that it is such a staple in the R ecosystem, that almost any change will inadvertently affect somebody else&rsquo;s code. Not only because ggplot2 is widely used, but also because people have been &hellip;creative&hellip; with how they are using ggplot2. The art of making changes is to largely affect plots in a good way.</p> <p>The first big project I was rummaging through was the guide system I proposed to rewrite. The guide system had never been advertised as an official extension point, but naturally that didn&rsquo;t preclude people from using it as an extension point anyway.<sup id="fnref:6"><a href="#fn:6" class="footnote-ref" role="doc-noteref">6</a></sup> So in addition to rewriting the system, we also had to prevent terribly breaking extensions that relied on the old system. In some cases, this meant sending out PRs to other packages to be compatible with both systems.</p> <p>Having worked through a good number of issues at this point in time, I can see some emergent patterns. Different patterns can be partially explained by different audiences. The regular user wants to be empowered to execute their vision of a plot effectively. Maintainers of extensions would often like things to work consistently or change a very obscure line somewhere that they have identified as blocking a niche use case. Teachers would like their students to get stuck less often, which often involves improving error messages. All in all, there is no shortage of issues to work through.</p> <p>The next big thing we&rsquo;re working on is some practical necromancy in getting themeable aesthetics resurrected, which was <a href="https://www.danaseidel.com/2018-09-01-ATidySummer/" target="_blank" rel="noopener">initiated by Dana Paige Seidel</a> all the way back in 2018! We&rsquo;d like the theme to be a home for more default choices than just non-data elements. 
Default layer aesthetics are a start, but we plan on putting in default palettes too.</p> <h2 id="a-few-words-of-thanks">A few words of thanks <a href="#a-few-words-of-thanks"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>I&rsquo;ve been plucked from a level of relative obscurity &mdash;a package maintainer that has this weird miscellaneous package&mdash; into the path of a flagship R project, for which I&rsquo;m very grateful. First and foremost I&rsquo;m thankful to Thomas Lin Pedersen, who has put me into this position and steers the ggplot2 project. Secondly to Hadley Wickham and the rest of the tidyverse team, who make me feel included; both at conf and during regular meetings<sup id="fnref:7"><a href="#fn:7" class="footnote-ref" role="doc-noteref">7</a></sup>. Thirdly, the co-authors I met during conf: Claus Wilke, for whose workshop I TA&rsquo;d, but also Kara Woo and Winston Chang. Lastly, I&rsquo;d like to thank Posit the company for contracting me to do work I also enjoy as a hobby!</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>Luckily, we don&rsquo;t have to think about this <em>at all</em>, thanks to the <a href="https://patchwork.data-imaginist.com/" target="_blank" rel="noopener">patchwork</a> package! 
<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>I&rsquo;m omitting here that I also had to write 50 lines of tests for this small change <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>How much this actually reflects any truth is for any of us to guess and for Thomas to know. Later, I learned that this was also <a href="https://www.data-imaginist.com/posts/2016-10-31-becoming-the-intern/" target="_blank" rel="noopener">how Thomas himself was roped into the project</a>! <a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:4" role="doc-endnote"> <p>After review though. You&rsquo;re not given <em>that</em> much power! <a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:5" role="doc-endnote"> <p>But I&rsquo;m not celebrated for my social graces :) <a href="#fnref:5" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:6" role="doc-endnote"> <p>I don&rsquo;t have a moral high ground here: I was one of the worst offenders! <a href="#fnref:6" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:7" role="doc-endnote"> <p>Mostly for The Golden Hex Sticker though! 
<a href="#fnref:7" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> tidymodels Internship for 2025 https://www.tidyverse.org/blog/2025/01/tidymodels-2025-internship/ Wed, 08 Jan 2025 00:00:00 +0000 https://www.tidyverse.org/blog/2025/01/tidymodels-2025-internship/ <p>We are chuffed once again to offer a summer internship with the tidymodels team.</p> <p>We&rsquo;ve had eight previous summer interns and these led to the creation of a number of new packages: <a href="https://agua.tidymodels.org/" target="_blank" rel="noopener">agua</a>, <a href="https://applicable.tidymodels.org/" target="_blank" rel="noopener">applicable</a>, <a href="https://rstudio.github.io/bundle/" target="_blank" rel="noopener">bundle</a>, <a href="https://butcher.tidymodels.org/" target="_blank" rel="noopener">butcher</a>, <a href="https://shinymodels.tidymodels.org/" target="_blank" rel="noopener">shinymodels</a>, <a href="https://spatialsample.tidymodels.org/" target="_blank" rel="noopener">spatialsample</a>, and <a href="https://stacks.tidymodels.org/" target="_blank" rel="noopener">stacks</a>. Our own <a href="https://www.simonpcouch.com/" target="_blank" rel="noopener">Simon Couch</a> is a former intern who won <a href="https://community.amstat.org/jointscsg-section/awards/john-m-chambers" target="_blank" rel="noopener">an award</a> for his work.</p> <p>This year, the primary focus is on expanding our feature selection capabilities. Some of this will involve new recipe steps and other functions. 
Towards the end of the internship, there might be time to work on other things, too!</p> <p>To apply, make sure that you have a GitHub handle and follow this link:</p> <p><strong> <a href="https://posit.co/job-detail/?gh_jid=6323043003" target="_blank" rel="noopener"><code>https://posit.co/job-detail/?gh_jid=6323043003</code></a></strong></p> <p>The internship is US-based.</p> <p>If you want to know what the internship is like, a few of our alumni have written about it:</p> <ul> <li> <a href="https://www.alexpghayes.com/post/2018-08-10_a-summer-with-rstudio/" target="_blank" rel="noopener"><em>A summer with RStudio</em> (2018)</a></li> <li> <a href="https://fbchow.rbind.io/2018/07/27/rstudio-summer-internship/" target="_blank" rel="noopener"><em>RStudio Summer Internship</em> (2018)</a></li> <li> <a href="https://education.rstudio.com/blog/2019/12/this-is-not-like-the-others/" target="_blank" rel="noopener"><em>This Is Not Like the Others</em> (2019)</a></li> <li> <a href="https://education.rstudio.com/blog/2020/06/tidymodels-internship/" target="_blank" rel="noopener"><em>Tidymodels Internship</em> (2020)</a></li> <li> <a href="https://www.mm218.dev/posts/2022-08-15-last-summer/" target="_blank" rel="noopener"><em>I know what I did last summer</em> (2022)</a></li> </ul> <p>We can&rsquo;t wait to get started and look forward to reading your applications.</p> S7 0.2.0 https://www.tidyverse.org/blog/2024/11/s7-0-2-0/ Thu, 07 Nov 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/11/s7-0-2-0/
<p>We&rsquo;re excited to announce that <a href="https://rconsortium.github.io/S7/" target="_blank" rel="noopener">S7</a> v0.2.0 is now available on CRAN! S7 is a new object-oriented programming (OOP) system designed to supersede both S3 and S4. You might wonder why R needs a new OOP system when we already have two. The reason lies in the history of R&rsquo;s OOP journey: S3 is a simple and effective system for single dispatch, while S4 adds formal class definitions and multiple dispatch, but at the cost of complexity. This has forced developers to choose between the simplicity of S3 and the sophistication of S4.</p> <p>The goal of S7 is to unify the OOP landscape by building on S3&rsquo;s existing dispatch system and incorporating the most useful features of S4 (along with some new ones), all with a simpler syntax. S7&rsquo;s design and implementation have been a collaborative effort by a working group from the <a href="https://www.r-consortium.org" target="_blank" rel="noopener">R Consortium</a>, including representatives from R-Core, Bioconductor, tidyverse/Posit, ROpenSci, and the wider R community. Since S7 builds on S3, it is fully compatible with existing S3-based code. 
It&rsquo;s also been thoughtfully designed to work with S4, and as we learn more about the challenges of transitioning from S4 to S7, we&rsquo;ll continue to add features to ease this process.</p> <p>Our long-term goal is to include S7 in base R, but for now, you can install it from CRAN:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"S7"</span><span class='o'>)</span></span></code></pre> </div> <h2 id="whats-new-in-the-second-release">What&rsquo;s new in the second release <a href="#whats-new-in-the-second-release"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The second release of S7 brings refinements and bug fixes. Highlights include:</p> <ul> <li>Support for lazy property defaults, making class setup more flexible.</li> <li>Custom property setters now run on object initialization.</li> <li>Significant speed improvements for setting and getting properties with <code>@</code> and <code>@&lt;-</code>.</li> <li>Expanded compatibility with base S3 classes.</li> <li> <a href="https://rconsortium.github.io/S7/reference/convert.html" target="_blank" rel="noopener"><code>convert()</code></a> now provides a default method for transforming a parent class into a subclass.</li> </ul> <p>Additionally, there are numerous bug fixes and quality-of-life improvements, such as better error messages, improved support for base Ops methods, and compatibility improvements for using <code>@</code> in R versions prior to 4.3. 
You can see a full list of changes in the <a href="https://github.com/RConsortium/S7/blob/main/NEWS.md" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="who-should-use-s7">Who should use S7 <a href="#who-should-use-s7"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>S7 is a great fit for R users who like to try new things but don&rsquo;t need to be the first. It&rsquo;s already used in several CRAN packages, and the tidyverse team is applying it in new projects. While you may still run into a few issues, many early problems have been resolved.</p> <h2 id="usage">Usage <a href="#usage"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rconsortium.github.io/S7/'>S7</a></span><span class='o'>)</span></span></code></pre> </div> <p>Let&rsquo;s dive into the basics of S7. 
To learn more, check out the package vignettes, including a more detailed introduction in <a href="https://rconsortium.github.io/S7/articles/S7.html" target="_blank" rel="noopener"><code>vignette(&quot;S7&quot;)</code></a>, and coverage of generics and methods in <a href="https://rconsortium.github.io/S7/articles/generics-methods.html" target="_blank" rel="noopener"><code>vignette(&quot;generics-methods&quot;)</code></a>, and classes and objects in <a href="https://rconsortium.github.io/S7/articles/classes-objects.html" target="_blank" rel="noopener"><code>vignette(&quot;classes-objects&quot;)</code></a>.</p> <h3 id="classes-and-objects">Classes and objects <a href="#classes-and-objects"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>S7 classes have formal definitions, specified by <a href="https://rconsortium.github.io/S7/reference/new_class.html" target="_blank" rel="noopener"><code>new_class()</code></a>, which includes a list of properties and an optional validator. 
For example, the following code creates a <code>Range</code> class with <code>start</code> and <code>end</code> properties, and a validator to ensure that each property has length 1 and that <code>end</code> is never less than <code>start</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>Range</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rconsortium.github.io/S7/reference/new_class.html'>new_class</a></span><span class='o'>(</span><span class='s'>"Range"</span>,</span> <span> properties <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span></span> <span> start <span class='o'>=</span> <span class='nv'>class_double</span>,</span> <span> end <span class='o'>=</span> <span class='nv'>class_double</span></span> <span> <span class='o'>)</span>,</span> <span> validator <span class='o'>=</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>self</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='kr'>if</span> <span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/length.html'>length</a></span><span class='o'>(</span><span class='nv'>self</span><span class='o'>@</span><span class='nv'>start</span><span class='o'>)</span> <span class='o'>!=</span> <span class='m'>1</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='s'>"@start must be length 1"</span></span> <span> <span class='o'>&#125;</span> <span class='kr'>else</span> <span class='kr'>if</span> <span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/length.html'>length</a></span><span class='o'>(</span><span class='nv'>self</span><span class='o'>@</span><span class='nv'>end</span><span class='o'>)</span> <span class='o'>!=</span> <span class='m'>1</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='s'>"@end must be length 
1"</span></span> <span> <span class='o'>&#125;</span> <span class='kr'>else</span> <span class='kr'>if</span> <span class='o'>(</span><span class='nv'>self</span><span class='o'>@</span><span class='nv'>end</span> <span class='o'>&lt;</span> <span class='nv'>self</span><span class='o'>@</span><span class='nv'>start</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='s'>"@end must be greater than or equal to @start"</span></span> <span> <span class='o'>&#125;</span></span> <span> <span class='o'>&#125;</span></span> <span><span class='o'>)</span></span></code></pre> </div> <p> <a href="https://rconsortium.github.io/S7/reference/new_class.html" target="_blank" rel="noopener"><code>new_class()</code></a> returns the class object, which also serves as the constructor to create instances of the class:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'>Range</span><span class='o'>(</span>start <span class='o'>=</span> <span class='m'>1</span>, end <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span></span> <span><span class='nv'>x</span></span> <span><span class='c'>#&gt; &lt;Range&gt;</span></span> <span><span class='c'>#&gt; @ start: num 1</span></span> <span><span class='c'>#&gt; @ end : num 10</span></span> <span></span></code></pre> </div> <h3 id="properties">Properties <a href="#properties"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The data an object holds are called its <strong>properties</strong>. 
Use <code>@</code> to get and set properties:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span><span class='o'>@</span><span class='nv'>start</span></span> <span><span class='c'>#&gt; [1] 1</span></span> <span></span><span><span class='nv'>x</span><span class='o'>@</span><span class='nv'>end</span> <span class='o'>&lt;-</span> <span class='m'>20</span></span> <span><span class='nv'>x</span></span> <span><span class='c'>#&gt; &lt;Range&gt;</span></span> <span><span class='c'>#&gt; @ start: num 1</span></span> <span><span class='c'>#&gt; @ end : num 20</span></span> <span></span></code></pre> </div> <p>Properties are automatically validated against the type declared in <a href="https://rconsortium.github.io/S7/reference/new_class.html" target="_blank" rel="noopener"><code>new_class()</code></a> (in this case, <code>double</code>) and checked by the class <strong>validator</strong>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span><span class='o'>@</span><span class='nv'>end</span> <span class='o'>&lt;-</span> <span class='s'>"x"</span></span> <span><span class='c'>#&gt; Error: &lt;Range&gt;@end must be &lt;double&gt;, not &lt;character&gt;</span></span> <span></span><span><span class='nv'>x</span><span class='o'>@</span><span class='nv'>end</span> <span class='o'>&lt;-</span> <span class='o'>-</span><span class='m'>1</span></span> <span><span class='c'>#&gt; Error: &lt;Range&gt; object is invalid:</span></span> <span><span class='c'>#&gt; - @end must be greater than or equal to @start</span></span> <span></span></code></pre> </div> <h3 id="generics-and-methods">Generics and methods <a href="#generics-and-methods"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 
3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Like S3 and S4, S7 uses <strong>functional OOP</strong>, where methods belong to <strong>generic</strong> functions, and method calls look like regular function calls: <code>generic(object, arg2, arg3)</code>. A generic uses the types of its arguments to automatically pick the appropriate method implementation.</p> <p>You can create a new generic with <a href="https://rconsortium.github.io/S7/reference/new_generic.html" target="_blank" rel="noopener"><code>new_generic()</code></a>, specifying the arguments to dispatch on:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>inside</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rconsortium.github.io/S7/reference/new_generic.html'>new_generic</a></span><span class='o'>(</span><span class='s'>"inside"</span>, <span class='s'>"x"</span><span class='o'>)</span></span></code></pre> </div> <p>To define a method for a specific class, use <code>method(generic, class) &lt;- implementation</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rconsortium.github.io/S7/reference/method.html'>method</a></span><span class='o'>(</span><span class='nv'>inside</span>, <span class='nv'>Range</span><span class='o'>)</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>y</span> <span class='o'>&gt;=</span> <span class='nv'>x</span><span class='o'>@</span><span class='nv'>start</span> <span class='o'>&amp;</span> <span class='nv'>y</span> <span class='o'>&lt;=</span> <span 
class='nv'>x</span><span class='o'>@</span><span class='nv'>end</span></span> <span><span class='o'>&#125;</span></span> <span></span> <span><span class='nf'>inside</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>5</span>, <span class='m'>10</span>, <span class='m'>15</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] FALSE TRUE TRUE TRUE</span></span> <span></span></code></pre> </div> <p>Printing the generic shows its methods:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>inside</span></span> <span><span class='c'>#&gt; &lt;S7_generic&gt; inside(x, ...) with 1 methods:</span></span> <span><span class='c'>#&gt; 1: method(inside, Range)</span></span> <span></span></code></pre> </div> <p>And you can retrieve the method for a specific class:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rconsortium.github.io/S7/reference/method.html'>method</a></span><span class='o'>(</span><span class='nv'>inside</span>, <span class='nv'>Range</span><span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;S7_method&gt; method(inside, Range)</span></span> <span><span class='c'>#&gt; function (x, y) </span></span> <span><span class='c'>#&gt; &#123;</span></span> <span><span class='c'>#&gt; y &gt;= x@start &amp; y &lt;= x@end</span></span> <span><span class='c'>#&gt; &#125;</span></span> <span></span></code></pre> </div> <h2 id="known-limitations">Known limitations <a href="#known-limitations"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 
5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>While we are pleased with S7&rsquo;s design, there are still some limitations:</p> <ul> <li>S7 objects can be serialized to disk (with <a href="https://rdrr.io/r/base/readRDS.html" target="_blank" rel="noopener"><code>saveRDS()</code></a>), but the current implementation saves the entire class specification with each object. This may change in the future.</li> <li>Support for implicit S3 classes <code>&quot;array&quot;</code> and <code>&quot;matrix&quot;</code> is still in development.</li> </ul> <p>We expect the community will uncover more issues as S7 is more widely adopted. If you encounter any problems, please file an issue at <a href="https://github.com/RConsortium/OOP-WG/issues">https://github.com/RConsortium/OOP-WG/issues</a>. We appreciate your feedback in helping us make S7 even better! 😃</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thank you to all people who have contributed issues, code, and comments to this release:</p> <p> <a href="https://github.com/calderonsamuel" target="_blank" rel="noopener">@calderonsamuel</a>, <a href="https://github.com/Crosita" target="_blank" rel="noopener">@Crosita</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dipterix" target="_blank" rel="noopener">@dipterix</a>, <a href="https://github.com/guslipkin" target="_blank" 
rel="noopener">@guslipkin</a>, <a href="https://github.com/gvelasq" target="_blank" rel="noopener">@gvelasq</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/jeffkimbrel" target="_blank" rel="noopener">@jeffkimbrel</a>, <a href="https://github.com/jl5000" target="_blank" rel="noopener">@jl5000</a>, <a href="https://github.com/jmbarbone" target="_blank" rel="noopener">@jmbarbone</a>, <a href="https://github.com/jmiahjones" target="_blank" rel="noopener">@jmiahjones</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, <a href="https://github.com/JosiahParry" target="_blank" rel="noopener">@JosiahParry</a>, <a href="https://github.com/jtlandis" target="_blank" rel="noopener">@jtlandis</a>, <a href="https://github.com/lawremi" target="_blank" rel="noopener">@lawremi</a>, <a href="https://github.com/MarcellGranat" target="_blank" rel="noopener">@MarcellGranat</a>, <a href="https://github.com/mikmart" target="_blank" rel="noopener">@mikmart</a>, <a href="https://github.com/mmaechler" target="_blank" rel="noopener">@mmaechler</a>, <a href="https://github.com/mynanshan" target="_blank" rel="noopener">@mynanshan</a>, <a href="https://github.com/rikivillalba" target="_blank" rel="noopener">@rikivillalba</a>, <a href="https://github.com/sjcowtan" target="_blank" rel="noopener">@sjcowtan</a>, <a href="https://github.com/t-kalinowski" target="_blank" rel="noopener">@t-kalinowski</a>, <a href="https://github.com/teunbrand" target="_blank" rel="noopener">@teunbrand</a>, and <a href="https://github.com/waynelapierre" target="_blank" rel="noopener">@waynelapierre</a>.</p> WebAssembly roundup part 3: Quarto Live 0.1.1 https://www.tidyverse.org/blog/2024/10/quarto-live-0-1-1/ Tue, 15 Oct 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/10/quarto-live-0-1-1/
<p>We&rsquo;re tickled pink to announce the release of <a href="https://r-wasm.github.io/quarto-live/" target="_blank" rel="noopener">Quarto Live</a> 0.1.1. Quarto Live is a new Quarto extension that uses WebAssembly to bring interactive examples and code exercises with custom grading algorithms to your HTML-based output documents, using standard Quarto markdown syntax.</p> <p>Quarto Live adds a <a href="https://codemirror.net/" target="_blank" rel="noopener">CodeMirror</a>-based text editor to your document with automatic theming, syntax highlighting, and auto-complete. The editor executes R code using webR, and even integrates with <a href="https://quarto.org/docs/interactive/ojs/" target="_blank" rel="noopener">Quarto&rsquo;s OJS support</a> so that interactive code cells update reactively with other <code>ojs</code> cells running in the page.</p> <p>This blog post is part 3 of a WebAssembly roundup series, and will discuss Quarto Live&rsquo;s primary features and show some examples of the extension in use. Authors who are creating educational content should find this post particularly interesting, as adding just a little interactivity can go a long way to keep readers engaged. 
The post contains only static screenshots, but if you&rsquo;d like to see Quarto Live in action, there are interactive examples throughout its <a href="https://r-wasm.github.io/quarto-live/" target="_blank" rel="noopener">documentation website</a>.</p> <h2 id="getting-started">Getting Started <a href="#getting-started"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>You can add the Quarto Live extension to a project by running the following command in a terminal whose current directory is a Quarto project:</p> <pre><code>quarto add r-wasm/quarto-live
</code></pre> <p>Then, create a new document with the following template to get up and running using the <code>knitr</code> engine:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>---
title: R Example
engine: knitr
format: live-html
---

{{&lt; include ./_extensions/r-wasm/live/_knitr.qmd &gt;}}

## Main Content
</code></pre> </div> <p>Once the document has been set up in this way, an R code block can be made into an interactive code block simply by switching <code>{r}</code> to <code>{webr}</code>. 
In this example, we create an interactive block that plots example data using the ggplot2 package.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>```{webr}
#| warning: false
library(ggplot2)
ggplot(airquality, aes(Temp, Ozone)) +
  geom_point() +
  geom_smooth(method = "loess")
```
</code></pre> </div> <p>The resulting rendered document looks like the screenshot below. An editor is inserted into the document in the place of the code block, and an interested reader can use it to modify and re-execute the source code in place.</p> <img src="images/ggplot-1.png" alt="Screenshot showing the result of rendering the example document above. The Quarto Live editor is shown pre-populated with the provided code snippet. An output graphic is shown, showing the result of plotting the airquality dataset using the ggplot2 package."/> <p>Quarto Live executes R code using the <a href="https://evaluate.r-lib.org" target="_blank" rel="noopener">evaluate</a> package, with output rendered using functions from <a href="https://yihui.org/knitr/" target="_blank" rel="noopener">knitr</a>, so the output should be almost identical to output generated by R Markdown or Quarto.</p> <p>The real beauty of this, in my opinion, is that the reader does not have to install any packages, copy and paste source code, switch to an IDE like <a href="https://posit.co/products/open-source/rstudio/" target="_blank" rel="noopener">RStudio</a> or <a href="https://positron.posit.co" target="_blank" rel="noopener">Positron</a>, or deal with myriad other small but fiddly distractions just to experiment with a new piece of code or R package. They can do it, right there, without any context switching.</p> <p>At first you might think of this like cells in a Jupyter notebook. 
However, to me a notebook feels more like an exploratory environment, whereas a Quarto Live block feels more like published content; it lives somewhere in between computational notebooks and the static rendered output of literate programming frameworks like R Markdown and Quarto.</p> <h2 id="interactive-exercises">Interactive exercises <a href="#interactive-exercises"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Traditionally, this level of direct interactivity for R code in a rendered document has only been possible through tools that require a server-side component, such as a Jupyter server or a Shiny server using the <a href="https://rstudio.github.io/learnr/" target="_blank" rel="noopener">learnr</a> package to execute code dynamically. This has limited deployment options for educators, particularly those who can deploy only to an institution&rsquo;s own <a href="https://en.wikipedia.org/wiki/Learning_management_system" target="_blank" rel="noopener">learning management system (LMS)</a>, or who are hindered by the sheer number of clients in the case of extremely large class sizes. The rise of virtual learning over the last few years has exacerbated the problem: tutorials might no longer be held in person in a managed computer lab, but virtually over the internet and on entirely student-controlled devices.</p> <p>WebAssembly brings a potential solution to this problem in the form of a universal runtime with minimal dependencies. 
Using Quarto Live, an interactive tutorial can be rendered into static HTML output that&rsquo;s well supported by third-party virtual learning environments (when compared to traditional Shiny apps) without ongoing management of a server component.</p> <h3 id="defining-an-exercise">Defining an exercise <a href="#defining-an-exercise"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Here&rsquo;s an example showing how to create an interactive tutorial using Quarto Live. We&rsquo;ll build an exercise with a grading component, so that visitors can get feedback on their responses.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>### Exercise 1

Calculate the average of all the integers in the vector defined as the variable `foo`.

```{webr}
#| exercise: ex_1
______(foo)
```
</code></pre> </div> <p>This will add an interactive code editor to the page, along with some placeholder code. Notice how the placeholder contains a string of six underscore (<code>_</code>) characters. When defining an exercise, Quarto Live will treat six or more underscores as a &ldquo;blank&rdquo; that must be replaced by the learner.</p> <img src="images/blank.png" alt="Screenshot showing the result of rendering the example exercise above. The Quarto Live editor is shown pre-populated with the placeholder code. An error message is shown: Please replace ______ with valid code."/> <p>In the exercise we&rsquo;ve asked about a variable <code>foo</code>, but have not created it anywhere yet. 
Let&rsquo;s fix that by adding a <code>setup</code> block that will always be executed before learner-submitted code. This block can appear before or after the one above; the placement does not matter, as it&rsquo;s linked to the exercise by its label.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>```{webr}
#| exercise: ex_1
#| setup: true
foo <- sample.int(100, 10)
```
</code></pre> </div> <img src="images/sample.gif" alt="Animation showing the result of rendering the example above. The Quarto Live editor is shown pre-populated with the code `foo`. A button labelled 'Run Code' is pressed repeatedly and the output changes each time. The output consists of 10 random integers."/> <p>A <code>check</code> code block defines a grading algorithm, checking submitted code and assigning feedback in the form of a <a href="https://r-wasm.github.io/quarto-live/exercises/grading.html#return-feedback" target="_blank" rel="noopener">feedback list</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>```{webr}
#| exercise: ex_1
#| check: true
if (identical(.result, mean(foo))) {
  list(correct = TRUE, message = "Nice work!")
} else {
  list(correct = FALSE, message = "That's incorrect, sorry.")
}
```
</code></pre> </div> <img src="images/exercise.png" alt="Screenshots showing the result of rendering the example exercise above. The editor is first shown with correct code. A success message is shown: Nice work!. A second editor shows the incorrect code. A failure message is shown: That's incorrect, sorry. "/> <p>Finally, let&rsquo;s add some solution text. This time we&rsquo;ll use a Quarto fenced block to define the content. We still link this block to our exercise by providing the label we used before, just with a slightly different syntax. 
When a &ldquo;hint&rdquo; or &ldquo;solution&rdquo; block is added in this way, it stays hidden until the learner asks to reveal it through the Quarto Live exercise editor UI.</p> <pre><code>::: { .solution exercise=&quot;ex_1&quot; }
::: { .callout-tip title=&quot;Solution&quot; collapse=&quot;false&quot;}
Here is a possible solution:

```r
bar &lt;- mean(foo) #&lt;1&gt;
print(bar) #&lt;2&gt;
```

1. Use the `mean` function with `foo` to calculate the average, store this value as `bar`.
2. Print the value stored in `bar` to the console.
:::
:::
</code></pre> <img src="images/solution.png" alt="Screenshot showing the result of adding the solution block above. The solution has been revealed, demonstrating the callout block and code annotation features."/> <p>One really great thing about the way this works is that content is defined using standard Quarto markdown syntax. That means we can take full advantage of all the great features that Quarto provides for describing source code and results. 
Features like collapsible callout blocks and annotated source code allow us to present hints and solutions in the most effective way for learners.</p> <p>You can read more about exercises and grading, including examples using the existing <a href="https://pkgs.rstudio.com/gradethis/index.html" target="_blank" rel="noopener">gradethis</a> package, in the <a href="https://r-wasm.github.io/quarto-live/exercises/grading.html" target="_blank" rel="noopener">Quarto Live documentation</a>.</p> <h2 id="reactivity-with-ojs">Reactivity with OJS <a href="#reactivity-with-ojs"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Quarto Live cells may define or take input from OJS reactive variables in the page, providing a seamless way to create dynamic experiences without requiring the use of R Shiny. It is this technology that powers the grading feature shown in the previous section.</p> <p>In the following example a Quarto Live cell takes input from an OJS variable and defines an output OJS variable. Notice how updates in the Quarto Live cell are propagated to related <code>ojs</code> cells in the page.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>```{ojs}
foo = 123;
```

```{ojs}
bar
```

```{webr}
#| input: ['foo']
#| define: ['bar']
bar <- foo ** 2
```
</code></pre> </div> <img src="images/ojs.gif" alt="Animation showing the result of rendering the example above. The Quarto Live editor and several OJS cells are shown. The code is modified and 'Run Code' is pressed several times. 
The output OJS cell reactively updates with each execution."/> <p>You can even define a function in R and then invoke it reactively using an <code>ojs</code> cell. JavaScript arguments will be converted into R objects, including transparently handling datasets using webR&rsquo;s generic R object constructor described in the <a href="https://www.tidyverse.org/blog/2024/10/webr-0-4-2/">first post</a> of this blog series.</p> <p>In this example an R function is defined that produces some output using base plotting commands. The function is executed from an <code>ojs</code> cell, reactively in response to a changing OJS input.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>```{webr}
#| include: false
#| define: draw_hist
draw_hist <- function(colour) {
  hist(rnorm(1000), col = colour)
}
```

```{ojs}
//| echo: false
viewof colour = Inputs.select(
  [ 'orangered', 'forestgreen', 'cornflowerblue' ],
  { label: 'Colour' }
);
draw_hist(colour);
```
</code></pre> </div> <img src="images/hist.gif" alt="Animation showing the result of rendering the example above. A histogram is shown below a dropdown options menu offering a selection of colours. As colours are selected, the histogram is redrawn using that colour as a fill."/> <p>We hope this form of reactivity will become a powerful pattern to create rich interactive experiences for readers. 
For more examples of integration with OJS, take a look at the <a href="https://r-wasm.github.io/quarto-live/interactive/reactivity.html#overview" target="_blank" rel="noopener">penguins dashboard-like plot example</a> and <a href="https://r-wasm.github.io/quarto-live/interactive/dynamic.html" target="_blank" rel="noopener">dynamic exercises</a> in the Quarto Live documentation.</p> <h2 id="displaying-htmlwidgets">Displaying htmlwidgets <a href="#displaying-htmlwidgets"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The popular <a href="https://rstudio.github.io/htmltools/" target="_blank" rel="noopener">htmltools</a> and <a href="https://www.htmlwidgets.org/" target="_blank" rel="noopener">htmlwidgets</a> packages bring HTML and JavaScript widgets to R, and thanks to updates in webR such HTML output can also be displayed by Quarto Live. Simply print a HTML object or a widget in a live code block and the result will be dynamically added to the web page.</p> <img src="images/leaflet.png" alt="Animation showing the result of rendering the example above. A histogram is shown below a dropdown options menu offering a selection of colours. 
As colours are selected, the histogram is redrawn using that colour as a fill."/> <h2 id="one-more-thing">One more thing <a href="#one-more-thing"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>By the way, everything I&rsquo;ve shown in this blog post also works for Python using the <a href="https://pyodide.org" target="_blank" rel="noopener">Pyodide</a> WebAssembly engine. Pyodide works really well for executing Python code on the web and inspired much of how the webR engine and library work today. Many examples of using Quarto Live to evaluate Python code, including dynamic experiences similar to those shown in the previous sections, can be found on the <a href="https://r-wasm.github.io/quarto-live/" target="_blank" rel="noopener">Quarto Live documentation website</a>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>I&rsquo;m excited and fascinated to see Quarto Live start being used by the community to create interactive content for the education space and beyond. The project is still fairly new, but we would not be where we are without the help of early users providing their comments, issues and bug reports. 
Thank you!</p> <p> <a href="https://github.com/Analect" target="_blank" rel="noopener">@Analect</a>, <a href="https://github.com/andrewpbray" target="_blank" rel="noopener">@andrewpbray</a>, <a href="https://github.com/aneesha" target="_blank" rel="noopener">@aneesha</a>, <a href="https://github.com/arnaud-feldmann" target="_blank" rel="noopener">@arnaud-feldmann</a>, <a href="https://github.com/coatless" target="_blank" rel="noopener">@coatless</a>, <a href="https://github.com/cwickham" target="_blank" rel="noopener">@cwickham</a>, <a href="https://github.com/CyuHat" target="_blank" rel="noopener">@CyuHat</a>, <a href="https://github.com/DrDeception" target="_blank" rel="noopener">@DrDeception</a>, <a href="https://github.com/fcichos" target="_blank" rel="noopener">@fcichos</a>, <a href="https://github.com/joelnitta" target="_blank" rel="noopener">@joelnitta</a>, <a href="https://github.com/joelostblom" target="_blank" rel="noopener">@joelostblom</a>, <a href="https://github.com/kcarnold" target="_blank" rel="noopener">@kcarnold</a>, <a href="https://github.com/michaelplynch" target="_blank" rel="noopener">@michaelplynch</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/Nenuial" target="_blank" rel="noopener">@Nenuial</a>, <a href="https://github.com/rpruim" target="_blank" rel="noopener">@rpruim</a>, <a href="https://github.com/rundel" target="_blank" rel="noopener">@rundel</a>, <a href="https://github.com/ryjohnson09" target="_blank" rel="noopener">@ryjohnson09</a>, and <a href="https://github.com/tmieno2" target="_blank" rel="noopener">@tmieno2</a>.</p> WebAssembly roundup part 2: Shinylive 0.8.0 https://www.tidyverse.org/blog/2024/10/shinylive-0-8-0/ Mon, 14 Oct 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/10/shinylive-0-8-0/ <p>One of the most popular uses of webR in a wider project is <a href="https://shinylive.io/r/examples" target="_blank" rel="noopener">Shinylive</a>, a system for deploying Shiny for R or Python apps that run completely in a web browser, without the need for a dedicated Shiny server. Shinylive works by running both the server and client components in the viewer&rsquo;s browser, and the support for running R Shiny apps in this way is provided by webR.</p> <p>Since Shinylive works with both R and Python Shiny apps, the project is released as several independent but interconnected pieces of software: the core <a href="https://github.com/posit-dev/shinylive" target="_blank" rel="noopener">Shinylive</a> assets, the <a href="https://github.com/posit-dev/r-shinylive" target="_blank" rel="noopener">R shinylive</a> package, the <a href="https://github.com/posit-dev/py-shinylive" target="_blank" rel="noopener">Python Shinylive</a> package, and the <a href="https://github.com/quarto-ext/shinylive/" target="_blank" rel="noopener">Shinylive Quarto extension</a>. 
This post will describe some of the latest changes in the context of running the R Shinylive package and Quarto extension.</p> <h2 id="shinylive-assets">Shinylive assets <a href="#shinylive-assets"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The latest release of the Shinylive assets upgrades the version of webR included to 0.4.2, bringing in the improved packaging and loading performance of R binaries discussed in <a href="../webr-0-4-2/">part 1 of this series</a>. Shinylive now defaults to downloading R packages in the improved <code>.tgz</code> archive format served by the <a href="https://repo.r-wasm.org">webR default repository</a> and <a href="https://r-universe.dev/" target="_blank" rel="noopener">R-Universe</a>, resulting in more efficient R package installation and a faster startup process.</p> <p>These changes are already making a tangible difference to applications. In a recent meeting of the <a href="https://rconsortium.github.io/submissions-wg/" target="_blank" rel="noopener">R Consortium Submissions Working Group</a>, it was reported that for a complex Shinylive app the overall load time decreased from over a minute to just 15 seconds! <sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup></p> <p>The working group is championing improved practices for R-based submissions of clinical trial data to regulatory bodies for review. 
With their great work and our steady improvements to Shinylive over time, the group now report that they have reached a new milestone in <a href="https://r-consortium.org/posts/using-r-to-submit-research-to-the-fda-pilot-4-successfully-submitted/" target="_blank" rel="noopener">successfully submitting a pilot R Shiny app</a>, featuring a WebAssembly component with Shinylive, to the FDA for review.</p> <p><a href="images/pilot-2.png"> <img src="images/pilot-2.png" alt="Screenshots showing the R Consortium Submissions Working Group Pilot 2 Shinylive app."/> </a></p> <h2 id="r-shinylive-package">R shinylive package <a href="#r-shinylive-package"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2> <h3 id="reproducible-data-science-with-binary-bundles">Reproducible data science with binary bundles <a href="#reproducible-data-science-with-binary-bundles"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>A benefit of WebAssembly is that the same binary instructions can be executed on a whole range of machine architectures, from high performance desktop workstations to low-power devices such as mobile phones or tablets. 
WebAssembly provides a common environment ensuring that each device can reproduce the exact same results, both now and potentially for many years into the future.</p> <p>However, those with experience of building software and documents with long-term reproducibility in mind will know that not only must the exact version of your own software be available, but also its packages and system dependencies. Accurate versioning matters; newer editions of R packages are always being released with modified functionality or features deprecated and perhaps even removed.</p> <p>Previously, Shinylive downloaded R packages at runtime from the webR default repository. However, that repository follows CRAN and upgrades packages to the latest version reasonably often. So, to help provide long-lived reproducibility, the latest version of Shinylive now not only deploys your application source but also downloads and bundles as many R package binaries as possible in the exported app.</p> <p>By including WebAssembly R package binaries, a self-contained bundle is created that will never change over time, even as new R package versions are released. Once deployed to a static web service such as GitHub Pages or Netlify, you can be confident that your results will be exactly the same now or in many years&rsquo; time &ndash; at least as long as browsers continue to support the WebAssembly standard!</p> <p>With this, it is now also possible to load a complex R Shinylive app from a local web server without any external internet connection. 
This isn&rsquo;t likely to be that useful for most users, but there are some highly regulated industries and restricted network environments where it becomes a key feature.</p> <h3 id="bundling-webassembly-binaries">Bundling WebAssembly binaries <a href="#bundling-webassembly-binaries"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>By default, R packages installed from CRAN, <a href="https://r-universe.dev/" target="_blank" rel="noopener">R-Universe</a>, or <a href="https://bioconductor.org" target="_blank" rel="noopener">Bioconductor</a> will be downloaded and distributed with your Shinylive application. For CRAN packages, the packages are sourced from the webR default repository. For R-Universe or Bioconductor packages, they are sourced from the WebAssembly binaries provided by R-Universe.</p> <p>Here&rsquo;s an example of what this looks like for a sample Shiny app depending on the dplyr package. 
Shinylive assets and R package binaries are downloaded and bundled at export time, and the status of each is shown in the output.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">shinylive</span><span class="o">::</span><span class="nf">export</span><span class="p">(</span><span class="s">&#34;app&#34;</span><span class="p">,</span> <span class="s">&#34;site&#34;</span><span class="p">)</span> <span class="c1">#&gt; ℹ Exporting Shiny app from: app</span> <span class="c1">#&gt; → Destination: site</span> <span class="c1">#&gt; [======================================================================] 100%</span> <span class="c1">#&gt; ✔ Copying base Shinylive files [289ms]</span> <span class="c1">#&gt; ✔ Loading metadata database ... done</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Finding R package dependencies ... Done!</span> <span class="c1">#&gt; [=======&gt;--------------------------------------------------------------] 11%</span> <span class="c1">#&gt; trying URL &#39;http://repo.r-wasm.org/bin/emscripten/contrib/4.4/dplyr_1.1.4.tgz&#39;</span> <span class="c1">#&gt; Content type &#39;application/x-tar&#39; length 1063948 bytes (1.0 MB)</span> <span class="c1">#&gt; ==================================================</span> <span class="c1">#&gt; downloaded 1.0 MB</span> <span class="c1">#&gt; [...]</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; ✔ Downloading WebAssembly R package binaries to site/shinylive/webr/packages [3.2s]</span> <span class="c1">#&gt; ✔ Writing app metadata to site/shinylive/webr/packages/metadata.rds [14ms]</span> <span class="c1">#&gt; ℹ Wrote site/shinylive/webr/packages/metadata.rds (694 bytes)</span> <span class="c1">#&gt; ✔ Writing site/app.json [17ms]</span> <span class="c1">#&gt; ℹ Wrote site/app.json (1.64K bytes)</span> <span class="c1">#&gt; ✔ Shinylive app export complete.</span> <span class="c1">#&gt; ℹ Run the following in an R session to 
serve the app:</span> <span class="c1">#&gt; `httpuv::runStaticServer(&#34;site&#34;)`</span> </code></pre></div><p>The shinylive R package will query the currently installed versions of packages on your machine and attempt to download and bundle the same version for WebAssembly. Binaries are considered acceptable if the major and minor version numbers match, and a warning is issued otherwise. This check ensures the resulting behaviour of the exported Shinylive app is as close as possible to the behaviour when running the app in the usual way.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">shinylive</span><span class="o">::</span><span class="nf">export</span><span class="p">(</span><span class="s">&#34;app&#34;</span><span class="p">,</span> <span class="s">&#34;site&#34;</span><span class="p">)</span> <span class="c1">#&gt; [...]</span> <span class="c1">#&gt; Warning message:</span> <span class="c1">#&gt; Package version mismatch for dplyr, ensure the versions below are compatible.</span> <span class="c1">#&gt; ! Installed version: 1.0.9, WebAssembly version: 1.1.4.</span> <span class="c1">#&gt; ℹ Install a package version matching the WebAssembly version to silence this error. </span> </code></pre></div> <h3 id="bundling-custom-r-packages">Bundling custom R packages <a href="#bundling-custom-r-packages"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Using your own custom R packages with webR or Shinylive is also possible, but requires a little extra work. 
R packages, particularly those that include compiled code, must be processed specially for WebAssembly. This requires an environment with a WebAssembly compiler toolchain such as Emscripten and some setup to organise the cross-compiling of packages using R.</p> <p>The easiest way to get up and running is to <a href="https://ropensci.org/blog/2021/06/22/setup-runiverse/" target="_blank" rel="noopener">create a personal R-Universe repository</a> for your packages. The system will automatically build R package binaries for multiple targets, including WebAssembly, and Shinylive will download these resulting binaries when exporting your app.</p> <p>It&rsquo;s also possible to automatically cross-compile and deploy WebAssembly R package binaries using GitHub Actions. The <a href="https://github.com/r-wasm/actions" target="_blank" rel="noopener">r-wasm/actions</a> repository provides reusable workflows for GitHub Actions, one of which can be used to automatically build a WebAssembly R package when a GitHub release is created, attaching the resulting binary to the release. If an R package has been installed directly from GitHub, using a tool such as <a href="https://pak.r-lib.org" target="_blank" rel="noopener">pak</a>, Shinylive will look for binaries attached to a GitHub release for bundling.</p> <p>Finally, building an R package for WebAssembly can be done manually using the <a href="https://r-wasm.github.io/rwasm/" target="_blank" rel="noopener">rwasm</a> package. This is a little more involved, using a combination of the <a href="https://github.com/emscripten-core/emsdk" target="_blank" rel="noopener">Emscripten SDK</a> and the <a href="https://github.com/r-wasm/webr/pkgs/container/webr" target="_blank" rel="noopener">webR Docker container</a> to organise cross-compiling packages with R and manage custom CRAN-like repositories. 
Shinylive will also bundle WebAssembly binaries for R packages installed from such a custom repository.</p> <h2 id="shinylive-quarto-extension">Shinylive Quarto Extension <a href="#shinylive-quarto-extension"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Shinylive applications may be embedded in a Quarto document using the Shinylive Quarto extension. With the extension active, a Shinylive app can be added by directly including its source code in the document markdown. Under the hood, the extension works by calling out to the export functionality provided by the Shinylive R and Python packages, and so improvements to the exporting process also apply to Shiny apps included in Quarto projects.</p> <pre><code>Lorem ipsum dolor sit amet, consectetur adipiscing elit.

```{shinylive-r}
#| standalone: true
library(shiny)
ui &lt;- [...]
server &lt;- [...]
shinyApp(ui = ui, server = server)
```

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
</code></pre> <h3 id="embedding-data-files-in-subdirectories">Embedding data files in subdirectories <a href="#embedding-data-files-in-subdirectories"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>When a Shiny app has been deployed with Shinylive it does not have direct access to the filesystem on the client device. This is enforced by WebAssembly and the browser for security reasons. As such, additional data must either be downloaded or pre-loaded to a virtual filesystem before the app starts.</p> <p>There are a few ways to do this with a Shinylive app, but when working in a Quarto document things are more constrained. One supported way is to define the content of additional data files inline.</p> <pre><code>```{shinylive-r}
#| standalone: true
ui &lt;- [...]
server &lt;- [...]
shinyApp(ui = ui, server = server)
## file: data/example.csv
foo,bar,baz
1,2,3
2,4,6
3,6,9
5,10,15
8,16,24
```
</code></pre> <p>The system has been improved to support adding content to subdirectories, along with the ability to define binary content that has been base64 encoded. 
Combining this with Garrick Aden-Buie&rsquo;s <a href="https://github.com/gadenbuie/quarto-base64" target="_blank" rel="noopener">quarto-base64</a> extension is a great way to easily include arbitrary data in your Quarto embedded Shinylive apps.</p> <h3 id="quarto-project-wide-shared-assets">Quarto project-wide shared assets <a href="#quarto-project-wide-shared-assets"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The latest version of the R shinylive package now checks if the export process is currently running as part of a Quarto render. If so, it uses the <code>QUARTO_PROJECT_DIR</code> environment variable as a hint for where to deploy Shinylive assets and bundled WebAssembly R binaries.</p> <p>With this change it&rsquo;s possible to include multiple Shinylive applications in different documents, sharing their WebAssembly assets across the entire project. 
This avoids an undesirable situation where the exact same set of fundamental R packages is downloaded and deployed many times to different paths in a Quarto website.</p> <h2 id="using-the-latest-shinylive-asssets">Using the latest Shinylive assets <a href="#using-the-latest-shinylive-asssets"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The Shinylive 0.8.0 assets have been <a href="https://github.com/posit-dev/shinylive/releases/tag/v0.8.0" target="_blank" rel="noopener">released on GitHub</a>. They will automatically be downloaded and used once the latest version of the shinylive R package makes it to CRAN and the package has been updated on your machine.</p> <p>If you&rsquo;d like to get a head start on the latest R shinylive features, you can install the current development version of shinylive directly from GitHub:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='https://pak.r-lib.org/reference/pak.html'>pak</a></span><span class='o'>(</span><span class='s'>"posit-dev/r-shinylive"</span><span class='o'>)</span></span></code></pre> </div> <p>Or, if you prefer, you can stick with the current release version of the shinylive R package and configure it to use the latest version of the assets by setting the environment variable:</p> <pre><code>SHINYLIVE_ASSETS_VERSION=0.8.0 </code></pre> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" 
xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://github.com/chaehni" target="_blank" rel="noopener">@chaehni</a>, <a href="https://github.com/cpsievert" target="_blank" rel="noopener">@cpsievert</a>, <a href="https://github.com/darrida" target="_blank" rel="noopener">@darrida</a>, <a href="https://github.com/erikhall6373" target="_blank" rel="noopener">@erikhall6373</a>, <a href="https://github.com/gadenbuie" target="_blank" rel="noopener">@gadenbuie</a>, <a href="https://github.com/gschivley" target="_blank" rel="noopener">@gschivley</a>, <a href="https://github.com/helgasoft" target="_blank" rel="noopener">@helgasoft</a>, <a href="https://github.com/jeroen" target="_blank" rel="noopener">@jeroen</a>, <a href="https://github.com/JoaoGarcezAurelio" target="_blank" rel="noopener">@JoaoGarcezAurelio</a>, <a href="https://github.com/jvcasillas" target="_blank" rel="noopener">@jvcasillas</a>, <a href="https://github.com/kv9898" target="_blank" rel="noopener">@kv9898</a>, <a href="https://github.com/next-game-solutions" target="_blank" rel="noopener">@next-game-solutions</a>, <a href="https://github.com/Luke-Symes-Tsy" target="_blank" rel="noopener">@Luke-Symes-Tsy</a>, <a href="https://github.com/maek-ies" target="_blank" rel="noopener">@maek-ies</a>, <a href="https://github.com/pawelru" target="_blank" rel="noopener">@pawelru</a>, <a href="https://github.com/quincountychsmn" target="_blank" rel="noopener">@quincountychsmn</a>, <a href="https://github.com/rmcminds" target="_blank" rel="noopener">@rmcminds</a>, <a href="https://github.com/rbcavanaugh" target="_blank" rel="noopener">@rbcavanaugh</a>, <a href="https://github.com/schloerke" target="_blank" 
rel="noopener">@schloerke</a>, <a href="https://github.com/StefKirsch" target="_blank" rel="noopener">@StefKirsch</a>, <a href="https://github.com/virtualinertia" target="_blank" rel="noopener">@virtualinertia</a>, and <a href="https://github.com/wch" target="_blank" rel="noopener">@wch</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p><a href="https://rconsortium.github.io/submissions-wg/minutes/2024-08-02/#webassembly">https://rconsortium.github.io/submissions-wg/minutes/2024-08-02/#webassembly</a> <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> WebAssembly roundup part 1: webR 0.4.2 https://www.tidyverse.org/blog/2024/10/webr-0-4-2/ Fri, 11 Oct 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/10/webr-0-4-2/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. 
the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <!-- Initialise webR in the page --> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/codemirror.min.css"> <style> .CodeMirror pre { background-color: unset !important; } .btn-webr { background-color: #EEEEEE; border-bottom-left-radius: 0; border-bottom-right-radius: 0; } </style> <script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/codemirror.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/mode/r/r.js"></script> <script type="module"> import { WebR } from 'https://webr.r-wasm.org/v0.4.2/webr.mjs'; globalThis.webR = new WebR(); await globalThis.webR.init(); await webR.FS.mkdir('/persist'); await webR.FS.mount('IDBFS', {}, '/persist'); await webR.FS.syncfs(true); await webR.evalRVoid("webr::shim_install()"); await webR.evalRVoid("webr::global_prompt_install()", { withHandlers: false }); globalThis.webRCodeShelter = await new globalThis.webR.Shelter(); document.querySelectorAll(".btn-webr").forEach((btn) => { btn.innerText = 'Run code'; btn.disabled = false; }); </script> <!-- Add webr engine for knit --> <div class="highlight"> </div> <!-- Custom styles for output --> <div class="highlight"> <style type="text/css"> .output > pre, .output code { background-color: #ffffff !important; margin-top: -17px; border-top-left-radius: 0px; border-top-right-radius: 0px; } .error > pre, .error code { background-color: #fcebeb !important; color: #410E0E !important; } </style> </div> <p>We&rsquo;re totally stoked to announce the release of <a href="https://docs.r-wasm.org/webr/v0.4.2/" target="_blank" rel="noopener">webr</a> 0.4.2!</p> <p>It&rsquo;s been a little while since I&rsquo;ve written about webR here, and a few releases between my last blog post and this one. 
In this post I&rsquo;ll cover some of the exciting changes to the core webR distribution, and also include some interesting tidbits for JavaScript developers using webR in their own applications. You can see a full list of changes in the <a href="https://github.com/r-wasm/webr/releases" target="_blank" rel="noopener">release notes</a>.</p> <p>This post is the first in an R for WebAssembly roundup series. The next posts will cover updates to Shinylive for R, and introduce a new Quarto extension that uses the power of webR and WebAssembly to elevate your documents with interactivity.</p> <ul> <li> <a href="https://www.tidyverse.org/blog/2024/10/shinylive-0-8-0/">WebAssembly roundup part 2: Shinylive 0.8.0</a></li> <li> <a href="https://www.tidyverse.org/blog/2024/10/quarto-live-0-1-1/">WebAssembly roundup part 3: Quarto Live 0.1.1</a></li> </ul> <h2 id="supporting-html-and-widget-display">Supporting HTML and widget display <a href="#supporting-html-and-widget-display"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The base R distribution may be run using nothing but a text console, but some additional options can be implemented by frontends to provide system-dependent display of content. Previously, we implemented the <a href="https://stat.ethz.ch/R-manual/R-devel/library/base/html/file.show.html" target="_blank" rel="noopener"><code>pager</code></a> option so that R&rsquo;s help system can be better displayed within the <a href="https://webr.r-wasm.org/latest/" target="_blank" rel="noopener">webR application</a>. 
Using the pager we can show R function and package documentation outside of the text console in dedicated tabbed windows.</p> <p>In recent releases of webR we have expanded our support for such display systems, providing an implementation both for the <a href="https://stat.ethz.ch/R-manual/R-devel/library/utils/html/View.html" target="_blank" rel="noopener"><code>View()</code></a> function and the <code>viewer</code> global option used by <a href="https://www.htmlwidgets.org/" target="_blank" rel="noopener">htmlwidgets</a>.</p> <p>This gives us the ability to show a tabular data viewer for <code>data.frame</code>-like R objects and an <code>iframe</code> based HTML content viewer, enabling dynamic web-based output from R packages like <a href="https://rstudio.github.io/leaflet/articles/leaflet.html" target="_blank" rel="noopener"><code>leaflet</code></a> and <a href="https://gt.rstudio.com/" target="_blank" rel="noopener"><code>gt</code></a>.</p> <img src="images/viewer.png" alt="Screenshots of the webR REPL showing a tabular data viewer, an interactive map using the leaflet package, and an HTML table rendered using the gt package."/> <p>The implementation of <code>viewer</code> is fairly general, making use of webR&rsquo;s <a href="https://docs.r-wasm.org/webr/latest/communication.html#output-messages" target="_blank" rel="noopener">output messages</a> mechanism to send the required information to the main JavaScript thread for display. That way, any application using webR may choose to listen for those messages and decide how to show the resulting content on the page. 
We&rsquo;ll make use of this in a later post where I introduce using webR to generate dynamic content in a new Quarto extension.</p> <h2 id="improvements-to-the-webr-app-ui">Improvements to the webR app UI <a href="#improvements-to-the-webr-app-ui"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Recent webR releases have also made some other quality of life improvements to the webR app. Some minor improvements include making each UI panel resizeable, and offering <code>.zip</code> download of an entire directory in the Files panel.</p> <img src="images/download.png" alt="Screenshot showing the &apos;Download directory&apos; feature in the webR REPL app."/> <h3 id="r-source-syntax-highlighting-and-parsing">R source syntax highlighting and parsing <a href="#r-source-syntax-highlighting-and-parsing"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The webR app&rsquo;s code editor is powered by <a href="https://codemirror.net/" target="_blank" rel="noopener">CodeMirror</a> with R parsing provided by the <a href="https://github.com/TravisYeah/lang-r" target="_blank" rel="noopener">codemirror-lang-r</a> package. 
CodeMirror&rsquo;s extensibility is excellent, and the library is well suited for integrating into a wider project like this. However, we noticed that the <code>codemirror-lang-r</code> package had a few issues highlighting certain types of R syntax. In particular, in our application highlighting matrix operations such as <code>%*%</code> would crash the parser!</p> <img src="images/parser-crash.png" alt="Screenshots comparing syntax highlighting of R source code before and after the changes discussed above."/> <p>As well as fixing this bug, we&rsquo;ve worked to improve the R parser to better support some other types of R syntax and have <a href="https://github.com/TravisYeah/lezer-r/pull/1" target="_blank" rel="noopener">contributed these changes upstream</a> so as to benefit other users of <code>codemirror-lang-r</code>.</p> <img src="images/parser-new.png" alt="Screenshots comparing syntax highlighting of R source code before and after the changes discussed above."/> <h2 id="webassembly-r-package-binary-format">WebAssembly R package binary format <a href="#webassembly-r-package-binary-format"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>One of R&rsquo;s greatest strengths is its vibrant community of R packages and their developers, and so one of the development goals of webR is that packages are downloaded and installed as fast as possible. 
In the latest release of webR, some joint work with <a href="https://github.com/jeroen" target="_blank" rel="noopener">Jeroen Ooms</a> improving the performance of loading WebAssembly binary R packages has landed.</p> <p>R packages and other filesystem data are efficiently made available to the R WebAssembly process using <a href="https://emscripten.org/docs/porting/files/packaging_files.html#packaging-using-the-file-packager-tool" target="_blank" rel="noopener">Emscripten&rsquo;s file packager</a> and the <a href="https://emscripten.org/docs/api_reference/Filesystem-API.html#filesystem-api-workerfs" target="_blank" rel="noopener"><code>WORKERFS</code></a> filesystem driver. Previously we used uncompressed filesystem data, with the intention of serving content using <a href="https://en.wikipedia.org/wiki/HTTP_compression" target="_blank" rel="noopener">HTTP compression</a>. However, web services do not always compress files automatically<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>, especially if they are large. So, in the latest release of webR, filesystem data may now be mounted from a <code>gzip</code> compressed file<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>, and the base R filesystem is also distributed in compressed form.</p> <p>R package developers might recognise that traditional R package binaries are <em>already</em> produced as a <code>gzip</code> compressed archive. And, as pointed out to me by Jeroen, the format of a <code>.tar</code> archive is very similar to Emscripten&rsquo;s <code>.data</code> files. 
With some <a href="https://r-wasm.github.io/rwasm/articles/tar-metadata.html" target="_blank" rel="noopener">clever arrangement</a> of R package archive data and Emscripten filesystem metadata, pre-processed WebAssembly R package binaries may now be directly mounted to the virtual filesystem by webR.</p> <p>Mounting R packages in this way is more efficient than installing <code>.tgz</code> archives in the usual manner because the decompression step happens in the browser, rather than using R&rsquo;s slower internal routines, and the <code>WORKERFS</code> filesystem driver also avoids memory copies with the archive files until they are actually opened and read by the WebAssembly R process.</p> <p>Both the <a href="https://repo.r-wasm.org">webR default repository</a> and <a href="https://r-universe.dev/" target="_blank" rel="noopener">R-Universe</a> now serve binary R packages for WebAssembly in this new format. These packages can be installed and loaded interactively in the webR application, or used as dependencies in a deployed Shinylive for R app. For your own custom R packages, the <a href="https://r-wasm.github.io/rwasm/index.html" target="_blank" rel="noopener">rwasm</a> package can be used to compile WebAssembly binaries using a pre-configured Docker container. However, I&rsquo;d actually recommend <a href="https://ropensci.org/blog/2021/06/22/setup-runiverse/" target="_blank" rel="noopener">creating a personal R-Universe repository</a> for your packages instead, since this will automatically build binaries for multiple targets including WebAssembly.</p> <p>A much simpler but effective change has also been made: R packages listed only as <code>LinkingTo</code> dependencies are no longer downloaded by webR on package installation. These packages are required for building an R package from source, but <em>not at runtime</em>. The change saves network resources when installing WebAssembly R packages. 
In one particular worst-case scenario, this change avoided downloading about 100 megabytes of data!</p> <h2 id="virtual-file-system-drivers">Virtual file system drivers <a href="#virtual-file-system-drivers"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A nice side-effect of the work in the previous section is that <a href="https://github.com/r-wasm/webr/issues/328" target="_blank" rel="noopener">mounting filesystem data with <code>WORKERFS</code></a> now also works correctly under Node.js, fixing a fairly painful and long-standing bug for our server-side users of webR.</p> <p>We&rsquo;ve also introduced mounting with Emscripten&rsquo;s <a href="https://emscripten.org/docs/api_reference/Filesystem-API.html#filesystem-api-idbfs" target="_blank" rel="noopener"><code>IDBFS</code></a> filesystem driver when running webR in the browser<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup>. This driver makes use of the low-level <a href="https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API" target="_blank" rel="noopener">IndexedDB API</a> provided by the JavaScript environment to write virtual filesystem contents to a form of local storage on the device.</p> <p>With this, files that have been written to the virtual filesystem can be persisted over page reloads and automatically made available again to the WebAssembly R process when the page is revisited in the future, without needing to re-download the content.</p> <p>You can try it out right here! 
Any files written to the <code>/persist</code> directory in the interactive R console below should be persisted. The first time you load this page, the directory will be empty. However, if files are written they will remain available after you refresh the page or revisit in the future.</p> <div class="highlight"> <button class="btn btn-default btn-webr" disabled type="button" id="webr-run-button-1">Loading webR...</button> <div id="webr-editor-1"></div> <div id="webr-code-output-1"><pre style="visibility: hidden"></pre></div> <script type="module"> const runButton = document.getElementById('webr-run-button-1'); const outputDiv = document.getElementById('webr-code-output-1'); const editorDiv = document.getElementById('webr-editor-1'); const editor = CodeMirror((elt) => { elt.style.border = '1px solid #eee'; elt.style.height = 'auto'; editorDiv.append(elt); },{ value: `install.packages('cli', quiet = TRUE)\n\nfiles <- list.files("/persist")\nfiles\n\nif (length(files) == 0) {\n cli::cli_alert_warning("No files found in '/persist', I'll create one...")\n write.csv(mtcars, "/persist/mtcars.csv")\n} else {\n cli::cli_alert_success("Nice! 
Some existing files were found in '/persist'.")\n}`, lineNumbers: true, mode: 'r', theme: 'light default', viewportMargin: Infinity, }); runButton.onclick = async () => { runButton.disabled = true; let canvas = undefined; await webR.init(); await webR.evalRVoid('webr::canvas(width=504, height=311.472)'); await webR.FS.syncfs(false); const result = await webRCodeShelter.captureR(editor.getValue(), { withAutoprint: true, captureStreams: true, captureConditions: false, captureGraphics: false, env: {}, }); try { await webR.evalRVoid("dev.off()"); const out = result.output.filter( evt => evt.type == 'stdout' || evt.type == 'stderr' ).map((evt) => evt.data).join('\n'); outputDiv.innerHTML = ''; const pre = document.createElement("pre"); if (/\S/.test(out)) { const code = document.createElement("code"); code.innerText = out; pre.appendChild(code); } else { pre.style.visibility = 'hidden'; } outputDiv.appendChild(pre); const msgs = await webR.flush(); msgs.forEach(msg => { if (msg.type === 'canvas'){ if (msg.data.event === 'canvasImage') { canvas.getContext('2d').drawImage(msg.data.image, 0, 0); } else if (msg.data.event === 'canvasNewPage') { canvas = document.createElement('canvas'); canvas.setAttribute('width', 2 * 504); canvas.setAttribute('height', 2 * 311.472); canvas.style.width="700px"; canvas.style.display="block"; canvas.style.margin="auto"; const p = document.createElement("p"); p.appendChild(canvas); outputDiv.appendChild(p); } } }); } finally { webRCodeShelter.purge(); runButton.disabled = false; } } </script> </div> <p>It should be noted that filesystem data stored in an IndexedDB database can only be accessed within the same <a href="https://developer.mozilla.org/en-US/docs/Glossary/Origin" target="_blank" rel="noopener">origin</a>, essentially across the current web page&rsquo;s domain. Also, browsers may decide the amount of storage space provided, what content is deleted when quotas are reached, and when exactly that deletion occurs. 
In private browsing mode, for example, data is usually removed when the private session ends.</p> <p>Even with these caveats, I expect developers working with webR will be able to make use of the <code>IDBFS</code> driver to selectively cache content or R packages that are too large to download over the network on every single page load, further improving start up times in their own apps as a result.</p> <h2 id="developing-with-webr">Developing with webR <a href="#developing-with-webr"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2> <h3 id="deprecating-the-serviceworker-channel">Deprecating the <code>ServiceWorker</code> channel <a href="#deprecating-the-serviceworker-channel"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The <code>ServiceWorker</code> <a href="https://docs.r-wasm.org/webr/v0.3.2/communication.html" target="_blank" rel="noopener">communication channel</a>, a method webR offered to handle message passing between the main browser thread and the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers" target="_blank" rel="noopener">JavaScript Web Worker</a> running the R WebAssembly binary, has been deprecated. 
The communication channel was originally devised as a way to allow use of webR in cases where the <code>SharedArrayBuffer</code> API is not available. This includes any use of webR with an origin that is not <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/crossOriginIsolated" target="_blank" rel="noopener">Cross-Origin Isolated</a>, such as when content is hosted by GitHub Pages.</p> <p>The channel was implemented using a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API" target="_blank" rel="noopener">JavaScript Service Worker</a> proxy and synchronous <a href="https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest" target="_blank" rel="noopener">XHR</a> requests. Unfortunately, with the overhead of message serialisation and capturing network requests, performance was significantly impacted. The channel was also not compatible with applications that make use of a service worker for genuine network proxy functionality, such as Shinylive.</p> <p>An alternative method has since been developed in the form of the <code>PostMessage</code> communication channel. This instead uses the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Worker/postMessage" target="_blank" rel="noopener">JavaScript <code>PostMessage</code> API</a>, which is designed to handle communication between workers efficiently. It has much better performance and even provides a way to <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Transferable_objects" target="_blank" rel="noopener">transfer objects</a> using zero-copy operations. 
There are some minor downsides when using the <code>PostMessage</code> channel, mostly related to taking input using tools like <a href="https://rdrr.io/r/base/readline.html" target="_blank" rel="noopener"><code>readline()</code></a>, or nested REPLs like R&rsquo;s <a href="https://rdrr.io/r/base/browser.html" target="_blank" rel="noopener"><code>browser()</code></a>, but for most applications we find that this is not catastrophic and a reasonable price to pay for what is intended as a fallback method.</p> <p>If you are working on a webR application where <a href="https://rdrr.io/r/base/readline.html" target="_blank" rel="noopener"><code>readline()</code></a> functionality <em>is absolutely</em> required, but you cannot set your web server headers to enable cross-origin isolation, an alternative implementation of using a service worker to solve the problem can be found with the <a href="https://github.com/gzuidhof/coi-serviceworker" target="_blank" rel="noopener">coi-serviceworker</a> package. When enabled, the web page will appear to webR to be cross-origin isolated and so <code>SharedArrayBuffer</code> can be used. This still has the other drawbacks of requiring a service worker, but will have much better performance than using webR&rsquo;s <code>ServiceWorker</code> channel directly.</p> <p>For these reasons, the <code>PostMessage</code> communication channel is now the default fallback when the web page is not cross-origin isolated. 
The <code>ServiceWorker</code> channel will continue to be available in the short term, if explicitly requested, but will eventually be removed in a future version of webR.</p> <h3 id="api-additions">API additions <a href="#api-additions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>We&rsquo;ve made some minor changes to the webR JavaScript API. There&rsquo;s nothing groundbreaking here, but there are some new tools that we hope will be useful.</p> <h5 id="report-current-version">Report current version <a href="#report-current-version"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h5><p>With the aim of providing functionality similar to the <a href="https://rdrr.io/r/base/Version.html" target="_blank" rel="noopener"><code>R.Version()</code></a> and <a href="https://rdrr.io/r/utils/packageDescription.html" target="_blank" rel="noopener"><code>packageVersion()</code></a> R functions, the version of the currently running webR session may now be obtained from the JavaScript environment.</p> <div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="o">&gt;</span> <span class="kr">const</span> <span class="nx">webR</span> <span class="o">=</span> <span class="k">new</span> <span 
class="nx">WebR</span><span class="p">();</span> <span class="o">&gt;</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">version</span><span class="p">;</span> <span class="c1">// &#39;0.4.3-dev+d1fb4f4&#39; </span></code></pre></div> <h5 id="discover-an-objects-class">Discover an object&rsquo;s class <a href="#discover-an-objects-class"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h5><p>An R object&rsquo;s <a href="https://rdrr.io/r/base/class.html" target="_blank" rel="noopener"><code>class()</code></a> may be inspected from an <code>RObject</code> proxy. The returned value is an <code>RCharacter</code> vector of classes from which the object inherits.</p> <div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="o">&gt;</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="s2">&#34;mtcars&#34;</span><span class="p">)</span> <span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">obj</span> <span class="p">=&gt;</span> <span class="nx">obj</span><span class="p">.</span><span class="kr">class</span><span class="p">())</span> <span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">cls</span> <span class="p">=&gt;</span> <span class="nx">cls</span><span class="p">.</span><span class="nx">toArray</span><span class="p">());</span> <span class="c1">// [&#39;data.frame&#39;] </span></code></pre></div> <h5 id="explicitly-construct-an-r-dataframe">Explicitly construct 
an R <code>data.frame</code> <a href="#explicitly-construct-an-r-dataframe"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h5><p>In a previous version of webR, we introduced creating new R <code>data.frame</code> objects from JavaScript using the generic <code>RObject</code> constructor. WebR will build a <code>data.frame</code> for arguments with compatible shape: either an object with named columns, or an array with objects for each row.</p> <div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="o">&gt;</span> <span class="kd">let</span> <span class="nx">source1</span> <span class="o">=</span> <span class="p">{</span> <span class="nx">abc</span><span class="o">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="nx">xyz</span><span class="o">:</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span> <span class="p">};</span> <span class="o">&gt;</span> <span class="nx">await</span> <span class="k">new</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">RObject</span><span class="p">(</span><span class="nx">source1</span><span class="p">)</span> <span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">obj</span> <span class="p">=&gt;</span> <span class="nx">obj</span><span class="p">.</span><span class="kr">class</span><span 
class="p">())</span> <span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">cls</span> <span class="p">=&gt;</span> <span class="nx">cls</span><span class="p">.</span><span class="nx">toArray</span><span class="p">());</span> <span class="c1">// [&#39;data.frame&#39;] </span><span class="c1"></span> <span class="o">&gt;</span> <span class="kd">let</span> <span class="nx">source2</span> <span class="o">=</span> <span class="p">[</span> <span class="p">{</span> <span class="nx">abc</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">xyz</span><span class="o">:</span> <span class="mi">4</span> <span class="p">},</span> <span class="p">{</span> <span class="nx">abc</span><span class="o">:</span> <span class="mi">2</span><span class="p">,</span> <span class="nx">xyz</span><span class="o">:</span> <span class="mi">5</span> <span class="p">},</span> <span class="p">{</span> <span class="nx">abc</span><span class="o">:</span> <span class="mi">3</span><span class="p">,</span> <span class="nx">xyz</span><span class="o">:</span> <span class="mi">6</span> <span class="p">}];</span> <span class="o">&gt;</span> <span class="nx">await</span> <span class="k">new</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">RObject</span><span class="p">(</span><span class="nx">source2</span><span class="p">)</span> <span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">obj</span> <span class="p">=&gt;</span> <span class="nx">obj</span><span class="p">.</span><span class="kr">class</span><span class="p">())</span> <span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">cls</span> <span class="p">=&gt;</span> <span class="nx">cls</span><span class="p">.</span><span class="nx">toArray</span><span class="p">());</span> <span class="c1">// [&#39;data.frame&#39;] </span></code></pre></div><p>You might ask why not 
create an R list object by default? The reason is that we expect a common situation to be taking datasets defined in the JavaScript environment and processing them using R. With <code>data.frame</code> as the default, JavaScript objects that have been formatted for use with existing JavaScript frameworks can be almost transparently passed to R.</p> <div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="o">&gt;</span> <span class="nx">penguins</span><span class="p">;</span> <span class="c1">// Array(344) [ </span><span class="c1">// 0: { species: &#39;Adelie&#39;, island: &#39;Torgersen&#39;, flipper_length_mm: 181, ... } </span><span class="c1">// ... more </span><span class="c1">// ] </span><span class="c1"></span><span class="o">&gt;</span> <span class="kr">const</span> <span class="nx">sample_mass</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="sb">` </span><span class="sb"> \\(x) x |&gt; dplyr::sample_n(5) |&gt; dplyr::pull(&#34;body_mass_g&#34;) </span><span class="sb"> `</span><span class="p">);</span> <span class="o">&gt;</span> <span class="nx">await</span> <span class="nx">sample_mass</span><span class="p">(</span><span class="nx">penguins</span><span class="p">);</span> <span class="c1">// { type: &#39;double&#39;, names: null, values: [3300, 3250, 4000, 4700, 3750] } </span></code></pre></div><p>The generic constructor throws an exception for JavaScript objects that cannot be coerced as a <code>data.frame</code>. 
If you&rsquo;d prefer to create an R list, you must instead be explicit by using the <code>RList</code> constructor,</p> <div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="o">&gt;</span> <span class="kd">let</span> <span class="nx">source</span> <span class="o">=</span> <span class="p">{</span> <span class="nx">def</span><span class="o">:</span> <span class="p">[</span><span class="mi">123</span><span class="p">,</span> <span class="mi">456</span><span class="p">],</span> <span class="nx">uvw</span><span class="o">:</span> <span class="s1">&#39;hello&#39;</span> <span class="p">};</span> <span class="o">&gt;</span> <span class="nx">await</span> <span class="k">new</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">RObject</span><span class="p">(</span><span class="nx">source</span><span class="p">);</span> <span class="c1">// Uncaught WebRWorkerError: Can&#39;t construct `data.frame`. Source object is not eligible. </span><span class="c1"></span> <span class="o">&gt;</span> <span class="kd">let</span> <span class="nx">obj</span> <span class="o">=</span> <span class="nx">await</span> <span class="k">new</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">RList</span><span class="p">(</span><span class="nx">source</span><span class="p">);</span> <span class="o">&gt;</span> <span class="nx">await</span> <span class="nx">obj</span><span class="p">.</span><span class="nx">type</span><span class="p">();</span> <span class="c1">// &#39;list&#39; </span></code></pre></div><p>The <code>RObject</code> constructor is designed to be a useful default for interactive work at a JavaScript console. However, production applications should be <a href="https://docs.r-wasm.org/webr/latest/api/js/modules/RWorker.html#classes" target="_blank" rel="noopener">explicit in the choice of constructor</a>. 
With this in mind, we have added a new class <a href="https://docs.r-wasm.org/webr/latest/api/js/classes/RWorker.RDataFrame.html" target="_blank" rel="noopener"><code>RDataFrame</code></a>, a subclass of <code>RList</code>, so that users may be explicit in their choice to create a <code>data.frame</code>, rather than relying on the generic <code>RObject</code> constructor.</p> <div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="o">&gt;</span> <span class="kd">let</span> <span class="nx">source</span> <span class="o">=</span> <span class="p">{</span> <span class="nx">abc</span><span class="o">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="nx">xyz</span><span class="o">:</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]</span> <span class="p">};</span> <span class="o">&gt;</span> <span class="nx">await</span> <span class="k">new</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">RDataFrame</span><span class="p">(</span><span class="nx">source</span><span class="p">)</span> <span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">obj</span> <span class="p">=&gt;</span> <span class="nx">obj</span><span class="p">.</span><span class="kr">class</span><span class="p">())</span> <span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">cls</span> <span class="p">=&gt;</span> <span class="nx">cls</span><span class="p">.</span><span class="nx">toArray</span><span class="p">());</span> <span class="c1">// [&#39;data.frame&#39;] </span></code></pre></div><p>Now, if your source object is not quite as you expect, rather than continuing silently without error, an exception 
will be thrown. We hope this will reduce the chance of type-related bugs and unexpected behaviour, and aid in debugging when issues do occur.</p> <div class="highlight"><pre class="chroma"><code class="language-js" data-lang="js"><span class="c1">// Say we _expect_ a JS object here, but something went wrong... </span><span class="c1"></span><span class="o">&gt;</span> <span class="kd">let</span> <span class="nx">bug</span> <span class="o">=</span> <span class="kc">undefined</span><span class="p">;</span> <span class="o">&gt;</span> <span class="kr">const</span> <span class="nx">obj1</span> <span class="o">=</span> <span class="nx">await</span> <span class="k">new</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">RObject</span><span class="p">(</span><span class="nx">bug</span><span class="p">);</span> <span class="c1">// [No error and webR silently continues with an unexpected R object] </span><span class="c1"></span> <span class="o">&gt;</span> <span class="kr">const</span> <span class="nx">obj2</span> <span class="o">=</span> <span class="nx">await</span> <span class="k">new</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">RDataFrame</span><span class="p">(</span><span class="nx">bug</span><span class="p">);</span> <span class="c1">// Uncaught WebRWorkerError: Can&#39;t construct `data.frame`. Source object is not eligible. 
</span></code></pre></div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Special thanks to <a href="https://github.com/jeroen" target="_blank" rel="noopener">@jeroen</a>, for helpful conversations when it comes to packaging for webR. And thank you, as always, to the users and developers contributing to webR in the form of discussion, bug reports, and pull requests.</p> <p> <a href="https://github.com/027xiguapi" target="_blank" rel="noopener">@027xiguapi</a>, <a href="https://github.com/adrianolszewski" target="_blank" rel="noopener">@adrianolszewski</a>, <a href="https://github.com/alekrutkowski" target="_blank" rel="noopener">@alekrutkowski</a>, <a href="https://github.com/andrjohns" target="_blank" rel="noopener">@andrjohns</a>, <a href="https://github.com/baogorek" target="_blank" rel="noopener">@baogorek</a>, <a href="https://github.com/bugzpodder" target="_blank" rel="noopener">@bugzpodder</a>, <a href="https://github.com/christianp" target="_blank" rel="noopener">@christianp</a>, <a href="https://github.com/coatless" target="_blank" rel="noopener">@coatless</a>, <a href="https://github.com/codingthemystery" target="_blank" rel="noopener">@codingthemystery</a>, <a href="https://github.com/ColinFay" target="_blank" rel="noopener">@ColinFay</a>, <a href="https://github.com/derrickstaten" target="_blank" rel="noopener">@derrickstaten</a>, <a href="https://github.com/dipterix" target="_blank" rel="noopener">@dipterix</a>, <a href="https://github.com/EduardBel" target="_blank" rel="noopener">@EduardBel</a>, <a 
href="https://github.com/gregvolny" target="_blank" rel="noopener">@gregvolny</a>, <a href="https://github.com/guillaumechaumet" target="_blank" rel="noopener">@guillaumechaumet</a>, <a href="https://github.com/gyanaranjans" target="_blank" rel="noopener">@gyanaranjans</a>, <a href="https://github.com/helgasoft" target="_blank" rel="noopener">@helgasoft</a>, <a href="https://github.com/HenrikBengtsson" target="_blank" rel="noopener">@HenrikBengtsson</a>, <a href="https://github.com/isbool" target="_blank" rel="noopener">@isbool</a>, <a href="https://github.com/JosiahParry" target="_blank" rel="noopener">@JosiahParry</a>, <a href="https://github.com/luisDVA" target="_blank" rel="noopener">@luisDVA</a>, <a href="https://github.com/minhaj57sorder" target="_blank" rel="noopener">@minhaj57sorder</a>, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>, <a href="https://github.com/oranwutang" target="_blank" rel="noopener">@oranwutang</a>, <a href="https://github.com/pawelru" target="_blank" rel="noopener">@pawelru</a>, <a href="https://github.com/psychemedia" target="_blank" rel="noopener">@psychemedia</a>, <a href="https://github.com/rainer-rq-koelle" target="_blank" rel="noopener">@rainer-rq-koelle</a>, <a href="https://github.com/richarddmorey" target="_blank" rel="noopener">@richarddmorey</a>, <a href="https://github.com/richardjtelford" target="_blank" rel="noopener">@richardjtelford</a>, <a href="https://github.com/seanbirchall" target="_blank" rel="noopener">@seanbirchall</a>, <a href="https://github.com/shalom-lab" target="_blank" rel="noopener">@shalom-lab</a>, <a href="https://github.com/StaffanBetner" target="_blank" rel="noopener">@StaffanBetner</a>, <a href="https://github.com/stobor827" target="_blank" rel="noopener">@stobor827</a>, <a href="https://github.com/SugarRayLua" target="_blank" rel="noopener">@SugarRayLua</a>, <a href="https://github.com/tavosansal" target="_blank" rel="noopener">@tavosansal</a>, <a 
href="https://github.com/thomascwells" target="_blank" rel="noopener">@thomascwells</a>, <a href="https://github.com/timelyportfolio" target="_blank" rel="noopener">@timelyportfolio</a>, and <a href="https://github.com/zpinocchio" target="_blank" rel="noopener">@zpinocchio</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>It depends a lot on how the hosting service has configured their production web server and the files themselves; both size and content type can make a difference to behaviour. Some services allow for pre-compressed content, while others do not. The <a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html" target="_blank" rel="noopener">AWS CloudFront documentation</a> gives a good overview of how this all fits together. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>Emscripten&rsquo;s <code>file_packager</code> tool also supports built-in <code>LZ4</code> compression with the <code>--lz4</code> flag. While generally useful for bundling files for WebAssembly applications, we avoid using this feature since it writes important data to a <code>.js</code> output file that must be executed. Ideally, we&rsquo;d prefer our package loading mechanism to only require a single file download, similar to traditional R package archives. <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>Note that currently users wanting to make use of <code>IDBFS</code> mounting must configure webR to use the <a href="https://docs.r-wasm.org/webr/latest/communication.html" target="_blank" rel="noopener"><code>PostMessage</code> Communication Channel</a>. 
<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> Postprocessing is coming to tidymodels https://www.tidyverse.org/blog/2024/10/postprocessing-preview/ Tue, 08 Oct 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/10/postprocessing-preview/ <p>We&rsquo;re bristling with elation to share a set of upcoming features for postprocessing with tidymodels. Postprocessors refine the predictions output by machine learning models to improve predictive performance or better satisfy distributional limitations. The developmental versions of many tidymodels core packages include changes to support postprocessors, and we&rsquo;re ready to share our work and hear the community&rsquo;s thoughts on our progress so far.</p> <p>Postprocessing support with tidymodels hasn&rsquo;t yet made it to CRAN, but you can install the needed versions of tidymodels packages with the following code.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='https://pak.r-lib.org/reference/pak.html'>pak</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste0</a></span><span class='o'>(</span></span> <span> <span class='s'>"tidymodels/"</span>,</span> <span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"tune"</span>, <span class='s'>"workflows"</span>, <span class='s'>"rsample"</span>, <span class='s'>"tailor"</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='o'>)</span></span></code></pre> </div> <p>Now, we load packages with those developmental versions installed.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span 
class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/probably'>probably</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/tailor'>tailor</a></span><span class='o'>)</span></span></code></pre> </div> <p>Existing tidymodels users might have spotted something funky already; who is this tailor character?</p> <h2 id="meet-tailor">Meet tailor👋 <a href="#meet-tailor"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The tailor package introduces tailor objects, which compose iterative adjustments to model predictions. 
tailor is to postprocessing as recipes is to preprocessing; applying your mental model of recipes to tailor should get you a good bit of the way there.</p> <div style="width: 140%; max-width: 140%; overflow-x: auto;"> <table> <thead> <tr> <th>Tool</th> <th>Applied to...</th> <th>Initialize with...</th> <th>Composes...</th> <th>Train with...</th> <th>Predict with...</th> </tr> </thead> <tbody> <tr> <td>recipes</td> <td>Training data</td> <td><code>recipe()</code></td> <td><code>step_*()</code>s</td> <td><code>prep()</code></td> <td><code>bake()</code></td> </tr> <tr> <td>tailor</td> <td>Model predictions</td> <td> <a href="https://tailor.tidymodels.org/reference/tailor.html" target="_blank" rel="noopener"><code>tailor()</code></a></td> <td><code>adjust_*()</code>ments</td> <td> <a href="https://generics.r-lib.org/reference/fit.html" target="_blank" rel="noopener"><code>fit()</code></a></td> <td> <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a></td> </tr> </tbody> </table> </div> <p>First, users can initialize a tailor object with <a href="https://tailor.tidymodels.org/reference/tailor.html" target="_blank" rel="noopener"><code>tailor()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tailor.tidymodels.org/reference/tailor.html'>tailor</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>tailor</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span> <span></span><span><span class='c'>#&gt; A postprocessor with 0 adjustments.</span></span> <span></span></code></pre> </div> <p>Tailors compose &ldquo;adjustments,&rdquo; analogous to steps from the recipes package.</p> <div 
class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tailor.tidymodels.org/reference/tailor.html'>tailor</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://tailor.tidymodels.org/reference/adjust_probability_threshold.html'>adjust_probability_threshold</a></span><span class='o'>(</span>threshold <span class='o'>=</span> <span class='m'>.7</span><span class='o'>)</span></span> <span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>tailor</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span> <span></span><span><span class='c'>#&gt; A binary postprocessor with 1 adjustment:</span></span> <span></span><span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> Adjust probability threshold to 0.7.</span></span> <span></span></code></pre> </div> <p>As an example, we&rsquo;ll apply this tailor to the <code>two_class_example</code> data made available after loading tidymodels.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/head.html'>head</a></span><span class='o'>(</span><span class='nv'>two_class_example</span><span class='o'>)</span></span> <span><span class='c'>#&gt; truth Class1 Class2 predicted</span></span> <span><span class='c'>#&gt; 1 Class2 0.003589243 0.9964107574 Class2</span></span> <span><span class='c'>#&gt; 2 Class1 0.678621054 0.3213789460 Class1</span></span> <span><span class='c'>#&gt; 3 Class2 0.110893522 0.8891064779 Class2</span></span> <span><span class='c'>#&gt; 4 Class1 0.735161703 0.2648382969 
Class1</span></span> <span><span class='c'>#&gt; 5 Class2 0.016239960 0.9837600397 Class2</span></span> <span><span class='c'>#&gt; 6 Class1 0.999275071 0.0007249286 Class1</span></span> <span></span></code></pre> </div> <p>This data gives the true value of an outcome variable <code>truth</code> as well as predicted probabilities (<code>Class1</code> and <code>Class2</code>). The hard class predictions, in <code>predicted</code>, are <code>&quot;Class1&quot;</code> if the probability assigned to <code>&quot;Class1&quot;</code> is above .5, and <code>&quot;Class2&quot;</code> otherwise.</p> <p>The model predicts <code>&quot;Class1&quot;</code> more often than it does <code>&quot;Class2&quot;</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>two_class_example</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>count</span><span class='o'>(</span><span class='nv'>predicted</span><span class='o'>)</span></span> <span><span class='c'>#&gt; predicted n</span></span> <span><span class='c'>#&gt; 1 Class1 277</span></span> <span><span class='c'>#&gt; 2 Class2 223</span></span> <span></span></code></pre> </div> <p>If we wanted the model to predict <code>&quot;Class2&quot;</code> more often, we could increase the probability threshold assigned to <code>&quot;Class1&quot;</code> above which the hard class prediction will be <code>&quot;Class1&quot;</code>. 
In the tailor package, this adjustment is implemented in <a href="https://tailor.tidymodels.org/reference/adjust_probability_threshold.html" target="_blank" rel="noopener"><code>adjust_probability_threshold()</code></a>, which can be situated in a tailor object.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>tlr</span> <span class='o'>&lt;-</span></span> <span> <span class='nf'><a href='https://tailor.tidymodels.org/reference/tailor.html'>tailor</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://tailor.tidymodels.org/reference/adjust_probability_threshold.html'>adjust_probability_threshold</a></span><span class='o'>(</span>threshold <span class='o'>=</span> <span class='m'>.7</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>tlr</span></span> <span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>tailor</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span> <span></span><span><span class='c'>#&gt; A binary postprocessor with 1 adjustment:</span></span> <span></span><span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> Adjust probability threshold to 0.7.</span></span> <span></span></code></pre> </div> <p>tailors must be fitted before they can predict on new data. 
For adjustments like <a href="https://tailor.tidymodels.org/reference/adjust_probability_threshold.html" target="_blank" rel="noopener"><code>adjust_probability_threshold()</code></a>, there&rsquo;s no training that actually happens at the <a href="https://generics.r-lib.org/reference/fit.html" target="_blank" rel="noopener"><code>fit()</code></a> step besides recording the name and type of relevant variables. For other adjustments, like numeric calibration with <a href="https://tailor.tidymodels.org/reference/adjust_numeric_calibration.html" target="_blank" rel="noopener"><code>adjust_numeric_calibration()</code></a>, parameters are actually estimated at the <a href="https://generics.r-lib.org/reference/fit.html" target="_blank" rel="noopener"><code>fit()</code></a> stage and separate data should be used to train the postprocessor and evaluate its performance. More on this in <a href="#tailors-in-context">Tailors in context</a>.</p> <p>In this case, though, we can <a href="https://generics.r-lib.org/reference/fit.html" target="_blank" rel="noopener"><code>fit()</code></a> on the whole dataset. 
The resulting object is still a tailor, but is now flagged as trained.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>tlr_trained</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span></span> <span> <span class='nv'>tlr</span>,</span> <span> <span class='nv'>two_class_example</span>,</span> <span> outcome <span class='o'>=</span> <span class='nv'>truth</span>,</span> <span> estimate <span class='o'>=</span> <span class='nv'>predicted</span>,</span> <span> probabilities <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>Class1</span>, <span class='nv'>Class2</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>tlr_trained</span></span> <span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>tailor</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span> <span></span><span><span class='c'>#&gt; A binary postprocessor with 1 adjustment:</span></span> <span></span><span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> Adjust probability threshold to 0.7. [trained]</span></span> <span></span></code></pre> </div> <p>When used with a model <a href="https://workflows.tidymodels.org" target="_blank" rel="noopener">workflow</a> via <a href="https://workflows.tidymodels.org/dev/reference/add_tailor.html" target="_blank" rel="noopener"><code>add_tailor()</code></a>, the arguments to <a href="https://generics.r-lib.org/reference/fit.html" target="_blank" rel="noopener"><code>fit()</code></a> a tailor will be set automatically. 
Generally, as with recipes, we recommend adding tailors to model workflows for training and prediction rather than using them standalone, both for ease of use and to prevent data leakage. Tailors are fully functional by themselves, too.</p> <p>Now, when passed new data, the trained tailor will determine the predicted class based on whether the probability assigned to the level <code>&quot;Class1&quot;</code> is above <code>.7</code>, resulting in more predictions of <code>&quot;Class2&quot;</code> than before.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>tlr_trained</span>, <span class='nv'>two_class_example</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>count</span><span class='o'>(</span><span class='nv'>predicted</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; predicted n</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Class1 236</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Class2 264</span></span> <span></span></code></pre> </div> <p>Changing the probability threshold is one of many possible adjustments available in tailor.</p> <ul> <li>For probabilities: <a href="https://tailor.tidymodels.org/reference/adjust_probability_calibration.html" target="_blank" rel="noopener">calibration</a></li> <li>For transformation of probabilities to hard class predictions: <a 
href="https://tailor.tidymodels.org/reference/adjust_probability_threshold.html" target="_blank" rel="noopener">thresholds</a>, <a href="https://tailor.tidymodels.org/reference/adjust_equivocal_zone.html" target="_blank" rel="noopener">equivocal zones</a></li> <li>For numeric outcomes: <a href="https://tailor.tidymodels.org/reference/adjust_numeric_calibration.html" target="_blank" rel="noopener">calibration</a>, <a href="https://tailor.tidymodels.org/reference/adjust_numeric_range.html" target="_blank" rel="noopener">range</a></li> </ul> <p>Support for tailors is now plumbed through workflows (via <a href="https://workflows.tidymodels.org/dev/reference/add_tailor.html" target="_blank" rel="noopener"><code>add_tailor()</code></a>) and tune, and rsample includes a set of infrastructural changes to prevent data leakage behind the scenes. We haven&rsquo;t yet implemented support for tuning parameters in tailors, but we plan to do so before this functionality heads to CRAN.</p> <h2 id="tailors-in-context">Tailors in context <a href="#tailors-in-context"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As an example, let&rsquo;s model a study of food delivery times in minutes (i.e., the time from the initial order to receiving the food) for a single restaurant. 
The <code>deliveries</code> data is available upon loading the tidymodels meta-package.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>deliveries</span><span class='o'>)</span></span> <span></span> <span><span class='c'># split into training and testing sets</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span> <span><span class='nv'>delivery_split</span> <span class='o'>&lt;-</span> <span class='nf'>initial_split</span><span class='o'>(</span><span class='nv'>deliveries</span><span class='o'>)</span></span> <span><span class='nv'>delivery_train</span> <span class='o'>&lt;-</span> <span class='nf'>training</span><span class='o'>(</span><span class='nv'>delivery_split</span><span class='o'>)</span></span> <span><span class='nv'>delivery_test</span> <span class='o'>&lt;-</span> <span class='nf'>testing</span><span class='o'>(</span><span class='nv'>delivery_split</span><span class='o'>)</span></span> <span></span> <span><span class='c'># resample the training set using 10-fold cross-validation</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span> <span><span class='nv'>delivery_folds</span> <span class='o'>&lt;-</span> <span class='nf'>vfold_cv</span><span class='o'>(</span><span class='nv'>delivery_train</span><span class='o'>)</span></span> <span></span> <span><span class='c'># print out the training set</span></span> <span><span class='nv'>delivery_train</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 7,509 × 31</span></span></span> <span><span class='c'>#&gt; time_to_delivery hour day distance item_01 item_02 item_03 
item_04 item_05</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 21.2 16.1 Tue 3.02 0 0 0 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 17.9 12.4 Sun 3.37 0 0 0 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 22.4 14.2 Fri 2.59 0 0 0 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 30.9 19.1 Sat 2.77 0 0 0 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 30.1 16.5 Fri 2.05 0 0 0 1 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 35.3 14.7 Sat 4.57 0 0 2 1 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 13.1 11.5 Sat 2.09 0 0 0 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 18.3 13.4 Tue 2.35 0 2 1 0 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 25.2 20.5 Sat 2.43 0 0 0 1 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 30.7 16.7 Fri 2.24 0 0 0 1 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 7,499 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more variables: item_06 
&lt;int&gt;, item_07 &lt;int&gt;, item_08 &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># item_09 &lt;int&gt;, item_10 &lt;int&gt;, item_11 &lt;int&gt;, item_12 &lt;int&gt;, item_13 &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># item_14 &lt;int&gt;, item_15 &lt;int&gt;, item_16 &lt;int&gt;, item_17 &lt;int&gt;, item_18 &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># item_19 &lt;int&gt;, item_20 &lt;int&gt;, item_21 &lt;int&gt;, item_22 &lt;int&gt;, item_23 &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># item_24 &lt;int&gt;, item_25 &lt;int&gt;, item_26 &lt;int&gt;, item_27 &lt;int&gt;</span></span></span> <span></span></code></pre> </div> <p>Let&rsquo;s deliberately define a regression model that has poor predicted values: a boosted tree with only three ensemble members.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>delivery_wflow</span> <span class='o'>&lt;-</span></span> <span> <span class='nf'>workflow</span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>add_formula</span><span class='o'>(</span><span class='nv'>time_to_delivery</span> <span class='o'>~</span> <span class='nv'>.</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>add_model</span><span class='o'>(</span><span class='nf'>boost_tree</span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"regression"</span>, trees <span class='o'>=</span> <span class='m'>3</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <p>Evaluating against resamples:</p> <div class="highlight"> <pre 
class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span> <span><span class='nv'>delivery_res</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'>fit_resamples</span><span class='o'>(</span></span> <span> <span class='nv'>delivery_wflow</span>, </span> <span> <span class='nv'>delivery_folds</span>, </span> <span> control <span class='o'>=</span> <span class='nf'>control_resamples</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span></code></pre> </div> <p>The $R^2$ looks quite strong!</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tune.tidymodels.org/reference/collect_predictions.html'>collect_metrics</a></span><span class='o'>(</span><span class='nv'>delivery_res</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 6</span></span></span> <span><span class='c'>#&gt; .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> rmse standard 9.52 10 0.053<span style='text-decoration: underline;'>3</span> Preprocessor1_Model1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> rsq standard 0.853 10 0.003<span 
style='text-decoration: underline;'>57</span> Preprocessor1_Model1</span></span> <span></span></code></pre> </div> <p>Let&rsquo;s take a closer look at the predictions, though. How well are they calibrated? We can use the <a href="https://probably.tidymodels.org/reference/cal_plot_regression.html" target="_blank" rel="noopener"><code>cal_plot_regression()</code></a> helper from the probably package to put together a quick diagnostic plot.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tune.tidymodels.org/reference/collect_predictions.html'>collect_predictions</a></span><span class='o'>(</span><span class='nv'>delivery_res</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_regression.html'>cal_plot_regression</a></span><span class='o'>(</span>truth <span class='o'>=</span> <span class='nv'>time_to_delivery</span>, estimate <span class='o'>=</span> <span class='nv'>.pred</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/predictions-bad-boost-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Ooof.</p> <p>In comes tailor! Numeric calibration can help address the correlated errors here. 
We can add a tailor to our existing workflow to &ldquo;bump up&rdquo; predictions towards their true value.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>delivery_wflow_improved</span> <span class='o'>&lt;-</span></span> <span> <span class='nv'>delivery_wflow</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>add_tailor</span><span class='o'>(</span><span class='nf'><a href='https://tailor.tidymodels.org/reference/tailor.html'>tailor</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tailor.tidymodels.org/reference/adjust_numeric_calibration.html'>adjust_numeric_calibration</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <p>The resampling code looks the same from here.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span> <span><span class='nv'>delivery_res_improved</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'>fit_resamples</span><span class='o'>(</span></span> <span> <span class='nv'>delivery_wflow_improved</span>, </span> <span> <span class='nv'>delivery_folds</span>, </span> <span> control <span class='o'>=</span> <span class='nf'>control_resamples</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span></code></pre> </div> <p>Checking out the same plot reveals a much better fit!</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a 
href='https://tune.tidymodels.org/reference/collect_predictions.html'>collect_predictions</a></span><span class='o'>(</span><span class='nv'>delivery_res_improved</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_regression.html'>cal_plot_regression</a></span><span class='o'>(</span>truth <span class='o'>=</span> <span class='nv'>time_to_delivery</span>, estimate <span class='o'>=</span> <span class='nv'>.pred</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/predictios-better-boost-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>There&rsquo;s actually some tricky data leakage prevention happening under the hood here. When you add tailors to a workflow and fit them with tune, this is all taken care of for you. If you&rsquo;re interested in using tailors outside of that context, check out <a href="https://workflows.tidymodels.org/dev/reference/add_tailor.html#data-usage" target="_blank" rel="noopener">this documentation section</a> in <code>add_tailor()</code>.</p> <h2 id="whats-to-come">What&rsquo;s to come <a href="#whats-to-come"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;re excited about how this work is shaping up and would love to hear y&rsquo;all&rsquo;s thoughts on what we&rsquo;ve brought together so far. 
Please do comment on our social media posts about this blog entry or leave issues on the <a href="https://github.com/tidymodels/tailor" target="_blank" rel="noopener">tailor GitHub repository</a> and let us know what you think!</p> <p>Before these changes head out to CRAN, we&rsquo;ll also be implementing tuning functionality for postprocessors. You&rsquo;ll be able to tag arguments like <code>adjust_probability_threshold(threshold)</code> or <code>adjust_probability_calibration(method)</code> with <code>tune()</code> to optimize across several values. Besides that, post-processing with tidymodels should &ldquo;just work&rdquo; on the developmental versions of our packages&mdash;let us know if you come across anything wonky.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Postprocessing support has been a longstanding feature request across many of our repositories; we&rsquo;re grateful for the community discussions there for shaping this work. 
Additionally, we thank Ryan Tibshirani and Daniel McDonald for fruitful discussions on how we might scope these features.</p> patchwork 1.3.0 https://www.tidyverse.org/blog/2024/09/patchwork-1-3-0/ Fri, 13 Sep 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/09/patchwork-1-3-0/ <p>I&rsquo;m excited to present <a href="https://patchwork.data-imaginist.com" target="_blank" rel="noopener">patchwork</a> 1.3.0, our package for creating multifigure plot compositions. 
This version adds table support and improves support for &ldquo;free&rdquo;ing components to span across multiple grid cells.</p> <p>You can install patchwork from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"patchwork"</span><span class='o'>)</span></span></code></pre> </div> <p>You can see a full list of changes in the <a href="https://patchwork.data-imaginist.com/news/index.html" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://patchwork.data-imaginist.com'>patchwork</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://gt.rstudio.com'>gt</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="tables-are-figures-too">Tables are figures too <a href="#tables-are-figures-too"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The new and shiny feature of the release is that patchwork now has native support for gt 
objects, making it possible to compose beautifully formatted tables together with your figures. This has been made possible through Teun Van den Brand&rsquo;s effort to provide grob output to gt. While this means that you can now pass in gt objects to <a href="https://patchwork.data-imaginist.com/reference/wrap_elements.html" target="_blank" rel="noopener"><code>wrap_elements()</code></a> in the same way as other supported data types, it also goes one step further, using the semantics of the table design to add table specific formatting options through the new <a href="https://patchwork.data-imaginist.com/reference/wrap_table.html" target="_blank" rel="noopener"><code>wrap_table()</code></a> function.</p> <p>But let&rsquo;s take a step back and see how the simplest support works in reality:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p1</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>Day</span>, y <span class='o'>=</span> <span class='nv'>Temp</span>, colour <span class='o'>=</span> <span class='nv'>month.name</span><span class='o'>[</span><span class='nv'>Month</span><span class='o'>]</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"Month"</span><span class='o'>)</span></span> <span></span> <span><span 
class='nv'>aq</span> <span class='o'>&lt;-</span> <span class='nv'>airquality</span><span class='o'>[</span><span class='nf'><a href='https://rdrr.io/r/base/sample.html'>sample</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>)</span>, <span class='m'>10</span><span class='o'>)</span>, <span class='o'>]</span></span> <span><span class='nv'>p1</span> <span class='o'>+</span> <span class='nf'><a href='https://gt.rstudio.com/reference/gt.html'>gt</a></span><span class='o'>(</span><span class='nv'>aq</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='s'>"Sample of the dataset"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-2-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>A few things can be gathered already from this small example. Tables can have titles (and subtitles, captions, and tags) like regular plots (in that sense they behave like <a href="https://patchwork.data-imaginist.com/reference/wrap_elements.html" target="_blank" rel="noopener"><code>wrap_elements()</code></a> output). Also, and this is perhaps more interesting, patchwork is aware that the first row is special (a header row), and thus places that on top of the panel area so that the plot region of the left plot is aligned with the body of the table, not the full table.</p> <p>Lastly, we see that tables often have a fixed size, contrary to plots which can shrink and expand based on how much room they have. 
Because of this, our table is overflowing its region in the plot above, creating a not-so-great look.</p> <p>Let&rsquo;s see how we can use <a href="https://patchwork.data-imaginist.com/reference/wrap_table.html" target="_blank" rel="noopener"><code>wrap_table()</code></a> to control some of these behaviors. First, while we could decrease the font size in the table to make it smaller, we could also allow it some more space instead. We could do this by using <code>plot_layout(widths = ...)</code> but it would require a fair amount of guessing on our side to get it just right. Thankfully, patchwork is smart enough to figure it out for us and we can instruct it to do so using the <code>space</code> argument in <a href="https://patchwork.data-imaginist.com/reference/wrap_table.html" target="_blank" rel="noopener"><code>wrap_table()</code></a>. Setting it to <code>&quot;free_y&quot;</code> instructs it to fix the width to the table width but keep the height free:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p1</span> <span class='o'>+</span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/wrap_table.html'>wrap_table</a></span><span class='o'>(</span><span class='nv'>aq</span>, space <span class='o'>=</span> <span class='s'>"free_y"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-3-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Setting <code>space</code> to <code>&quot;fixed&quot;</code> would constrain both the width and the height of the area it occupies. 
Since we only have a single row in our layout, this would leave us with some empty horizontal space:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p1</span> <span class='o'>+</span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/wrap_table.html'>wrap_table</a></span><span class='o'>(</span><span class='nv'>aq</span>, space <span class='o'>=</span> <span class='s'>"fixed"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-4-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>If the space is fixed in the y direction and the table has any source notes or footnotes, these will behave like the column header and be placed outside the panel area depending on the <code>panel</code> setting:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>aq_footer</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://gt.rstudio.com/reference/gt.html'>gt</a></span><span class='o'>(</span><span class='nv'>aq</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://gt.rstudio.com/reference/tab_source_note.html'>tab_source_note</a></span><span class='o'>(</span><span class='s'>"This is not part of the table body"</span><span class='o'>)</span></span> <span><span class='nv'>p1</span> <span class='o'>+</span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/wrap_table.html'>wrap_table</a></span><span class='o'>(</span><span class='nv'>aq_footer</span>, space <span class='o'>=</span> <span class='s'>"fixed"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-5-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>While the space argument is great for making the composition look good and the table well placed in the whole, it can also serve a different 
purpose of making sure that rows (or columns) are aligned with the axis of a plot. There are no facilities to ensure that the breaks order matches between plots and tables so that is the responsibility of the user, but otherwise this is a great way to use tables to directly augment a plot:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_boxplot.html'>geom_boxplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>month.name</span><span class='o'>[</span><span class='nv'>Month</span><span class='o'>]</span>, y <span class='o'>=</span> <span class='nv'>Temp</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>axis.text.x <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_blank</a></span><span class='o'>(</span><span class='o'>)</span>, axis.title.x <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_blank</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_discrete.html'>scale_x_discrete</a></span><span class='o'>(</span>expand <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span 
class='o'>(</span><span class='m'>0</span>, <span class='m'>0.5</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Construct our table</span></span> <span><span class='nv'>table</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/cbind.html'>rbind</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/tapply.html'>tapply</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>$</span><span class='nv'>Temp</span>, <span class='nv'>airquality</span><span class='o'>$</span><span class='nv'>Month</span>, <span class='nv'>max</span><span class='o'>)</span>,</span> <span> <span class='nf'><a href='https://rdrr.io/r/base/tapply.html'>tapply</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>$</span><span class='nv'>Temp</span>, <span class='nv'>airquality</span><span class='o'>$</span><span class='nv'>Month</span>, <span class='nv'>median</span><span class='o'>)</span>,</span> <span> <span class='nf'><a href='https://rdrr.io/r/base/tapply.html'>tapply</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>$</span><span class='nv'>Temp</span>, <span class='nv'>airquality</span><span class='o'>$</span><span class='nv'>Month</span>, <span class='nv'>min</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/colnames.html'>colnames</a></span><span class='o'>(</span><span class='nv'>table</span><span class='o'>)</span> <span class='o'>&lt;-</span> <span class='nv'>month.name</span><span class='o'>[</span><span class='m'>5</span><span class='o'>:</span><span class='m'>9</span><span class='o'>]</span></span> <span><span class='nv'>table</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span></span> 
<span> Measure <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Max"</span>, <span class='s'>"Median"</span>, <span class='s'>"Min"</span><span class='o'>)</span>,</span> <span> <span class='nv'>table</span></span> <span><span class='o'>)</span></span> <span><span class='nv'>table</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://gt.rstudio.com/reference/gt.html'>gt</a></span><span class='o'>(</span><span class='nv'>table</span>, rowname_col <span class='o'>=</span> <span class='s'>"Measure"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://gt.rstudio.com/reference/cols_width.html'>cols_width</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>contains</a></span><span class='o'>(</span><span class='nv'>month.name</span><span class='o'>)</span> <span class='o'>~</span> <span class='nf'><a href='https://gt.rstudio.com/reference/px.html'>px</a></span><span class='o'>(</span><span class='m'>100</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://gt.rstudio.com/reference/cols_align.html'>cols_align</a></span><span class='o'>(</span>align <span class='o'>=</span> <span class='s'>"center"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://gt.rstudio.com/reference/cols_align.html'>cols_align</a></span><span class='o'>(</span>align <span class='o'>=</span> <span class='s'>"right"</span>, columns <span class='o'>=</span> <span class='s'>"Measure"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>p2</span> <span class='o'>/</span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/wrap_table.html'>wrap_table</a></span><span class='o'>(</span><span 
class='nv'>table</span>, space <span class='o'>=</span> <span class='s'>"fixed"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-6-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Circling back, there was another argument to <a href="https://patchwork.data-imaginist.com/reference/wrap_table.html" target="_blank" rel="noopener"><code>wrap_table()</code></a> we haven&rsquo;t gotten into yet. In the plot above, we see that the row names are conveniently aligned with the axis rather than the panel of the plot above, in the same way as the headers were placed outside the panel area. This is a nice default and generally makes sense for the semantics of a table, but you might want something different. The <code>panel</code> argument allows you to control this exact behavior. It takes <code>&quot;body&quot;</code>, <code>&quot;full&quot;</code>, <code>&quot;rows&quot;</code>, or <code>&quot;cols&quot;</code>, which indicate what portion of the table should be inside the panel area. The default is <code>&quot;body&quot;</code>, which places row and column names outside the panel. 
<code>&quot;full&quot;</code>, on the contrary, places everything inside, while <code>&quot;rows&quot;</code> and <code>&quot;cols&quot;</code> are half versions that allow you to keep either row <em>or</em> column names outside the panel, respectively.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Place all rows (including the header row) inside the panel area</span></span> <span><span class='nv'>p1</span> <span class='o'>+</span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/wrap_table.html'>wrap_table</a></span><span class='o'>(</span><span class='nv'>aq</span>, panel <span class='o'>=</span> <span class='s'>"rows"</span>, space <span class='o'>=</span> <span class='s'>"free_y"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-7-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Just like the tables support ggplot2-like titles, they also support tags, meaning that patchwork&rsquo;s auto-tagging works as expected. 
It can be turned off using the <code>ignore_tag</code> argument, but often you&rsquo;d want to treat the table as a figure in the figure text:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p1</span> <span class='o'>+</span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/wrap_table.html'>wrap_table</a></span><span class='o'>(</span><span class='nv'>aq</span>, panel <span class='o'>=</span> <span class='s'>"rows"</span>, space <span class='o'>=</span> <span class='s'>"free_y"</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/plot_annotation.html'>plot_annotation</a></span><span class='o'>(</span>tag_levels <span class='o'>=</span> <span class='s'>"A"</span><span class='o'>)</span> <span class='o'>&amp;</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>plot.tag <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>margin <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>margin</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>6</span>, <span class='m'>6</span>, <span class='m'>0</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-8-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <h3 id="accesibility">Accessibility <a href="#accesibility"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 
13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>We truly believe that the features laid out above will be a boon for augmenting your data visualisation with data that can be read precisely at a glance. However, we would be remiss not to note that tables that are part of a patchwork visualisation don&rsquo;t have the same accessibility features as a gt table included directly in e.g. an HTML output. This is because graphics are rasterised into a PNG file and thus lose all the semantic information that is inherent in a table. This should be kept in mind when providing alt text for your figures so you ensure they are legible for everyone.</p> <h3 id="future">Future <a href="#future"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The support on the patchwork end is likely done at this point, but the conversion to grobs that has been added to gt is still somewhat young and will improve over time. It is likely that markdown formatting (through marquee) and other niceties will get added, leading to even more power in composing tables with plots using patchwork as the glue between them. 
As with the <a href="https://quarto.org/docs/blog/posts/2024-07-02-beautiful-tables-in-typst/" target="_blank" rel="noopener">support for gt in typst</a> the support for gt in patchwork is part of our larger effort to bring the power of gt to more environments and create a single unified solution to table styling.</p> <h2 id="with-freedom-comes-great-responsibility">With freedom comes great responsibility <a href="#with-freedom-comes-great-responsibility"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The second leg of this release concerns the <a href="https://patchwork.data-imaginist.com/reference/free.html" target="_blank" rel="noopener"><code>free()</code></a> function which was introduced in the last release. I devoted a whole section of my posit::conf talk this year to talk about <a href="https://patchwork.data-imaginist.com/reference/free.html" target="_blank" rel="noopener"><code>free()</code></a> and how it was a good thing to say no to requests for functionality until you have a solution that fits into your API and doesn&rsquo;t add clutter. I really like how the API for <a href="https://patchwork.data-imaginist.com/reference/free.html" target="_blank" rel="noopener"><code>free()</code></a> turned out but I also knew it could do more. In this release it delivers on those promises with two additional arguments.</p> <h3 id="which-side">Which side? 
<a href="#which-side"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>As it were, <a href="https://patchwork.data-imaginist.com/reference/free.html" target="_blank" rel="noopener"><code>free()</code></a> could only be used to completely turn off alignment of a plot, e.g. like below:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p1</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>gear</span><span class='o'>)</span>, fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>gear</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_discrete.html'>scale_y_discrete</a></span><span class='o'>(</span></span> <span> <span class='s'>""</span>,</span> <span> labels <span class='o'>=</span> <span class='nf'><a 
href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"3 gears are often enough"</span>,</span> <span> <span class='s'>"But, you know, 4 is a nice number"</span>,</span> <span> <span class='s'>"I would def go with 5 gears in a modern car"</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='nv'>p2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nv'>disp</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://patchwork.data-imaginist.com/reference/free.html'>free</a></span><span class='o'>(</span><span class='nv'>p1</span><span class='o'>)</span> <span class='o'>/</span> <span class='nv'>p2</span></span> </code></pre> <p><img src="figs/unnamed-chunk-9-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>We can see that panel alignment has been turned off both to the left and to the right (and top and bottom if it were visible). But perhaps you are only interested in un-aligning the left side, keeping the legend to the right of both plots. 
Now you can, thanks to the <code>side</code> argument which takes a string containing one or more of the <code>t</code>, <code>r</code>, <code>b</code>, and <code>l</code> characters to indicate which sides to apply the freeing to (default is <code>&quot;trbl&quot;</code> meaning &ldquo;target all sides&rdquo;).</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://patchwork.data-imaginist.com/reference/free.html'>free</a></span><span class='o'>(</span><span class='nv'>p1</span>, side <span class='o'>=</span> <span class='s'>"l"</span><span class='o'>)</span> <span class='o'>/</span> <span class='nv'>p2</span></span> </code></pre> <p><img src="figs/unnamed-chunk-10-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Freeing works inside nested patchworks, where you can target various sides at various levels:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p3</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_boxplot.html'>geom_boxplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>gear</span><span class='o'>)</span>, <span class='nv'>disp</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_discrete.html'>scale_y_discrete</a></span><span class='o'>(</span></span> <span> <span 
class='s'>""</span>,</span> <span> labels <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"... and 3"</span>,</span> <span> <span class='s'>"4 of them"</span>,</span> <span> <span class='s'>"5 gears"</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span></span> <span></span> <span><span class='nv'>nested</span> <span class='o'>&lt;-</span> <span class='nv'>p2</span> <span class='o'>/</span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/free.html'>free</a></span><span class='o'>(</span><span class='nv'>p1</span>, side <span class='o'>=</span> <span class='s'>"l"</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://patchwork.data-imaginist.com/reference/free.html'>free</a></span><span class='o'>(</span><span class='nv'>nested</span>, side <span class='o'>=</span> <span class='s'>"r"</span><span class='o'>)</span> <span class='o'>/</span></span> <span> <span class='nv'>p3</span></span> </code></pre> <p><img src="figs/unnamed-chunk-11-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <h3 id="what-does-freeing-means-anyway">What does &ldquo;freeing&rdquo; mean anyway? <a href="#what-does-freeing-means-anyway"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>While being able to target specific sides is pretty great in and of itself, we are not done yet. 
After being able to <em>not</em> align panels, the most requested feature was the possibility of moving the axis title closer to the axis text if alignment had pushed it apart. Consider again our unfreed patchwork:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p1</span> <span class='o'>/</span> <span class='nv'>p2</span></span> </code></pre> <p><img src="figs/unnamed-chunk-12-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>While we can &ldquo;fix&rdquo; it by letting the top panel stretch, another way to improve upon it would be to move the dangling y-axis title of the bottom plot closer to the axis. Enter the <code>type</code> argument to <a href="https://patchwork.data-imaginist.com/reference/free.html" target="_blank" rel="noopener"><code>free()</code></a> which informs patchwork how to not align the input. The default (<code>&quot;panel&quot;</code>) works just as <a href="https://patchwork.data-imaginist.com/reference/free.html" target="_blank" rel="noopener"><code>free()</code></a> always has, but the other two values open up some nifty new goodies. Setting <code>type = &quot;label&quot;</code> does exactly what we discussed above, freeing the label from alignment so it sticks together with the axis and axis text:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p1</span> <span class='o'>/</span></span> <span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/free.html'>free</a></span><span class='o'>(</span><span class='nv'>p2</span>, type <span class='o'>=</span> <span class='s'>"label"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-13-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>The other type is <code>&quot;space&quot;</code> which works slightly differently. 
Using this you tell patchwork to not reserve any space for what the side(s) contain. This is perfect in situations where you already have empty space next to it that can fit the content. Consider this plot:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://patchwork.data-imaginist.com/reference/plot_spacer.html'>plot_spacer</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nv'>p1</span> <span class='o'>+</span></span> <span> <span class='nv'>p2</span> <span class='o'>+</span> <span class='nv'>p2</span></span> </code></pre> <p><img src="figs/unnamed-chunk-14-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Ugh, the axis text of the top plot pushes everything apart even though there is ample space for it in the empty region on the left. This is where <code>type = &quot;space&quot;</code> comes in handy:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://patchwork.data-imaginist.com/reference/plot_spacer.html'>plot_spacer</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/free.html'>free</a></span><span class='o'>(</span><span class='nv'>p1</span>, type <span class='o'>=</span> <span class='s'>"space"</span>, side <span class='o'>=</span> <span class='s'>"l"</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nv'>p2</span> <span class='o'>+</span> <span class='nv'>p2</span></span> </code></pre> <p><img src="figs/unnamed-chunk-15-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Of course, such power comes with the responsibility of you ensuring there is actually space for it &mdash; otherwise it will escape out of the figure area:</p> <div class="highlight"> <pre class='chroma'><code 
class='language-r' data-lang='r'><span><span class='nf'><a href='https://patchwork.data-imaginist.com/reference/free.html'>free</a></span><span class='o'>(</span><span class='nv'>p1</span>, type <span class='o'>=</span> <span class='s'>"space"</span>, side <span class='o'>=</span> <span class='s'>"l"</span><span class='o'>)</span> <span class='o'>/</span></span> <span> <span class='nv'>p2</span></span> </code></pre> <p><img src="figs/unnamed-chunk-16-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>All the different types of freeing can be stacked on top of each other so you can have a plot that keeps the left axis label together with the axis while also stretching the right side to take up empty space:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p1</span> <span class='o'>/</span></span> <span> <span class='nf'><a href='https://patchwork.data-imaginist.com/reference/free.html'>free</a></span><span class='o'>(</span><span class='nf'><a href='https://patchwork.data-imaginist.com/reference/free.html'>free</a></span><span class='o'>(</span><span class='nv'>p2</span>, <span class='s'>"panel"</span>, <span class='s'>"r"</span><span class='o'>)</span>, <span class='s'>"label"</span>, <span class='s'>"l"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-17-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>But as always, don&rsquo;t go overboard. 
If you find yourself needing to use an elaborate combination of stacked <a href="https://patchwork.data-imaginist.com/reference/free.html" target="_blank" rel="noopener"><code>free()</code></a> calls there is a good chance that something with your core composition needs rethinking.</p> <h2 id="the-rest">The rest <a href="#the-rest"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The above are the clear highlights of this release. It also contains the standard bug fixes &mdash; especially in the area of axis collecting which was introduced in the last release and came with a bunch of edge cases that were unaccounted for. There is also a new utility function: <a href="https://rdrr.io/r/base/merge.html" target="_blank" rel="noopener"><code>merge()</code></a> which is an alternative to the <code>-</code> operator that I don&rsquo;t think many users understood or used. 
It allows you to merge all plots together into a nested patchwork so that the right hand side is added to a new composition.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thank you to all people who have contributed issues, code and comments to this release:</p> <p> <a href="https://github.com/BenVolpe94" target="_blank" rel="noopener">@BenVolpe94</a>, <a href="https://github.com/daniellembecker" target="_blank" rel="noopener">@daniellembecker</a>, <a href="https://github.com/dchiu911" target="_blank" rel="noopener">@dchiu911</a>, <a href="https://github.com/ericKuo722" target="_blank" rel="noopener">@ericKuo722</a>, <a href="https://github.com/Fan-iX" target="_blank" rel="noopener">@Fan-iX</a>, <a href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/jack-davison" target="_blank" rel="noopener">@jack-davison</a>, <a href="https://github.com/karchern" target="_blank" rel="noopener">@karchern</a>, <a href="https://github.com/laresbernardo" target="_blank" rel="noopener">@laresbernardo</a>, <a href="https://github.com/marchtaylor" target="_blank" rel="noopener">@marchtaylor</a>, <a href="https://github.com/mariadelmarq" target="_blank" rel="noopener">@mariadelmarq</a>, <a href="https://github.com/Maschette" target="_blank" rel="noopener">@Maschette</a>, <a href="https://github.com/michaeltopper1" target="_blank" rel="noopener">@michaeltopper1</a>, <a href="https://github.com/mkoohafkan" target="_blank" rel="noopener">@mkoohafkan</a>, <a 
href="https://github.com/n-kall" target="_blank" rel="noopener">@n-kall</a>, <a href="https://github.com/person-c" target="_blank" rel="noopener">@person-c</a>, <a href="https://github.com/pettyalex" target="_blank" rel="noopener">@pettyalex</a>, <a href="https://github.com/petzi53" target="_blank" rel="noopener">@petzi53</a>, <a href="https://github.com/phispu" target="_blank" rel="noopener">@phispu</a>, <a href="https://github.com/psychelzh" target="_blank" rel="noopener">@psychelzh</a>, <a href="https://github.com/rinivarg" target="_blank" rel="noopener">@rinivarg</a>, <a href="https://github.com/selkamand" target="_blank" rel="noopener">@selkamand</a>, <a href="https://github.com/Soham6298" target="_blank" rel="noopener">@Soham6298</a>, <a href="https://github.com/svraka" target="_blank" rel="noopener">@svraka</a>, <a href="https://github.com/teng-gao" target="_blank" rel="noopener">@teng-gao</a>, <a href="https://github.com/teunbrand" target="_blank" rel="noopener">@teunbrand</a>, <a href="https://github.com/thomasp85" target="_blank" rel="noopener">@thomasp85</a>, <a href="https://github.com/timz0605" target="_blank" rel="noopener">@timz0605</a>, <a href="https://github.com/wish1832" target="_blank" rel="noopener">@wish1832</a>, and <a href="https://github.com/Yunuuuu" target="_blank" rel="noopener">@Yunuuuu</a>.</p> pkgdown 2.1.0 https://www.tidyverse.org/blog/2024/07/pkgdown-2-1-0/ Mon, 08 Jul 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/07/pkgdown-2-1-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [s] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] 
[`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re delighted to announce the release of <a href="http://pkgdown.r-lib.org/" target="_blank" rel="noopener">pkgdown</a> 2.1.0. pkgdown is designed to make it quick and easy to build a beautiful and accessible website for your package.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"pkgdown"</span><span class='o'>)</span></span></code></pre> </div> <p>This is a massive release with a bunch of new features. I&rsquo;ll highlight the most important here, but as always, I highly recommend skimming the <a href="https://github.com/r-lib/pkgdown/releases/tag/v2.1.0" target="_blank" rel="noopener">release notes</a> for other smaller improvements and bug fixes.</p> <p>First, and most importantly, please join me in welcoming two new authors to pkgdown: <a href="https://github.com/olivroy" target="_blank" rel="noopener">Olivier Roy</a> and <a href="https://github.com/salim-b" target="_blank" rel="noopener">Salim Brüggemann</a>. 
They have both contributed many improvements to the package and I&rsquo;m very happy to officially have them aboard as package authors.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://pkgdown.r-lib.org/'>pkgdown</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="lifecycle-changes">Lifecycle changes <a href="#lifecycle-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Let&rsquo;s get started with the important stuff, the <a href="https://www.tidyverse.org/blog/2021/02/lifecycle-1-0-0/" target="_blank" rel="noopener">lifecycle updates</a>. Most importantly, we&rsquo;ve decided to deprecate support for Bootstrap 3, which was superseded in December 2021. We&rsquo;re starting to more directly encourage folks to move away from it as maintaining two separate sets of site templates is a time sink. 
If you&rsquo;re still using BS3, now&rsquo;s the <a href="https://www.tidyverse.org/blog/2021/12/pkgdown-2-0-0/#bootstrap-5" target="_blank" rel="noopener">time to upgrade</a>.</p> <p>There are three other changes that are less likely to affect folks:</p> <ul> <li> <p>The <code>document</code> argument to <a href="https://pkgdown.r-lib.org/reference/build_site.html" target="_blank" rel="noopener"><code>build_site()</code></a> and <a href="https://pkgdown.r-lib.org/reference/build_reference.html" target="_blank" rel="noopener"><code>build_reference()</code></a> has been removed after being deprecated in pkgdown 1.4.0; use the <a href="https://pkgdown.r-lib.org/reference/build_site.html#arg-devel" target="_blank" rel="noopener"><code>devel</code> argument</a> instead.</p> </li> <li> <p> <a href="https://pkgdown.r-lib.org/reference/autolink_html.html" target="_blank" rel="noopener"><code>autolink_html()</code></a> was deprecated in pkgdown 1.6.0 and now warns every time you use it; use <a href="https://downlit.r-lib.org/reference/downlit_html_path.html" target="_blank" rel="noopener"><code>downlit::downlit_html_path()</code></a> instead.</p> </li> <li> <p> <a href="https://pkgdown.r-lib.org/reference/preview_page.html" target="_blank" rel="noopener"><code>preview_page()</code></a> has been deprecated; use <a href="https://pkgdown.r-lib.org/reference/preview_site.html" target="_blank" rel="noopener"><code>preview_site()</code></a> instead.</p> </li> </ul> <h2 id="major-new-features">Major new features <a href="#major-new-features"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>pkgdown 2.1.0 has 
two major new features: support for Quarto vignettes and a new light switch that toggles between light and dark modes.</p> <h3 id="quarto-support">Quarto support <a href="#quarto-support"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://pkgdown.r-lib.org/reference/build_articles.html" target="_blank" rel="noopener"><code>build_article()</code></a>/ <a href="https://pkgdown.r-lib.org/reference/build_articles.html" target="_blank" rel="noopener"><code>build_articles()</code></a> now support articles and vignettes written with Quarto. To use it, make sure you have the latest version of Quarto, 1.5, which was released last week. By and large you should be able to just write in Quarto and things will just work, but you will need to make a small change to your GitHub action. Learn more at <a href="https://pkgdown.r-lib.org/articles/quarto.html" target="_blank" rel="noopener"><code>vignette(&quot;quarto&quot;)</code></a>.</p> <p>Combining the individual quarto and pkgdown templating systems is a delicate art, so while I&rsquo;ve done my best to make it work, there may be some rough edges. 
Check out the current known limitations in <a href="https://pkgdown.r-lib.org/articles/quarto.html" target="_blank" rel="noopener"><code>vignette(&quot;quarto&quot;)</code></a>, and please file an issue if you encounter a quarto feature that doesn&rsquo;t work quite right.</p> <h3 id="light-switch">Light switch <a href="#light-switch"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>pkgdown sites can now provide a &ldquo;light switch&rdquo; that allows the reader to switch between light and dark modes (based on work in bslib by <a href="https://github.com/gadenbuie" target="_blank" rel="noopener">@gadenbuie</a>). You can try it out on <a href="https://pkgdown.r-lib.org">https://pkgdown.r-lib.org</a>: the light switch appears at the far right of the navbar and remembers the user&rsquo;s choice between visits to your site.</p> <p>(Note that the light switch works differently to quarto dark mode. In quarto, you can provide two completely different themes for light and dark mode. 
In pkgdown, dark mode is a relatively thin overlay that is based on your light theme colours.)</p> <p>For now, you&rsquo;ll need to opt in to the light switch by adding the following to your <code>_pkgdown.yml</code>:</p> <div class="highlight"><pre class="chroma"><code class="language-yaml" data-lang="yaml"><span class="k">template</span><span class="p">:</span><span class="w"> </span><span class="w"> </span><span class="k">light-switch</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span></code></pre></div><p>In the future we hope to turn it on automatically.</p> <p>You can learn more about customising the light switch in <a href="https://pkgdown.r-lib.org/articles/customise.html" target="_blank" rel="noopener"><code>vignette(&quot;customise&quot;)</code></a>: you can choose to select your own syntax highlighting scheme for dark mode, override dark-specific BS lib variables, and move its location in the navbar.</p> <h2 id="user-experience">User experience <a href="#user-experience"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;ve made a bunch of small changes to enhance the user experience of pkgdown sites:</p> <ul> <li> <p>We&rsquo;ve continued in our efforts to make pkgdown sites as accessible as possible by now warning if you&rsquo;ve forgotten to add alt text to images (including plots) in your articles. 
We&rsquo;ve also added a new <a href="https://pkgdown.r-lib.org/articles/accessibility.html" target="_blank" rel="noopener"><code>vignette(&quot;accessibility&quot;)</code></a> which describes additional manual tasks you can perform to make your site as accessible as possible.</p> </li> <li> <p> <a href="https://pkgdown.r-lib.org/reference/build_reference.html" target="_blank" rel="noopener"><code>build_reference()</code></a> adds anchors to arguments making it possible to link directly to an argument. This is very useful when you&rsquo;re trying to direct folks to the documentation for a specific argument, e.g. <a href="https://pkgdown.r-lib.org/reference/build_site.html#arg-devel">https://pkgdown.r-lib.org/reference/build_site.html#arg-devel</a>.</p> </li> <li> <p> <a href="https://pkgdown.r-lib.org/reference/build_reference.html" target="_blank" rel="noopener"><code>build_reference_index()</code></a> now displays function lifecycle badges <a href="https://pkgdown.r-lib.org/reference/index.html#deprecated-functions" target="_blank" rel="noopener">next to the function name</a>. If you want to gather together (e.g.) all the deprecated functions in one spot in the reference index, you can use the new topic selector <code>has_lifecycle(&quot;deprecated&quot;)</code>.</p> </li> <li> <p>The new <code>template.math-rendering</code> option allows you to control how math is rendered on your site. The default uses <code>mathml</code> which is zero dependency but has the lowest fidelity. If you use a lot of math on your site, you can switch back to the previous method with <code>mathjax</code>, or try out <code>katex</code>, a faster alternative.</p> </li> <li> <p>pkgdown sites no longer depend on external content distribution networks (CDN) for common javascript, CSS, and font files. 
CDNs no longer provide <a href="https://www.stefanjudis.com/notes/say-goodbye-to-resource-caching-across-sites-and-domains/" target="_blank" rel="noopener">any performance advantages</a> and make deployment harder inside certain locked-down corporate environments.</p> </li> <li> <p>pkgdown includes translations for more terms including &ldquo;Abstract&rdquo; and &ldquo;Search site&rdquo;. A big thanks to @jplecavalier, @dieghernan, @krlmlr, @LDalby, @rich-iannone, @jmaspons, and @mine-cetinkaya-rundel for providing updated translations in French, Spanish, Portuguese, German, Catalan, and Turkish!</p> <p>I&rsquo;ve also written <a href="https://pkgdown.r-lib.org/articles/translations.html" target="_blank" rel="noopener"><code>vignette(&quot;translations&quot;)</code></a>, a brief vignette that discusses how translation works for non-English sites, and includes how you can create translations for new languages. (This is a great way to contribute to pkgdown if you are multi-lingual!)</p> </li> </ul> <h3 id="developer-experience">Developer experience <a href="#developer-experience"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>We&rsquo;ve also made a bunch of minor improvements to the package developer experience:</p> <ul> <li> <p>YAML validation has been substantially improved so you should get much clearer errors if you have made a mistake in your <code>_pkgdown.yml</code>. 
Please <a href="https://github.com/r-lib/pkgdown/issues/new" target="_blank" rel="noopener">file an issue</a> if you find a case where the error message is not helpful.</p> </li> <li> <p>The <code>build_*()</code> functions (apart from <a href="https://pkgdown.r-lib.org/reference/build_site.html" target="_blank" rel="noopener"><code>build_site()</code></a>) no longer automatically preview in interactive sessions since they all emit clickable links to any files that have changed. You can continue to use <a href="https://pkgdown.r-lib.org/reference/preview_site.html" target="_blank" rel="noopener"><code>preview_site()</code></a> to open the site in your browser.</p> </li> <li> <p>The <code>build_*()</code> functions now work better if you&rsquo;re previewing just part of a site and haven&rsquo;t built the whole thing. It should no longer be necessary to run <a href="https://pkgdown.r-lib.org/reference/init_site.html" target="_blank" rel="noopener"><code>init_site()</code></a> in most cases, and you shouldn&rsquo;t be able to get into a state where you&rsquo;re told to run <a href="https://pkgdown.r-lib.org/reference/init_site.html" target="_blank" rel="noopener"><code>init_site()</code></a> and then it doesn&rsquo;t work.</p> </li> <li> <p>We give more and clearer details of the site building process including reporting on exactly what is generated by bslib, what is copied from templates, and what redirects are generated.</p> </li> </ul> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all 212 folks who 
contributed to this release! <a href="https://github.com/Adafede" target="_blank" rel="noopener">@Adafede</a>, <a href="https://github.com/AEBilgrau" target="_blank" rel="noopener">@AEBilgrau</a>, <a href="https://github.com/albertocasagrande" target="_blank" rel="noopener">@albertocasagrande</a>, <a href="https://github.com/alex-d13" target="_blank" rel="noopener">@alex-d13</a>, <a href="https://github.com/AliSajid" target="_blank" rel="noopener">@AliSajid</a>, <a href="https://github.com/arkadiuszbeer" target="_blank" rel="noopener">@arkadiuszbeer</a>, <a href="https://github.com/ArneBab" target="_blank" rel="noopener">@ArneBab</a>, <a href="https://github.com/asadow" target="_blank" rel="noopener">@asadow</a>, <a href="https://github.com/ateucher" target="_blank" rel="noopener">@ateucher</a>, <a href="https://github.com/avhz" target="_blank" rel="noopener">@avhz</a>, <a href="https://github.com/banfai" target="_blank" rel="noopener">@banfai</a>, <a href="https://github.com/barcaroli" target="_blank" rel="noopener">@barcaroli</a>, <a href="https://github.com/BartJanvanRossum" target="_blank" rel="noopener">@BartJanvanRossum</a>, <a href="https://github.com/bastistician" target="_blank" rel="noopener">@bastistician</a>, <a href="https://github.com/ben18785" target="_blank" rel="noopener">@ben18785</a>, <a href="https://github.com/bijoychandraAU" target="_blank" rel="noopener">@bijoychandraAU</a>, <a href="https://github.com/Bisaloo" target="_blank" rel="noopener">@Bisaloo</a>, <a href="https://github.com/bkmgit" target="_blank" rel="noopener">@bkmgit</a>, <a href="https://github.com/bnprks" target="_blank" rel="noopener">@bnprks</a>, <a href="https://github.com/brycefrank" target="_blank" rel="noopener">@brycefrank</a>, <a href="https://github.com/bschilder" target="_blank" rel="noopener">@bschilder</a>, <a href="https://github.com/bundfussr" target="_blank" rel="noopener">@bundfussr</a>, <a href="https://github.com/cararthompson" target="_blank" 
rel="noopener">@cararthompson</a>, <a href="https://github.com/Carol-seven" target="_blank" rel="noopener">@Carol-seven</a>, <a href="https://github.com/cbailiss" target="_blank" rel="noopener">@cbailiss</a>, <a href="https://github.com/cboettig" target="_blank" rel="noopener">@cboettig</a>, <a href="https://github.com/cderv" target="_blank" rel="noopener">@cderv</a>, <a href="https://github.com/chlebowa" target="_blank" rel="noopener">@chlebowa</a>, <a href="https://github.com/chuxinyuan" target="_blank" rel="noopener">@chuxinyuan</a>, <a href="https://github.com/cromanpa94" target="_blank" rel="noopener">@cromanpa94</a>, <a href="https://github.com/cthombor" target="_blank" rel="noopener">@cthombor</a>, <a href="https://github.com/d-morrison" target="_blank" rel="noopener">@d-morrison</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/DarioS" target="_blank" rel="noopener">@DarioS</a>, <a href="https://github.com/davidchall" target="_blank" rel="noopener">@davidchall</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dbosak01" target="_blank" rel="noopener">@dbosak01</a>, <a href="https://github.com/dchiu911" target="_blank" rel="noopener">@dchiu911</a>, <a href="https://github.com/ddsjoberg" target="_blank" rel="noopener">@ddsjoberg</a>, <a href="https://github.com/DeepanshKhurana" target="_blank" rel="noopener">@DeepanshKhurana</a>, <a href="https://github.com/dhersz" target="_blank" rel="noopener">@dhersz</a>, <a href="https://github.com/dieghernan" target="_blank" rel="noopener">@dieghernan</a>, <a href="https://github.com/djhocking" target="_blank" rel="noopener">@djhocking</a>, <a href="https://github.com/dkarletsos" target="_blank" rel="noopener">@dkarletsos</a>, <a href="https://github.com/dmurdoch" target="_blank" rel="noopener">@dmurdoch</a>, <a href="https://github.com/dshemetov" target="_blank" 
rel="noopener">@dshemetov</a>, <a href="https://github.com/dsweber2" target="_blank" rel="noopener">@dsweber2</a>, <a href="https://github.com/dvg-p4" target="_blank" rel="noopener">@dvg-p4</a>, <a href="https://github.com/DyfanJones" target="_blank" rel="noopener">@DyfanJones</a>, <a href="https://github.com/ecmerkle" target="_blank" rel="noopener">@ecmerkle</a>, <a href="https://github.com/eddelbuettel" target="_blank" rel="noopener">@eddelbuettel</a>, <a href="https://github.com/eeholmes" target="_blank" rel="noopener">@eeholmes</a>, <a href="https://github.com/eitsupi" target="_blank" rel="noopener">@eitsupi</a>, <a href="https://github.com/eliocamp" target="_blank" rel="noopener">@eliocamp</a>, <a href="https://github.com/elong0527" target="_blank" rel="noopener">@elong0527</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/erikarasnick" target="_blank" rel="noopener">@erikarasnick</a>, <a href="https://github.com/esimms999" target="_blank" rel="noopener">@esimms999</a>, <a href="https://github.com/espinielli" target="_blank" rel="noopener">@espinielli</a>, <a href="https://github.com/etiennebacher" target="_blank" rel="noopener">@etiennebacher</a>, <a href="https://github.com/ewenharrison" target="_blank" rel="noopener">@ewenharrison</a>, <a href="https://github.com/filipsch" target="_blank" rel="noopener">@filipsch</a>, <a href="https://github.com/FlukeAndFeather" target="_blank" rel="noopener">@FlukeAndFeather</a>, <a href="https://github.com/francoisluc" target="_blank" rel="noopener">@francoisluc</a>, <a href="https://github.com/friendly" target="_blank" rel="noopener">@friendly</a>, <a href="https://github.com/fweber144" target="_blank" rel="noopener">@fweber144</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, <a href="https://github.com/gadenbuie" target="_blank" rel="noopener">@gadenbuie</a>, <a href="https://github.com/galachad" 
target="_blank" rel="noopener">@galachad</a>, <a href="https://github.com/gangstR" target="_blank" rel="noopener">@gangstR</a>, <a href="https://github.com/gavinsimpson" target="_blank" rel="noopener">@gavinsimpson</a>, <a href="https://github.com/GeoBosh" target="_blank" rel="noopener">@GeoBosh</a>, <a href="https://github.com/GFabien" target="_blank" rel="noopener">@GFabien</a>, <a href="https://github.com/ggcostoya" target="_blank" rel="noopener">@ggcostoya</a>, <a href="https://github.com/ghost" target="_blank" rel="noopener">@ghost</a>, <a href="https://github.com/givison" target="_blank" rel="noopener">@givison</a>, <a href="https://github.com/gladkia" target="_blank" rel="noopener">@gladkia</a>, <a href="https://github.com/glin" target="_blank" rel="noopener">@glin</a>, <a href="https://github.com/gmbecker" target="_blank" rel="noopener">@gmbecker</a>, <a href="https://github.com/gravesti" target="_blank" rel="noopener">@gravesti</a>, <a href="https://github.com/GregorDeCillia" target="_blank" rel="noopener">@GregorDeCillia</a>, <a href="https://github.com/gregorypenn" target="_blank" rel="noopener">@gregorypenn</a>, <a href="https://github.com/gsmolinski" target="_blank" rel="noopener">@gsmolinski</a>, <a href="https://github.com/gsrohde" target="_blank" rel="noopener">@gsrohde</a>, <a href="https://github.com/gungorMetehan" target="_blank" rel="noopener">@gungorMetehan</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/harshkrishna17" target="_blank" rel="noopener">@harshkrishna17</a>, <a href="https://github.com/HenrikBengtsson" target="_blank" rel="noopener">@HenrikBengtsson</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/hrecht" target="_blank" rel="noopener">@hrecht</a>, <a href="https://github.com/hsloot" target="_blank" rel="noopener">@hsloot</a>, <a href="https://github.com/idavydov" target="_blank" 
rel="noopener">@idavydov</a>, <a href="https://github.com/idmn" target="_blank" rel="noopener">@idmn</a>, <a href="https://github.com/igordot" target="_blank" rel="noopener">@igordot</a>, <a href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/jabenninghoff" target="_blank" rel="noopener">@jabenninghoff</a>, <a href="https://github.com/jack-davison" target="_blank" rel="noopener">@jack-davison</a>, <a href="https://github.com/jangorecki" target="_blank" rel="noopener">@jangorecki</a>, <a href="https://github.com/jayhesselberth" target="_blank" rel="noopener">@jayhesselberth</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jeroen" target="_blank" rel="noopener">@jeroen</a>, <a href="https://github.com/JerryWho" target="_blank" rel="noopener">@JerryWho</a>, <a href="https://github.com/jhelvy" target="_blank" rel="noopener">@jhelvy</a>, <a href="https://github.com/jmaspons" target="_blank" rel="noopener">@jmaspons</a>, <a href="https://github.com/john-harrold" target="_blank" rel="noopener">@john-harrold</a>, <a href="https://github.com/john-ioannides" target="_blank" rel="noopener">@john-ioannides</a>, <a href="https://github.com/jonasmuench" target="_blank" rel="noopener">@jonasmuench</a>, <a href="https://github.com/jonnybaik" target="_blank" rel="noopener">@jonnybaik</a>, <a href="https://github.com/josherrickson" target="_blank" rel="noopener">@josherrickson</a>, <a href="https://github.com/joshualerickson" target="_blank" rel="noopener">@joshualerickson</a>, <a href="https://github.com/JosiahParry" target="_blank" rel="noopener">@JosiahParry</a>, <a href="https://github.com/jplecavalier" target="_blank" rel="noopener">@jplecavalier</a>, <a href="https://github.com/JSchoenbachler" target="_blank" rel="noopener">@JSchoenbachler</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a 
href="https://github.com/jwimberl" target="_blank" rel="noopener">@jwimberl</a>, <a href="https://github.com/kalaschnik" target="_blank" rel="noopener">@kalaschnik</a>, <a href="https://github.com/kevinushey" target="_blank" rel="noopener">@kevinushey</a>, <a href="https://github.com/klmr" target="_blank" rel="noopener">@klmr</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/LDalby" target="_blank" rel="noopener">@LDalby</a>, <a href="https://github.com/ldecicco-USGS" target="_blank" rel="noopener">@ldecicco-USGS</a>, <a href="https://github.com/lhdjung" target="_blank" rel="noopener">@lhdjung</a>, <a href="https://github.com/LiNk-NY" target="_blank" rel="noopener">@LiNk-NY</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/Liripo" target="_blank" rel="noopener">@Liripo</a>, <a href="https://github.com/lorenzwalthert" target="_blank" rel="noopener">@lorenzwalthert</a>, <a href="https://github.com/lschneiderbauer" target="_blank" rel="noopener">@lschneiderbauer</a>, <a href="https://github.com/mabesa" target="_blank" rel="noopener">@mabesa</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/maRce10" target="_blank" rel="noopener">@maRce10</a>, <a href="https://github.com/margotbligh" target="_blank" rel="noopener">@margotbligh</a>, <a href="https://github.com/marine-ecologist" target="_blank" rel="noopener">@marine-ecologist</a>, <a href="https://github.com/markfairbanks" target="_blank" rel="noopener">@markfairbanks</a>, <a href="https://github.com/martinlaw" target="_blank" rel="noopener">@martinlaw</a>, <a href="https://github.com/matt-dray" target="_blank" rel="noopener">@matt-dray</a>, <a href="https://github.com/mattfidler" target="_blank" rel="noopener">@mattfidler</a>, <a href="https://github.com/matthewjnield" target="_blank" rel="noopener">@matthewjnield</a>, <a 
href="https://github.com/MattPM" target="_blank" rel="noopener">@MattPM</a>, <a href="https://github.com/mccarthy-m-g" target="_blank" rel="noopener">@mccarthy-m-g</a>, <a href="https://github.com/MEO265" target="_blank" rel="noopener">@MEO265</a>, <a href="https://github.com/merliseclyde" target="_blank" rel="noopener">@merliseclyde</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/mikeblazanin" target="_blank" rel="noopener">@mikeblazanin</a>, <a href="https://github.com/mikeroswell" target="_blank" rel="noopener">@mikeroswell</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/MLopez-Ibanez" target="_blank" rel="noopener">@MLopez-Ibanez</a>, <a href="https://github.com/Moohan" target="_blank" rel="noopener">@Moohan</a>, <a href="https://github.com/mpadge" target="_blank" rel="noopener">@mpadge</a>, <a href="https://github.com/mrcaseb" target="_blank" rel="noopener">@mrcaseb</a>, <a href="https://github.com/mrchypark" target="_blank" rel="noopener">@mrchypark</a>, <a href="https://github.com/ms609" target="_blank" rel="noopener">@ms609</a>, <a href="https://github.com/msberends" target="_blank" rel="noopener">@msberends</a>, <a href="https://github.com/musvaage" target="_blank" rel="noopener">@musvaage</a>, <a href="https://github.com/nanxstats" target="_blank" rel="noopener">@nanxstats</a>, <a href="https://github.com/nathaneastwood" target="_blank" rel="noopener">@nathaneastwood</a>, <a href="https://github.com/netique" target="_blank" rel="noopener">@netique</a>, <a href="https://github.com/nicholascarey" target="_blank" rel="noopener">@nicholascarey</a>, <a href="https://github.com/nicolerg" target="_blank" rel="noopener">@nicolerg</a>, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>, <a href="https://github.com/pearsonca" target="_blank" 
rel="noopener">@pearsonca</a>, <a href="https://github.com/peterdesmet" target="_blank" rel="noopener">@peterdesmet</a>, <a href="https://github.com/phauchamps" target="_blank" rel="noopener">@phauchamps</a>, <a href="https://github.com/przmv" target="_blank" rel="noopener">@przmv</a>, <a href="https://github.com/quantsch" target="_blank" rel="noopener">@quantsch</a>, <a href="https://github.com/ramiromagno" target="_blank" rel="noopener">@ramiromagno</a>, <a href="https://github.com/rcannood" target="_blank" rel="noopener">@rcannood</a>, <a href="https://github.com/rempsyc" target="_blank" rel="noopener">@rempsyc</a>, <a href="https://github.com/rgaiacs" target="_blank" rel="noopener">@rgaiacs</a>, <a href="https://github.com/rich-iannone" target="_blank" rel="noopener">@rich-iannone</a>, <a href="https://github.com/rickhelmus" target="_blank" rel="noopener">@rickhelmus</a>, <a href="https://github.com/rmflight" target="_blank" rel="noopener">@rmflight</a>, <a href="https://github.com/robmoss" target="_blank" rel="noopener">@robmoss</a>, <a href="https://github.com/royfrancis" target="_blank" rel="noopener">@royfrancis</a>, <a href="https://github.com/rsangole" target="_blank" rel="noopener">@rsangole</a>, <a href="https://github.com/ryantibs" target="_blank" rel="noopener">@ryantibs</a>, <a href="https://github.com/salim-b" target="_blank" rel="noopener">@salim-b</a>, <a href="https://github.com/samuel-marsh" target="_blank" rel="noopener">@samuel-marsh</a>, <a href="https://github.com/SebKrantz" target="_blank" rel="noopener">@SebKrantz</a>, <a href="https://github.com/SESjo" target="_blank" rel="noopener">@SESjo</a>, <a href="https://github.com/sgvignali" target="_blank" rel="noopener">@sgvignali</a>, <a href="https://github.com/spsanderson" target="_blank" rel="noopener">@spsanderson</a>, <a href="https://github.com/srfall" target="_blank" rel="noopener">@srfall</a>, <a href="https://github.com/stefanoborini" target="_blank" rel="noopener">@stefanoborini</a>, 
<a href="https://github.com/stephenashton-dhsc" target="_blank" rel="noopener">@stephenashton-dhsc</a>, <a href="https://github.com/strengejacke" target="_blank" rel="noopener">@strengejacke</a>, <a href="https://github.com/swsoyee" target="_blank" rel="noopener">@swsoyee</a>, <a href="https://github.com/t-kalinowski" target="_blank" rel="noopener">@t-kalinowski</a>, <a href="https://github.com/talgalili" target="_blank" rel="noopener">@talgalili</a>, <a href="https://github.com/tanho63" target="_blank" rel="noopener">@tanho63</a>, <a href="https://github.com/tedmoorman" target="_blank" rel="noopener">@tedmoorman</a>, <a href="https://github.com/telphick" target="_blank" rel="noopener">@telphick</a>, <a href="https://github.com/TFKentUSDA" target="_blank" rel="noopener">@TFKentUSDA</a>, <a href="https://github.com/ThierryO" target="_blank" rel="noopener">@ThierryO</a>, <a href="https://github.com/thisisnic" target="_blank" rel="noopener">@thisisnic</a>, <a href="https://github.com/thomasp85" target="_blank" rel="noopener">@thomasp85</a>, <a href="https://github.com/tomsing1" target="_blank" rel="noopener">@tomsing1</a>, <a href="https://github.com/tony-aw" target="_blank" rel="noopener">@tony-aw</a>, <a href="https://github.com/trevorld" target="_blank" rel="noopener">@trevorld</a>, <a href="https://github.com/tylerlittlefield" target="_blank" rel="noopener">@tylerlittlefield</a>, <a href="https://github.com/uriahf" target="_blank" rel="noopener">@uriahf</a>, <a href="https://github.com/urswilke" target="_blank" rel="noopener">@urswilke</a>, <a href="https://github.com/ValValetl" target="_blank" rel="noopener">@ValValetl</a>, <a href="https://github.com/venpopov" target="_blank" rel="noopener">@venpopov</a>, <a href="https://github.com/vincentvanhees" target="_blank" rel="noopener">@vincentvanhees</a>, <a href="https://github.com/wangq13" target="_blank" rel="noopener">@wangq13</a>, <a href="https://github.com/willgearty" target="_blank" 
rel="noopener">@willgearty</a>, <a href="https://github.com/wviechtb" target="_blank" rel="noopener">@wviechtb</a>, <a href="https://github.com/xuyiqing" target="_blank" rel="noopener">@xuyiqing</a>, <a href="https://github.com/yjunechoe" target="_blank" rel="noopener">@yjunechoe</a>, <a href="https://github.com/ynsec37" target="_blank" rel="noopener">@ynsec37</a>, <a href="https://github.com/zeehio" target="_blank" rel="noopener">@zeehio</a>, and <a href="https://github.com/zkamvar" target="_blank" rel="noopener">@zkamvar</a>.</p> recipes 1.1.0 https://www.tidyverse.org/blog/2024/07/recipes-1-1-0/ Mon, 08 Jul 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/07/recipes-1-1-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re thrilled to announce the release of <a href="https://recipes.tidymodels.org/" target="_blank" rel="noopener">recipes</a> 1.1.0. 
recipes lets you create a pipeable sequence of feature engineering steps.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"recipes"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will go over some of the bigger changes in this release: improvements in column type checking, support for more data types in recipes, use of long formulas, and better errors for misspelled argument names.</p> <p>You can see a full list of changes in the <a href="https://github.com/tidymodels/recipes/releases/tag/v1.1.0" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="column-type-checking">Column type checking <a href="#column-type-checking"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A <a href="https://github.com/tidymodels/recipes/issues/793" target="_blank" rel="noopener">longtime issue</a> in recipes came from the fact that recipes didn&rsquo;t keep a <a href="https://vctrs.r-lib.org/articles/type-size.html" target="_blank" rel="noopener">prototype</a> (ptype) of the data it was specified with. 
This could cause unexpected behaviour or uninformative error messages if the data passed to <a href="https://recipes.tidymodels.org/reference/prep.html" target="_blank" rel="noopener"><code>prep()</code></a> differed from the data used to create the <a href="https://recipes.tidymodels.org/reference/recipe.html" target="_blank" rel="noopener"><code>recipe()</code></a>.</p> <p>Every recipe you create starts with a call to <a href="https://recipes.tidymodels.org/reference/recipe.html" target="_blank" rel="noopener"><code>recipe()</code></a>. In the example below, we create a recipe where <code>x2</code> is a character vector, but prep it with data where <code>x2</code> is a numeric vector. Before version 1.1.0, this didn&rsquo;t produce any warnings or errors and silently did something unintended.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">data_template</span> <span class="o">&lt;-</span> <span class="nf">tibble</span><span class="p">(</span> <span class="n">outcome</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">10</span><span class="p">),</span> <span class="n">x1</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">10</span><span class="p">),</span> <span class="n">x2</span> <span class="o">=</span> <span class="nf">sample</span><span class="p">(</span><span class="kc">letters</span><span class="p">,</span> <span class="m">10</span><span class="p">,</span> <span class="bp">T</span><span class="p">)</span> <span class="p">)</span> <span class="n">rec</span> <span class="o">&lt;-</span> <span class="nf">recipe</span><span class="p">(</span><span class="n">outcome</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data_template</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">step_bin2factor</span><span class="p">(</span><span 
class="nf">all_numeric_predictors</span><span class="p">())</span> <span class="n">data_training</span> <span class="o">&lt;-</span> <span class="nf">tibble</span><span class="p">(</span><span class="n">outcome</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">),</span> <span class="n">x1</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">),</span> <span class="n">x2</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">))</span> <span class="nf">prep</span><span class="p">(</span><span class="n">rec</span><span class="p">,</span> <span class="n">training</span> <span class="o">=</span> <span class="n">data_training</span><span class="p">)</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; ── Recipe ──────────────────────────────────────────────────────────────────────</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; ── Inputs</span> <span class="c1">#&gt; Number of variables by role</span> <span class="c1">#&gt; outcome: 1</span> <span class="c1">#&gt; predictor: 2</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; ── Training information</span> <span class="c1">#&gt; Training data contained 1000 data points and no incomplete rows.</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; ── Operations</span> <span class="c1">#&gt; • Dummy variable to factor conversion for: x1 | Trained</span> </code></pre></div><p>Now, we get an error detailing how the data is different.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_template</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>outcome <span class='o'>=</span> <span class='nf'><a 
href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>10</span><span class='o'>)</span>, x1 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>10</span><span class='o'>)</span>, x2 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sample.html'>sample</a></span><span class='o'>(</span><span class='nv'>letters</span>, <span class='m'>10</span>, <span class='kc'>T</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>rec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>outcome</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>data_template</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_bin2factor.html'>step_bin2factor</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_numeric_predictors</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>data_training</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>outcome <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>1000</span><span class='o'>)</span>, x1 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>1000</span><span class='o'>)</span>, x2 <span class='o'>=</span> <span 
class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>1000</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='nv'>rec</span>, training <span class='o'>=</span> <span class='nv'>data_training</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `prep()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> The following variable has the wrong class:</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> `x2` must have class <span style='color: #0000BB;'>&lt;character&gt;</span>, not <span style='color: #0000BB;'>&lt;numeric&gt;</span>.</span></span> <span></span></code></pre> </div> <p>Note that recipes created before version 1.1.0 don&rsquo;t contain any ptype information, and will not undergo checking. 
Rerunning the code to create the recipe will add ptype information to the recipe.</p> <h2 id="input-checking-in-recipe">Input checking in <code>recipe()</code> <a href="#input-checking-in-recipe"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We have relaxed the requirements of data frames, while making feedback more helpful when something goes wrong.</p> <p>The data was previously passed through <a href="https://rdrr.io/r/stats/model.frame.html" target="_blank" rel="noopener"><code>model.frame()</code></a> inside the recipe, which restricted what could be handled. Previously prohibited input included data frames with list-columns or <a href="https://r-spatial.github.io/sf/" target="_blank" rel="noopener">sf</a> data frames. 
Both of these are now supported, as long as they are a <code>data.frame</code> object.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_listcolumn</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> y <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>4</span>,</span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span>, <span class='m'>4</span><span class='o'>:</span><span class='m'>6</span>, <span class='m'>3</span><span class='o'>:</span><span class='m'>1</span>, <span class='m'>1</span><span class='o'>:</span><span class='m'>10</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>y</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>data_listcolumn</span><span class='o'>)</span></span> <span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Recipe</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span> <span></span><span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; ── Inputs</span></span> <span></span><span><span class='c'>#&gt; Number of variables by role</span></span> <span></span><span><span class='c'>#&gt; outcome: 1</span></span> <span><span class='c'>#&gt; predictor: 1</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre 
class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://r-spatial.github.io/sf/'>sf</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE</span></span> <span></span><span><span class='nv'>pathshp</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/system.file.html'>system.file</a></span><span class='o'>(</span><span class='s'>"shape/nc.shp"</span>, package <span class='o'>=</span> <span class='s'>"sf"</span><span class='o'>)</span></span> <span><span class='nv'>data_sf</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://r-spatial.github.io/sf/reference/st_read.html'>st_read</a></span><span class='o'>(</span><span class='nv'>pathshp</span>, quiet <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>AREA</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>data_sf</span><span class='o'>)</span></span> <span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Recipe</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span> <span></span><span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; ── Inputs</span></span> <span></span><span><span class='c'>#&gt; Number of variables by role</span></span> <span></span><span><span class='c'>#&gt; outcome: 1</span></span> <span><span class='c'>#&gt; predictor: 14</span></span> <span></span></code></pre> </div> <p>We are 
excited to see what people can do with these new options.</p> <p>Another way to tell a recipe which variables should be included and what roles they should have is to use <a href="https://recipes.tidymodels.org/reference/roles.html" target="_blank" rel="noopener"><code>add_role()</code></a> and <a href="https://recipes.tidymodels.org/reference/roles.html" target="_blank" rel="noopener"><code>update_role()</code></a>. Previously, if you were not careful, you could end up in situations where the same variable was labeled as both an outcome and a predictor. This now throws an error.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># previously ran without complaint; now throws an error</span></span> <span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/roles.html'>update_role</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/everything.html'>everything</a></span><span class='o'>(</span><span class='o'>)</span>, new_role <span class='o'>=</span> <span class='s'>"predictor"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/roles.html'>add_role</a></span><span class='o'>(</span><span class='s'>"mpg"</span>, new_role <span class='o'>=</span> <span class='s'>"outcome"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `add_role()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `mpg` cannot get <span style='color: #0000BB;'>"outcome"</span> role as it already has role <span style='color: #0000BB;'>"predictor"</span>.</span></span> 
<span></span></code></pre> </div> <p>This error can be avoided by using <a href="https://recipes.tidymodels.org/reference/roles.html" target="_blank" rel="noopener"><code>update_role()</code></a> instead of <a href="https://recipes.tidymodels.org/reference/roles.html" target="_blank" rel="noopener"><code>add_role()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/roles.html'>update_role</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/everything.html'>everything</a></span><span class='o'>(</span><span class='o'>)</span>, new_role <span class='o'>=</span> <span class='s'>"predictor"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/roles.html'>update_role</a></span><span class='o'>(</span><span class='s'>"mpg"</span>, new_role <span class='o'>=</span> <span class='s'>"outcome"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Recipe</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span> <span></span><span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; ── Inputs</span></span> <span></span><span><span class='c'>#&gt; Number of variables by role</span></span> <span></span><span><span class='c'>#&gt; outcome: 1</span></span> <span><span class='c'>#&gt; predictor: 10</span></span> <span></span></code></pre> </div> <h2 id="long-formulas-in-recipe">Long 
formulas in <code>recipe()</code> <a href="#long-formulas-in-recipe"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Related to the changes we saw above, we now fully support very long formulas without hitting a <code>C stack usage</code> error.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_wide</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/matrix.html'>matrix</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>10000</span>, ncol <span class='o'>=</span> <span class='m'>10000</span><span class='o'>)</span></span> <span><span class='nv'>data_wide</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/as.data.frame.html'>as.data.frame</a></span><span class='o'>(</span><span class='nv'>data_wide</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/names.html'>names</a></span><span class='o'>(</span><span class='nv'>data_wide</span><span class='o'>)</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste0</a></span><span class='o'>(</span><span class='s'>"x"</span>, <span class='m'>1</span><span class='o'>:</span><span class='m'>10000</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>long_formula</span> <span class='o'>&lt;-</span> <span class='nf'><a 
href='https://rdrr.io/r/stats/formula.html'>as.formula</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste</a></span><span class='o'>(</span><span class='s'>"~ "</span>, <span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/names.html'>names</a></span><span class='o'>(</span><span class='nv'>data_wide</span><span class='o'>)</span>, collapse <span class='o'>=</span> <span class='s'>" + "</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>long_formula</span>, <span class='nv'>data_wide</span><span class='o'>)</span></span> <span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Recipe</span> <span style='color: #00BBBB;'>──────────────────────────────────────────────────────────────────────</span></span></span> <span></span><span><span class='c'>#&gt; </span></span> <span></span><span><span class='c'>#&gt; ── Inputs</span></span> <span></span><span><span class='c'>#&gt; Number of variables by role</span></span> <span></span><span><span class='c'>#&gt; predictor: 10000</span></span> <span></span></code></pre> </div> <h2 id="better-error-for-misspelled-argument-names">Better error for misspelled argument names <a href="#better-error-for-misspelled-argument-names"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 
5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>If you have used recipes long enough you are very likely to have run into the following error.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">recipe</span><span class="p">(</span><span class="n">mpg</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">mtcars</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">step_pca</span><span class="p">(</span><span class="nf">all_numeric_predictors</span><span class="p">(),</span> <span class="n">number</span> <span class="o">=</span> <span class="m">4</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">prep</span><span class="p">()</span> <span class="c1">#&gt; Error in `step_pca()`:</span> <span class="c1">#&gt; Caused by error in `prep()`:</span> <span class="c1">#&gt; ! Can&#39;t rename variables in this context.</span> </code></pre></div><p>The first time you saw it, it didn&rsquo;t make much sense. Hopefully, you figured out that <a href="https://recipes.tidymodels.org/reference/step_pca.html" target="_blank" rel="noopener">step_pca()</a> doesn&rsquo;t have a <code>number</code> argument, and instead uses <code>num_comp</code> to determine the number of principal components to return. 
This confusion will be a thing of the past as we now include this improved error message.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_pca.html'>step_pca</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_numeric_predictors</a></span><span class='o'>(</span><span class='o'>)</span>, number <span class='o'>=</span> <span class='m'>4</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `step_pca()`:</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in `prep()` at recipes/R/recipe.R:479:9:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> The following argument was specified but do not exist: `number`.</span></span> <span></span></code></pre> </div> <h2 id="quality-of-life-increases-in-step_dummy">Quality of life increases in <code>step_dummy()</code> <a href="#quality-of-life-increases-in-step_dummy"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 
5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>I would imagine that one of the most used steps is <a href="https://recipes.tidymodels.org/reference/step_dummy.html" target="_blank" rel="noopener"><code>step_dummy()</code></a>. We have improved the errors and warnings it spits out when things go sideways.</p> <p>If you apply <a href="https://recipes.tidymodels.org/reference/step_dummy.html" target="_blank" rel="noopener"><code>step_dummy()</code></a> to a variable that contains a lot of levels, it will produce a lot of columns, and the resulting object may not fit in memory. This can lead to the following error.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">data_id</span> <span class="o">&lt;-</span> <span class="nf">tibble</span><span class="p">(</span> <span class="n">id</span> <span class="o">=</span> <span class="nf">as.character</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">100000</span><span class="p">),</span> <span class="n">x1</span> <span class="o">=</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">100000</span><span class="p">),</span> <span class="n">x2</span> <span class="o">=</span> <span class="nf">sample</span><span class="p">(</span><span class="kc">letters</span><span class="p">,</span> <span class="m">100000</span><span class="p">,</span> <span class="kc">TRUE</span><span class="p">)</span> <span class="p">)</span> <span class="nf">recipe</span><span class="p">(</span><span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">data_id</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">step_dummy</span><span class="p">(</span><span class="nf">all_nominal_predictors</span><span class="p">())</span> <span 
class="o">|&gt;</span> <span class="nf">prep</span><span class="p">()</span> <span class="c1">#&gt; Error: vector memory exhausted (limit reached?)</span> </code></pre></div><p>Instead, you now get a more helpful error message.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_id</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/character.html'>as.character</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>100000</span><span class='o'>)</span>, </span> <span> x1 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>100000</span><span class='o'>)</span>, </span> <span> x2 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sample.html'>sample</a></span><span class='o'>(</span><span class='nv'>letters</span>, <span class='m'>100000</span>, <span class='kc'>TRUE</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>data_id</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_dummy.html'>step_dummy</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_nominal_predictors</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a 
href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `step_dummy()`:</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `id` contains too many levels (100000), which would result in a</span></span> <span><span class='c'>#&gt; data.frame too large to fit in memory.</span></span> <span></span></code></pre> </div> <p>Likewise, you will get helpful warnings if <a href="https://recipes.tidymodels.org/reference/step_dummy.html" target="_blank" rel="noopener"><code>step_dummy()</code></a> encounters an <code>NA</code> or unseen values.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_train</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>data_unseen</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='s'>"c"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>rec_spec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>data_train</span><span class='o'>)</span> <span 
class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_dummy.html'>step_dummy</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>rec_spec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span><span class='nv'>data_unseen</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning: <span style='color: #BBBB00;'>!</span> There are new levels in `x`: <span style='color: #0000BB;'>"c"</span>.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Consider using step_novel() (`?recipes::step_novel()`) before `step_dummy()`</span></span> <span><span class='c'>#&gt; to handle unseen values.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 1</span></span></span> <span><span class='c'>#&gt; x_b</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #BB0000;'>NA</span></span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>data_na</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span 
class='kc'>NA</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>rec_spec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span><span class='nv'>data_na</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning: <span style='color: #BBBB00;'>!</span> There are new levels in `x`: <span style='color: #0000BB;'>NA</span>.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Consider using step_unknown() (`?recipes::step_unknown()`) before</span></span> <span><span class='c'>#&gt; `step_dummy()` to handle missing values.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 1</span></span></span> <span><span class='c'>#&gt; x_b</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #BB0000;'>NA</span></span></span> <span></span></code></pre> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thank you to all the people who have contributed to recipes since the release of v1.0.10:</p> <p> <a href="https://github.com/brynhum" target="_blank" rel="noopener">@brynhum</a>, <a href="https://github.com/DemetriPananos" target="_blank" rel="noopener">@DemetriPananos</a>, <a 
href="https://github.com/diegoperoni" target="_blank" rel="noopener">@diegoperoni</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/JiahuaQu" target="_blank" rel="noopener">@JiahuaQu</a>, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>, <a href="https://github.com/nhward" target="_blank" rel="noopener">@nhward</a>, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</p> <h2 id="chocolate-chocolate-chip-cookies">Chocolate Chocolate Chip Cookies <a href="#chocolate-chocolate-chip-cookies"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>preheat oven 350°F</p> <ul> <li>1/3c butter</li> <li>1/2 + 1/3c sugar</li> </ul> <p>mix until fluffy</p> <ul> <li>1 tsp vanilla</li> <li>1 egg</li> </ul> <p>mix until combined</p> <ul> <li>1/2c cocoa</li> <li>1/2 tsp baking soda</li> <li>1c flour</li> </ul> <p>mix until combined</p> <ul> <li>3/4c chocolate chips</li> </ul> <p>bake for about 8 mins, depending on size! they will crack on top, but still be soft.</p> bonsai 0.3.0 https://www.tidyverse.org/blog/2024/06/bonsai-0-3-0/ Tue, 25 Jun 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/06/bonsai-0-3-0/ <p>We&rsquo;re brimming with glee to announce the release of <a href="https://bonsai.tidymodels.org" target="_blank" rel="noopener">bonsai</a> 0.3.0. 
bonsai is a parsnip extension package for tree-based models, and includes support for random forest and gradient-boosted tree frameworks like partykit and LightGBM. This most recent release of the package introduces support for the <code>&quot;aorsf&quot;</code> engine, which implements accelerated oblique random forests (Jaeger et al. 2022, Jaeger et al. 2024).</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"bonsai"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will demonstrate a modeling workflow where the benefits of using oblique random forests shine through.</p> <p>You can see a full list of changes in the <a href="https://bonsai.tidymodels.org/news/index.html#bonsai-030" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://bonsai.tidymodels.org/'>bonsai</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://plsmod.tidymodels.org'>plsmod</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/corrr'>corrr</a></span><span class='o'>)</span></span></code></pre> </div> <h2 
id="the-meats-data">The <code>meats</code> data <a href="#the-meats-data"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The modeldata package, loaded automatically with the tidymodels meta-package, includes several example datasets to demonstrate modeling problems. We&rsquo;ll make use of a dataset called <code>meats</code> in this post. Each row is a measurement of a sample of finely chopped meat.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>meats</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 215 × 103</span></span></span> <span><span class='c'>#&gt; x_001 x_002 x_003 x_004 x_005 x_006 x_007 x_008 x_009 x_010 x_011 x_012 x_013</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span 
style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 2.62 2.62 2.62 2.62 2.62 2.62 2.62 2.62 2.63 2.63 2.63 2.63 2.64</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 2.83 2.84 2.84 2.85 2.85 2.86 2.86 2.87 2.87 2.88 2.88 2.89 2.90</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 2.58 2.58 2.59 2.59 2.59 2.59 2.59 2.60 2.60 2.60 2.60 2.61 2.61</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 2.82 2.82 2.83 2.83 2.83 2.83 2.83 2.84 2.84 2.84 2.84 2.85 2.85</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 2.79 2.79 2.79 2.79 2.80 2.80 2.80 2.80 2.81 2.81 2.81 2.82 2.82</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 3.01 3.02 3.02 3.03 3.03 3.04 3.04 3.05 3.06 3.06 3.07 3.08 3.09</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 2.99 2.99 3.00 3.01 3.01 3.02 3.02 3.03 3.04 3.04 3.05 3.06 3.07</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 2.53 2.53 2.53 2.53 2.53 2.53 2.53 2.53 2.54 2.54 2.54 2.54 2.54</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 3.27 3.28 3.29 3.29 3.30 3.31 3.31 3.32 3.33 3.33 3.34 3.35 3.36</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 3.40 3.41 3.41 3.42 3.43 3.43 3.44 3.45 3.46 3.47 3.48 3.48 3.49</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 205 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 90 more variables: x_014 &lt;dbl&gt;, x_015 &lt;dbl&gt;, x_016 &lt;dbl&gt;, x_017 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># x_018 &lt;dbl&gt;, x_019 &lt;dbl&gt;, x_020 &lt;dbl&gt;, x_021 &lt;dbl&gt;, x_022 &lt;dbl&gt;,</span></span></span> 
<span><span class='c'>#&gt; <span style='color: #555555;'># x_023 &lt;dbl&gt;, x_024 &lt;dbl&gt;, x_025 &lt;dbl&gt;, x_026 &lt;dbl&gt;, x_027 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># x_028 &lt;dbl&gt;, x_029 &lt;dbl&gt;, x_030 &lt;dbl&gt;, x_031 &lt;dbl&gt;, x_032 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># x_033 &lt;dbl&gt;, x_034 &lt;dbl&gt;, x_035 &lt;dbl&gt;, x_036 &lt;dbl&gt;, x_037 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># x_038 &lt;dbl&gt;, x_039 &lt;dbl&gt;, x_040 &lt;dbl&gt;, x_041 &lt;dbl&gt;, x_042 &lt;dbl&gt;, …</span></span></span> <span></span></code></pre> </div> <p>From that dataset&rsquo;s documentation:</p> <blockquote> <p>These data are recorded on a Tecator Infratec Food and Feed Analyzer&hellip; For each meat sample the data consists of a 100 channel spectrum of absorbances and the contents of moisture (water), fat and protein. The absorbance is -log10 of the transmittance measured by the spectrometer. The three contents, measured in percent, are determined by analytic chemistry.</p> </blockquote> <p>We&rsquo;ll try to predict the protein content, as a percentage, using the absorbance measurements.</p> <p>Before we take a further look, let&rsquo;s split up our data. 
I&rsquo;ll first select off two other possible outcome variables and, after splitting into training and testing sets, resample the data using 5-fold cross-validation with 2 repeats.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>meats</span> <span class='o'>&lt;-</span> <span class='nv'>meats</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>select</span><span class='o'>(</span><span class='o'>-</span><span class='nv'>water</span>, <span class='o'>-</span><span class='nv'>fat</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span> <span><span class='nv'>meats_split</span> <span class='o'>&lt;-</span> <span class='nf'>initial_split</span><span class='o'>(</span><span class='nv'>meats</span><span class='o'>)</span></span> <span><span class='nv'>meats_train</span> <span class='o'>&lt;-</span> <span class='nf'>training</span><span class='o'>(</span><span class='nv'>meats_split</span><span class='o'>)</span></span> <span><span class='nv'>meats_test</span> <span class='o'>&lt;-</span> <span class='nf'>testing</span><span class='o'>(</span><span class='nv'>meats_split</span><span class='o'>)</span></span> <span><span class='nv'>meats_folds</span> <span class='o'>&lt;-</span> <span class='nf'>vfold_cv</span><span class='o'>(</span><span class='nv'>meats_train</span>, v <span class='o'>=</span> <span class='m'>5</span>, repeats <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span></code></pre> </div> <p>The tricky parts of this modeling problem are that:</p> <ol> <li>There are few observations to work with (215 total).</li> <li>Each of these 100 absorbance measurements are <em>highly</em> correlated.</li> </ol> <p>Visualizing that correlation:</p> <div 
class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>meats_train</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://corrr.tidymodels.org/reference/correlate.html'>correlate</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'>theme</span><span class='o'>(</span>axis.text.x <span class='o'>=</span> <span class='nf'>element_blank</span><span class='o'>(</span><span class='o'>)</span>, axis.text.y <span class='o'>=</span> <span class='nf'>element_blank</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Correlation computed with</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> Method: 'pearson'</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> Missing treated using: 'pairwise.complete.obs'</span></span> <span></span></code></pre> <p><img src="figs/correlate-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Almost all of these pairwise correlations between predictors are near 1, besides the last variable and every other variable. That last variable with weaker correlation values? 
It&rsquo;s the outcome.</p> <h2 id="baseline-models">Baseline models <a href="#baseline-models"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>There are several existing model implementations in tidymodels that are resilient to highly correlated predictors. The first one I&rsquo;d probably reach for is an elastic net: an interpolation of the LASSO and Ridge regularized linear regression models. Evaluating that modeling approach against resamples:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># define a regularized linear model</span></span> <span><span class='nv'>spec_lr</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/linear_reg.html'>linear_reg</a></span><span class='o'>(</span>penalty <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span>, mixture <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"glmnet"</span><span class='o'>)</span></span> <span></span> <span><span class='c'># try out different penalization approaches</span></span> <span><span 
class='nv'>res_lr</span> <span class='o'>&lt;-</span> <span class='nf'>tune_grid</span><span class='o'>(</span><span class='nv'>spec_lr</span>, <span class='nv'>protein</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>meats_folds</span><span class='o'>)</span></span> <span></span> <span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_lr</span>, metric <span class='o'>=</span> <span class='s'>"rmse"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span> <span><span class='c'>#&gt; penalty mixture .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 0.000<span style='text-decoration: underline;'>032</span>4 0.668 rmse standard 1.24 10 0.051<span style='text-decoration: underline;'>6</span> Preprocessor1_Mo…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 0.000<span style='text-decoration: underline;'>000</span>005<span style='text-decoration: underline;'>24</span> 0.440 rmse standard 1.25 10 0.054<span style='text-decoration: underline;'>8</span> Preprocessor1_Mo…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 0.000<span style='text-decoration: underline;'>000</span>461 0.839 rmse standard 1.26 10 0.053<span 
style='text-decoration: underline;'>8</span> Preprocessor1_Mo…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 0.000<span style='text-decoration: underline;'>005</span>50 0.965 rmse standard 1.26 10 0.054<span style='text-decoration: underline;'>0</span> Preprocessor1_Mo…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 0.000<span style='text-decoration: underline;'>000</span>048<span style='text-decoration: underline;'>9</span> 0.281 rmse standard 1.26 10 0.053<span style='text-decoration: underline;'>4</span> Preprocessor1_Mo…</span></span> <span></span><span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_lr</span>, metric <span class='o'>=</span> <span class='s'>"rsq"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span> <span><span class='c'>#&gt; penalty mixture .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 0.000<span style='text-decoration: underline;'>032</span>4 0.668 rsq standard 0.849 10 0.012<span style='text-decoration: underline;'>6</span> Preprocessor1_Mo…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 0.000<span style='text-decoration: underline;'>000</span>005<span 
style='text-decoration: underline;'>24</span> 0.440 rsq standard 0.848 10 0.012<span style='text-decoration: underline;'>8</span> Preprocessor1_Mo…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 0.000<span style='text-decoration: underline;'>000</span>461 0.839 rsq standard 0.846 10 0.011<span style='text-decoration: underline;'>4</span> Preprocessor1_Mo…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 0.000<span style='text-decoration: underline;'>005</span>50 0.965 rsq standard 0.846 10 0.011<span style='text-decoration: underline;'>1</span> Preprocessor1_Mo…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 0.000<span style='text-decoration: underline;'>000</span>048<span style='text-decoration: underline;'>9</span> 0.281 rsq standard 0.846 10 0.012<span style='text-decoration: underline;'>6</span> Preprocessor1_Mo…</span></span> <span></span></code></pre> </div> <p>That best RMSE value of 1.24 gives us a baseline to work with, and the best R-squared 0.85 seems like a good start.</p> <p>Many tree-based model implementations in tidymodels generally handle correlated predictors well. 
Just to be apples-to-apples with <code>&quot;aorsf&quot;</code>, let&rsquo;s use a different random forest engine to get a better sense for baseline performance:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>spec_rf</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/rand_forest.html'>rand_forest</a></span><span class='o'>(</span>mtry <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span>, min_n <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='c'># this is the default engine, but for consistency's sake:</span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"ranger"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"regression"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>res_rf</span> <span class='o'>&lt;-</span> <span class='nf'>tune_grid</span><span class='o'>(</span><span class='nv'>spec_rf</span>, <span class='nv'>protein</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>meats_folds</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>i</span> <span style='color: #000000;'>Creating pre-processing data to finalize unknown parameter: 
mtry</span></span></span> <span></span><span></span> <span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_rf</span>, metric <span class='o'>=</span> <span class='s'>"rmse"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span> <span><span class='c'>#&gt; mtry min_n .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 96 4 rmse standard 2.37 10 0.090<span style='text-decoration: underline;'>5</span> Preprocessor1_Model08</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 41 6 rmse standard 2.39 10 0.088<span style='text-decoration: underline;'>3</span> Preprocessor1_Model01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 88 10 rmse standard 2.43 10 0.081<span style='text-decoration: underline;'>6</span> Preprocessor1_Model06</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 79 17 rmse standard 2.51 10 0.074<span style='text-decoration: underline;'>0</span> Preprocessor1_Model07</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 27 18 rmse standard 2.52 10 0.077<span style='text-decoration: underline;'>8</span> Preprocessor1_Model04</span></span> <span></span><span><span 
class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_rf</span>, metric <span class='o'>=</span> <span class='s'>"rsq"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span> <span><span class='c'>#&gt; mtry min_n .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 96 4 rsq standard 0.424 10 0.038<span style='text-decoration: underline;'>5</span> Preprocessor1_Model08</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 41 6 rsq standard 0.409 10 0.039<span style='text-decoration: underline;'>4</span> Preprocessor1_Model01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 88 10 rsq standard 0.387 10 0.036<span style='text-decoration: underline;'>5</span> Preprocessor1_Model06</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 79 17 rsq standard 0.353 10 0.040<span style='text-decoration: underline;'>4</span> Preprocessor1_Model07</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 27 18 rsq standard 0.346 10 0.039<span style='text-decoration: underline;'>7</span> Preprocessor1_Model04</span></span> <span></span></code></pre> </div> <p>Not so hot. 
Just to show I&rsquo;m not making a straw man here, I&rsquo;ll evaluate a few more alternative modeling approaches behind the curtain and print out their best performance metrics:</p> <ul> <li><strong>Gradient boosted tree with LightGBM</strong>. Best RMSE: 2.34. Best R-squared: 0.43.</li> <li><strong>Partial least squares regression</strong>. Best RMSE: 1.39. Best R-squared: 0.81.</li> <li><strong>Support vector machine</strong>. Best RMSE: 2.28. Best R-squared: 0.46.</li> </ul> <p>This is a tricky one.</p> <h2 id="introducing-accelerated-oblique-random-forests">Introducing accelerated oblique random forests <a href="#introducing-accelerated-oblique-random-forests"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The 0.3.0 release of bonsai introduces support for accelerated oblique random forests via the <code>&quot;aorsf&quot;</code> engine for classification and regression in tidymodels. (Tidy survival modelers might note that <a href="https://www.tidyverse.org/blog/2023/04/censored-0-2-0/" target="_blank" rel="noopener">we already support <code>&quot;aorsf&quot;</code> for censored regression</a> via the <a href="https://censored.tidymodels.org" target="_blank" rel="noopener">censored</a> parsnip extension package!)</p> <p>Unlike trees in conventional random forests, which create splits using thresholds based on individual predictors (e.g. <code>x_001 &gt; 3</code>), oblique random forests use linear combinations of predictors to create splits (e.g. 
<code>x_001 + x_002 &gt; 7.5</code>) and have been shown to improve predictive performance relative to conventional random forests for a variety of applications (Menze et al. 2011). &ldquo;Oblique&rdquo; refers to the appearance of decision boundaries when a set of splits is plotted; I&rsquo;ve grabbed a visual from the <a href="https://github.com/ropensci/aorsf?tab=readme-ov-file#what-does-oblique-mean" target="_blank" rel="noopener">aorsf README</a> that demonstrates:</p> <div class="highlight"> <p><img src="figures/oblique.png" alt="Two plots of decision boundaries for a classification problem. One uses single-variable splitting and the other oblique splitting. Both trees partition the predictor space defined by predictors X1 and X2, but the oblique splits do a better job of separating the two classes thanks to an 'oblique' boundary formed by considering both X1 and X2 at the same time." width="700px" style="display: block; margin: auto;" /></p> </div> <p>In the above, we&rsquo;d like to separate the purple dots from the orange squares. A tree in a traditional random forest, represented on the left, can only generate splits based on one of X1 or X2 at a time. A tree in an oblique random forest, represented on the right, can consider both X1 and X2 in creating decision boundaries, often resulting in stronger predictive performance.</p> <p>Where does the &ldquo;accelerated&rdquo; come from? Generally, finding optimal oblique splits is computationally more intensive than finding single-predictor splits. The aorsf package uses something called &ldquo;Newton-Raphson scoring&rdquo;, the same algorithm under the hood in the survival package, to identify splits based on linear combinations of predictor variables. This approach speeds up that process greatly, resulting in fit times that are comparable to implementations of traditional random forests in R (and hundreds of times faster than existing oblique random forest implementations, Jaeger et al.
2024).</p> <p>The code to tune this model with the <code>&quot;aorsf&quot;</code> engine is the same as for <code>&quot;ranger&quot;</code>, except we switch out the <code>engine</code> argument to <a href="https://parsnip.tidymodels.org/reference/set_engine.html" target="_blank" rel="noopener"><code>set_engine()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>spec_aorsf</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/rand_forest.html'>rand_forest</a></span><span class='o'>(</span></span> <span> mtry <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span>,</span> <span> min_n <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"aorsf"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"regression"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>res_aorsf</span> <span class='o'>&lt;-</span> <span class='nf'>tune_grid</span><span class='o'>(</span><span class='nv'>spec_aorsf</span>, <span class='nv'>protein</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>meats_folds</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span 
style='color: #0000BB;'>i</span> <span style='color: #000000;'>Creating pre-processing data to finalize unknown parameter: mtry</span></span></span> <span></span><span></span> <span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_aorsf</span>, metric <span class='o'>=</span> <span class='s'>"rmse"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span> <span><span class='c'>#&gt; mtry min_n .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 87 11 rmse standard 0.786 10 0.037<span style='text-decoration: underline;'>0</span> Preprocessor1_Model02</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 98 8 rmse standard 0.789 10 0.036<span style='text-decoration: underline;'>3</span> Preprocessor1_Model10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 48 5 rmse standard 0.793 10 0.036<span style='text-decoration: underline;'>3</span> Preprocessor1_Model01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 16 17 rmse standard 0.803 10 0.032<span style='text-decoration: underline;'>5</span> Preprocessor1_Model09</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 31 18 rmse standard 0.813 
10 0.035<span style='text-decoration: underline;'>9</span> Preprocessor1_Model05</span></span> <span></span><span><span class='nf'>show_best</span><span class='o'>(</span><span class='nv'>res_aorsf</span>, metric <span class='o'>=</span> <span class='s'>"rsq"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span></span> <span><span class='c'>#&gt; mtry min_n .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 48 5 rsq standard 0.946 10 0.004<span style='text-decoration: underline;'>46</span> Preprocessor1_Model01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 98 8 rsq standard 0.945 10 0.004<span style='text-decoration: underline;'>82</span> Preprocessor1_Model10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 87 11 rsq standard 0.945 10 0.004<span style='text-decoration: underline;'>84</span> Preprocessor1_Model02</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 16 17 rsq standard 0.941 10 0.003<span style='text-decoration: underline;'>70</span> Preprocessor1_Model09</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 31 18 rsq standard 0.940 10 0.005<span style='text-decoration: underline;'>47</span> 
Preprocessor1_Model05</span></span> <span></span></code></pre> </div> <p>Holy smokes. The best RMSE from aorsf is 0.79, well below the previous best of 1.24 from the elastic net, and the best R-squared is 0.95, well above the previous best (also from the elastic net) of 0.85.</p> <p>Especially if your modeling problems involve few samples of many, highly correlated predictors, give the <code>&quot;aorsf&quot;</code> modeling engine a whirl in your workflows and let us know what you think!</p> <h2 id="references">References <a href="#references"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Byron C. Jaeger, Sawyer Welden, Kristin Lenoir, Jaime L. Speiser, Matthew W. Segar, Ambarish Pandey, and Nicholas M. Pajewski. 2024. &ldquo;Accelerated and Interpretable Oblique Random Survival Forests.&rdquo; <em>Journal of Computational and Graphical Statistics</em> 33.1: 192-207.</p> <p>Byron C. Jaeger, Sawyer Welden, Kristin Lenoir, and Nicholas M. Pajewski. 2022. &ldquo;aorsf: An R package for Supervised Learning Using the Oblique Random Survival Forest.&rdquo; <em>The Journal of Open Source Software</em>.</p> <p>Bjoern H. Menze, B. Michael Kelm, Daniel N. Splitthoff, Ullrich Koethe, and Fred A. Hamprecht. 2011. &ldquo;On Oblique Random Forests.&rdquo; <em>Joint European Conference on Machine Learning and Knowledge Discovery in Databases</em> (pp. 453&ndash;469).
Springer.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thank you to <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">@bcjaeger</a>, the aorsf author, for doing most of the work to implement aorsf support in bonsai. Thank you to <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, <a href="https://github.com/p-schaefer" target="_blank" rel="noopener">@p-schaefer</a>, <a href="https://github.com/seb-mueller" target="_blank" rel="noopener">@seb-mueller</a>, and <a href="https://github.com/tcovert" target="_blank" rel="noopener">@tcovert</a> for their contributions on the bonsai repository since version 0.2.1.</p> nanoparquet 0.3.0 https://www.tidyverse.org/blog/2024/06/nanoparquet-0-3-0/ Thu, 20 Jun 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/06/nanoparquet-0-3-0/
<p>We&rsquo;re extremely pleased to announce the release of <a href="https://r-lib.github.io/nanoparquet/" target="_blank" rel="noopener">nanoparquet</a> 0.3.0. nanoparquet is a new R package that reads Parquet files into data frames, and writes data frames to Parquet files.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"nanoparquet"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will cover the features and limitations of nanoparquet, and also our future plans.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/r-lib/nanoparquet'>nanoparquet</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="what-is-parquet">What is Parquet? <a href="#what-is-parquet"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Parquet is a file format for storing data on disk. It is specifically designed for large data sets, read-heavy workloads and data analysis.
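</p> <p>As a rough sketch of what that looks like in practice, assuming nanoparquet is installed, a data frame can be written to a Parquet file and read back with <code>write_parquet()</code> and <code>read_parquet()</code>. The <code>df</code> data frame and the temporary file path below are made up for illustration and are not part of the original post:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>library(nanoparquet)

# hypothetical example data, invented for this sketch
df &lt;- data.frame(
  id = 1:3,
  name = c("a", "b", "c"),
  score = c(0.5, NA, 2.25)  # includes a missing value
)

# write to a temporary Parquet file, then read it back
path &lt;- tempfile(fileext = ".parquet")
write_parquet(df, path)
df2 &lt;- read_parquet(path)

# inspect the result of the round trip
str(df2)
</code></pre> </div> <p>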
The most important features of Parquet are:</p> <ul> <li> <p><strong>Columnar</strong>. Data is stored column-wise, so whole columns (or large chunks of columns) are easy to read quickly. Columnar storage allows better compression, fast operations on a subset of columns, and easy ways of removing columns or adding new columns to a data file.</p> </li> <li> <p><strong>Binary</strong>. A Parquet file is not a text file. Each Parquet data type is stored in a well-defined binary storage format, leaving no ambiguity about how fields are parsed.</p> </li> <li> <p><strong>Rich types</strong>. Parquet supports a small set of <em>low level</em> data types with well specified storage formats and encodings. On top of the low level types, it implements several higher level logical types, like UTF-8 strings, time stamps, JSON strings, ENUM types (factors), etc.</p> </li> <li> <p><strong>Well supported</strong>. At this point Parquet is well supported across modern languages like R, Python, Rust, Java, Go, etc. In particular, Apache Arrow handles Parquet files very well, and has bindings to many languages. DuckDB is a very portable tool that reads and writes Parquet files, or even opens a set of Parquet files as a database.</p> </li> <li> <p><strong>Performant</strong>. Parquet columns may use various encodings and compression to ensure that the data files are kept as small as possible. When running an analytical query on a subset of the data, the Parquet format makes it easy to skip the columns and/or rows that are irrelevant.</p> </li> <li> <p><strong>Concurrency built in</strong>. A Parquet file can be divided into row groups. Parquet readers can read, uncompress and decode row groups in parallel. Parquet writers can encode and compress row groups in parallel. Even a single column may be divided into multiple pages that can be (un)compressed, encoded and decoded in parallel.</p> </li> <li> <p><strong>Missing values</strong>. 
Support for missing values is built into the Parquet format.</p> </li> </ul> <h2 id="why-we-created-nanoparquet">Why we created nanoparquet? <a href="#why-we-created-nanoparquet"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Although Parquet is well supported by modern languages, today the complexity of the Parquet format often outweighs its benefits for smaller data sets. Many tools that support Parquet are typically used for larger, out of memory data sets, so there is a perception that Parquet is only for big data. These tools typically take longer to compile or install, and often seem too heavy for in-memory data analysis.</p> <p>With nanoparquet, we wanted to have a smaller tool that has no dependencies and is easy to install. Our goal is to facilitate the adoption of Parquet for smaller data sets, especially for teams that share data between multiple environments, e.g. R, Python, Java, etc.</p> <h2 id="nanoparquet-features">nanoparquet Features <a href="#nanoparquet-features"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>These are some of the nanoparquet features that we are most excited about.</p> <ul> <li> <p><strong>Lightweight</strong>. 
nanoparquet has no package or system dependencies other than a C++11 compiler. It compiles in about 30 seconds into an R package that is less than a megabyte in size.</p> </li> <li> <p><strong>Reads many Parquet files</strong>. <a href="https://r-lib.github.io/nanoparquet/reference/read_parquet.html" target="_blank" rel="noopener"><code>nanoparquet::read_parquet()</code></a> supports reading most Parquet files. In particular, it supports all Parquet encodings and at the time of writing it supports three compression codecs: Snappy, Gzip and Zstd. Make sure you read &ldquo;Limitations&rdquo; below.</p> </li> <li> <p><strong>Writes many R data types</strong>. <a href="https://r-lib.github.io/nanoparquet/reference/write_parquet.html" target="_blank" rel="noopener"><code>nanoparquet::write_parquet()</code></a> supports writing most R data frames. In particular, missing values are handled properly, factor columns are kept as factors, and temporal types are encoded correctly. Make sure you read &ldquo;Limitations&rdquo; below.</p> </li> <li> <p><strong>Type mappings</strong>. nanoparquet has a well defined set of <a href="https://r-lib.github.io/nanoparquet/reference/nanoparquet-types.html" target="_blank" rel="noopener">type mapping rules</a>. Use the <a href="https://r-lib.github.io/nanoparquet/dev/reference/parquet_column_types.html" target="_blank" rel="noopener"><code>parquet_column_types()</code></a> function to see how <a href="https://r-lib.github.io/nanoparquet/reference/read_parquet.html" target="_blank" rel="noopener"><code>read_parquet()</code></a> and <a href="https://r-lib.github.io/nanoparquet/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> map Parquet and R types for a file or a data frame.</p> </li> <li> <p><strong>Metadata queries</strong>. 
nanoparquet has a <a href="https://r-lib.github.io/nanoparquet/dev/reference/index.html#extract-parquet-metadata" target="_blank" rel="noopener">number of functions</a> that allow you to query the metadata and schema without reading in the full dataset.</p> </li> </ul> <h2 id="examples">Examples <a href="#examples"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2> <h3 id="reading-a-parquet-file">Reading a Parquet file <a href="#reading-a-parquet-file"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The nanoparquet R package contains an example Parquet file. 
We are going to use it to demonstrate how the package works.</p> <p>If the pillar package is loaded, then nanoparquet data frames are pretty-printed.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/r-lib/nanoparquet'>nanoparquet</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://pillar.r-lib.org/'>pillar</a></span><span class='o'>)</span></span> <span><span class='nv'>udf</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/system.file.html'>system.file</a></span><span class='o'>(</span><span class='s'>"extdata/userdata1.parquet"</span>, package <span class='o'>=</span> <span class='s'>"nanoparquet"</span><span class='o'>)</span></span></code></pre> </div> <p>Before actually reading the file, let&rsquo;s look up some metadata about it, and also how its columns will be mapped to R types:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://r-lib.github.io/nanoparquet/reference/parquet_info.html'>parquet_info</a></span><span class='o'>(</span><span class='nv'>udf</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 1 × 7</span></span></span> <span><span class='c'>#&gt; file_name num_cols num_rows num_row_groups file_size parquet_version</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; 
font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> /Users/gaborcsardi… 13 <span style='text-decoration: underline;'>1</span>000 1 <span style='text-decoration: underline;'>73</span>217 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: created_by &lt;chr&gt;</span></span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://r-lib.github.io/nanoparquet/reference/parquet_column_types.html'>parquet_column_types</a></span><span class='o'>(</span><span class='nv'>udf</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 13 × 6</span></span></span> <span><span class='c'>#&gt; file_name name type r_type repetition_type logical_type </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>*</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;I&lt;list&gt;&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> /Users/gaborcsa… regi… INT64 POSIX… REQUIRED <span style='color: #555555;'>&lt;TIMESTAMP(TRUE, micros)&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> /Users/gaborcsa… id INT32 integ… REQUIRED <span style='color: #555555;'>&lt;INT(32, TRUE)&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> /Users/gaborcsa… firs… BYTE… chara… OPTIONAL 
<span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> /Users/gaborcsa… last… BYTE… chara… REQUIRED <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> /Users/gaborcsa… email BYTE… chara… OPTIONAL <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> /Users/gaborcsa… gend… BYTE… factor OPTIONAL <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> /Users/gaborcsa… ip_a… BYTE… chara… REQUIRED <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> /Users/gaborcsa… cc BYTE… chara… OPTIONAL <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> /Users/gaborcsa… coun… BYTE… chara… REQUIRED <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> /Users/gaborcsa… birt… INT32 Date OPTIONAL <span style='color: #555555;'>&lt;DATE&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> /Users/gaborcsa… sala… DOUB… double OPTIONAL <span style='color: #555555;'>&lt;NULL&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> /Users/gaborcsa… title BYTE… chara… OPTIONAL <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> /Users/gaborcsa… comm… BYTE… chara… OPTIONAL <span style='color: #555555;'>&lt;STRING&gt;</span></span></span> <span></span></code></pre> </div> <p>For every Parquet column we see its low level Parquet data type in <code>type</code>, e.g. 
<code>INT64</code> or <code>BYTE_ARRAY</code>. <code>r_type</code> is the R type that <a href="https://r-lib.github.io/nanoparquet/reference/read_parquet.html" target="_blank" rel="noopener"><code>read_parquet()</code></a> will create for that column. If <code>repetition_type</code> is <code>REQUIRED</code>, then that column cannot contain missing values. <code>OPTIONAL</code> columns may have missing values. <code>logical_type</code> is the higher level Parquet data type.</p> <p>E.g. the first column is a UTC (because of the <code>TRUE</code>) timestamp, in microseconds. It is stored as a 64 bit integer in the file, and it will be converted to a <code>POSIXct</code> object by <a href="https://r-lib.github.io/nanoparquet/reference/read_parquet.html" target="_blank" rel="noopener"><code>read_parquet()</code></a>.</p> <p>To actually read the file into a data frame, call <a href="https://r-lib.github.io/nanoparquet/reference/read_parquet.html" target="_blank" rel="noopener"><code>read_parquet()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>ud1</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://r-lib.github.io/nanoparquet/reference/read_parquet.html'>read_parquet</a></span><span class='o'>(</span><span class='nv'>udf</span><span class='o'>)</span></span> <span><span class='nv'>ud1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 1,000 × 13</span></span></span> <span><span class='c'>#&gt; registration id first_name last_name email gender ip_address cc </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dttm&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span 
style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 2016-02-03 <span style='color: #555555;'>07:55:29</span> 1 Amanda Jordan ajord… Female 1.197.201… 6759…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 2016-02-03 <span style='color: #555555;'>17:04:03</span> 2 Albert Freeman afree… Male 218.111.1… <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 2016-02-03 <span style='color: #555555;'>01:09:31</span> 3 Evelyn Morgan emorg… Female 7.161.136… 6767…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 2016-02-03 <span style='color: #555555;'>00:36:21</span> 4 Denise Riley drile… Female 140.35.10… 3576…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 2016-02-03 <span style='color: #555555;'>05:05:31</span> 5 Carlos Burns cburn… <span style='color: #BB0000;'>NA</span> 169.113.2… 5602…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 2016-02-03 <span style='color: #555555;'>07:22:34</span> 6 Kathryn White kwhit… Female 195.131.8… 3583…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 2016-02-03 <span style='color: #555555;'>08:33:08</span> 7 Samuel Holmes sholm… Male 232.234.8… 3582…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 2016-02-03 <span style='color: #555555;'>06:47:06</span> 8 Harry Howell hhowe… Male 91.235.51… <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 2016-02-03 <span style='color: #555555;'>03:52:53</span> 9 Jose Foster jfost… Male 132.31.53… <span style='color: #BB0000;'>NA</span> 
</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 2016-02-03 <span style='color: #555555;'>18:29:47</span> 10 Emily Stewart estew… Female 143.28.25… 3574…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 990 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 5 more variables: country &lt;chr&gt;, birthdate &lt;date&gt;, salary &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># title &lt;chr&gt;, comments &lt;chr&gt;</span></span></span> <span></span></code></pre> </div> <h3 id="writing-a-parquet-file">Writing a Parquet file <a href="#writing-a-parquet-file"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>To show <a href="https://r-lib.github.io/nanoparquet/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a>, we&rsquo;ll use the <code>flights</code> data in the nycflights13 package:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/hadley/nycflights13'>nycflights13</a></span><span class='o'>)</span></span> <span><span class='nv'>flights</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 336,776 × 19</span></span></span> <span><span class='c'>#&gt; year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time</span></span> <span><span class='c'>#&gt; <span style='color: #555555; 
font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='text-decoration: underline;'>2</span>013 1 1 517 515 2 830 819</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='text-decoration: underline;'>2</span>013 1 1 533 529 4 850 830</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='text-decoration: underline;'>2</span>013 1 1 542 540 2 923 850</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='text-decoration: underline;'>2</span>013 1 1 544 545 -<span style='color: #BB0000;'>1</span> <span style='text-decoration: underline;'>1</span>004 <span style='text-decoration: underline;'>1</span>022</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='text-decoration: underline;'>2</span>013 1 1 554 600 -<span style='color: #BB0000;'>6</span> 812 837</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='text-decoration: underline;'>2</span>013 1 1 554 558 -<span style='color: #BB0000;'>4</span> 740 728</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>2</span>013 1 1 555 600 -<span style='color: #BB0000;'>5</span> 913 854</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='text-decoration: 
underline;'>2</span>013 1 1 557 600 -<span style='color: #BB0000;'>3</span> 709 723</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='text-decoration: underline;'>2</span>013 1 1 557 600 -<span style='color: #BB0000;'>3</span> 838 846</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>2</span>013 1 1 558 600 -<span style='color: #BB0000;'>2</span> 753 745</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 336,766 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 11 more variables: arr_delay &lt;dbl&gt;, carrier &lt;chr&gt;, flight &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># tailnum &lt;chr&gt;, origin &lt;chr&gt;, dest &lt;chr&gt;, air_time &lt;dbl&gt;, distance &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># hour &lt;dbl&gt;, minute &lt;dbl&gt;, time_hour &lt;dttm&gt;</span></span></span> <span></span></code></pre> </div> <p>First we check how columns of <code>flights</code> will be mapped to Parquet types:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://r-lib.github.io/nanoparquet/reference/parquet_column_types.html'>parquet_column_types</a></span><span class='o'>(</span><span class='nv'>flights</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 19 × 6</span></span></span> <span><span class='c'>#&gt; file_name name type r_type repetition_type logical_type </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;I&lt;list&gt;&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='color: #BB0000;'>NA</span> year INT32 integ… REQUIRED <span style='color: #555555;'>&lt;INT(32, TRUE)&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='color: #BB0000;'>NA</span> month INT32 integ… REQUIRED <span style='color: #555555;'>&lt;INT(32, TRUE)&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='color: #BB0000;'>NA</span> day INT32 integ… REQUIRED <span style='color: #555555;'>&lt;INT(32, TRUE)&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='color: #BB0000;'>NA</span> dep_time INT32 integ… OPTIONAL <span style='color: #555555;'>&lt;INT(32, TRUE)&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='color: #BB0000;'>NA</span> sched_dep_t… INT32 integ… REQUIRED <span style='color: #555555;'>&lt;INT(32, TRUE)&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='color: #BB0000;'>NA</span> dep_delay DOUB… double OPTIONAL <span style='color: #555555;'>&lt;NULL&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='color: #BB0000;'>NA</span> arr_time INT32 integ… OPTIONAL <span style='color: #555555;'>&lt;INT(32, TRUE)&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='color: #BB0000;'>NA</span> sched_arr_t… INT32 integ… REQUIRED <span style='color: #555555;'>&lt;INT(32, TRUE)&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='color: #BB0000;'>NA</span> 
arr_delay DOUB… double OPTIONAL <span style='color: #555555;'>&lt;NULL&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='color: #BB0000;'>NA</span> carrier BYTE… chara… REQUIRED <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> <span style='color: #BB0000;'>NA</span> flight INT32 integ… REQUIRED <span style='color: #555555;'>&lt;INT(32, TRUE)&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> <span style='color: #BB0000;'>NA</span> tailnum BYTE… chara… OPTIONAL <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> <span style='color: #BB0000;'>NA</span> origin BYTE… chara… REQUIRED <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>14</span> <span style='color: #BB0000;'>NA</span> dest BYTE… chara… REQUIRED <span style='color: #555555;'>&lt;STRING&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>15</span> <span style='color: #BB0000;'>NA</span> air_time DOUB… double OPTIONAL <span style='color: #555555;'>&lt;NULL&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>16</span> <span style='color: #BB0000;'>NA</span> distance DOUB… double REQUIRED <span style='color: #555555;'>&lt;NULL&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>17</span> <span style='color: #BB0000;'>NA</span> hour DOUB… double REQUIRED <span style='color: #555555;'>&lt;NULL&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>18</span> <span style='color: #BB0000;'>NA</span> minute DOUB… double REQUIRED <span style='color: #555555;'>&lt;NULL&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: 
#555555;'>19</span> <span style='color: #BB0000;'>NA</span> time_hour INT64 POSIX… REQUIRED <span style='color: #555555;'>&lt;TIMESTAMP(TRUE, micros)&gt;</span></span></span> <span></span></code></pre> </div> <p>This looks fine, so we go ahead and write out the file. By default it will be Snappy-compressed, and many columns will be dictionary encoded.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://r-lib.github.io/nanoparquet/reference/write_parquet.html'>write_parquet</a></span><span class='o'>(</span><span class='nv'>flights</span>, <span class='s'>"flights.parquet"</span><span class='o'>)</span></span></code></pre> </div> <h3 id="parquet-metadata">Parquet metadata <a href="#parquet-metadata"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Use <a href="https://r-lib.github.io/nanoparquet/reference/parquet_schema.html" target="_blank" rel="noopener"><code>parquet_schema()</code></a> to see the schema of a Parquet file. The schema also includes &ldquo;internal&rdquo; parquet columns. Every Parquet file is a tree where columns may be part of an &ldquo;internal&rdquo; column. 
nanoparquet currently only supports flat files, which consist of a single internal root column whose children are all leaf columns:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://r-lib.github.io/nanoparquet/reference/parquet_schema.html'>parquet_schema</a></span><span class='o'>(</span><span class='s'>"flights.parquet"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 20 × 11</span></span></span> <span><span class='c'>#&gt; file_name name type type_length repetition_type converted_type</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> flights.parquet schema <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> flights.parquet year INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> flights.parquet month INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> flights.parquet day INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> flights.parquet dep_time INT32 
<span style='color: #BB0000;'>NA</span> OPTIONAL INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> flights.parquet sched_dep_t… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> flights.parquet dep_delay DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> flights.parquet arr_time INT32 <span style='color: #BB0000;'>NA</span> OPTIONAL INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> flights.parquet sched_arr_t… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> flights.parquet arr_delay DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> flights.parquet carrier BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> flights.parquet flight INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> flights.parquet tailnum BYTE… <span style='color: #BB0000;'>NA</span> OPTIONAL UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>14</span> flights.parquet origin BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>15</span> flights.parquet dest BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>16</span> flights.parquet air_time DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: 
#BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>17</span> flights.parquet distance DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>18</span> flights.parquet hour DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>19</span> flights.parquet minute DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>20</span> flights.parquet time_hour INT64 <span style='color: #BB0000;'>NA</span> REQUIRED TIMESTAMP_MIC…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 5 more variables: logical_type &lt;I&lt;list&gt;&gt;, num_children &lt;int&gt;, scale &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># precision &lt;int&gt;, field_id &lt;int&gt;</span></span></span> <span></span></code></pre> </div> <p>To see more information about a Parquet file, use <a href="https://r-lib.github.io/nanoparquet/reference/parquet_metadata.html" target="_blank" rel="noopener"><code>parquet_metadata()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://r-lib.github.io/nanoparquet/reference/parquet_metadata.html'>parquet_metadata</a></span><span class='o'>(</span><span class='s'>"flights.parquet"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; $file_meta_data</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 1 × 5</span></span></span> <span><span class='c'>#&gt; file_name version num_rows key_value_metadata created_by </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: 
italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;I&lt;list&gt;&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> flights.parquet 1 <span style='text-decoration: underline;'>336</span>776 <span style='color: #555555;'>&lt;tbl [1 × 2]&gt;</span> https://github.com/gaborc…</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; $schema</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 20 × 11</span></span></span> <span><span class='c'>#&gt; file_name name type type_length repetition_type converted_type</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> flights.parquet schema <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> flights.parquet year INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> flights.parquet month INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 
flights.parquet day INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> flights.parquet dep_time INT32 <span style='color: #BB0000;'>NA</span> OPTIONAL INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> flights.parquet sched_dep_t… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> flights.parquet dep_delay DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> flights.parquet arr_time INT32 <span style='color: #BB0000;'>NA</span> OPTIONAL INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> flights.parquet sched_arr_t… INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> flights.parquet arr_delay DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> flights.parquet carrier BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> flights.parquet flight INT32 <span style='color: #BB0000;'>NA</span> REQUIRED INT_32 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> flights.parquet tailnum BYTE… <span style='color: #BB0000;'>NA</span> OPTIONAL UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>14</span> flights.parquet origin BYTE… <span style='color: #BB0000;'>NA</span> REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>15</span> flights.parquet dest BYTE… <span style='color: #BB0000;'>NA</span> 
REQUIRED UTF8 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>16</span> flights.parquet air_time DOUB… <span style='color: #BB0000;'>NA</span> OPTIONAL <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>17</span> flights.parquet distance DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>18</span> flights.parquet hour DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>19</span> flights.parquet minute DOUB… <span style='color: #BB0000;'>NA</span> REQUIRED <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>20</span> flights.parquet time_hour INT64 <span style='color: #BB0000;'>NA</span> REQUIRED TIMESTAMP_MIC…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 5 more variables: logical_type &lt;I&lt;list&gt;&gt;, num_children &lt;int&gt;, scale &lt;int&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># precision &lt;int&gt;, field_id &lt;int&gt;</span></span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; $row_groups</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 1 × 7</span></span></span> <span><span class='c'>#&gt; file_name id total_byte_size num_rows file_offset total_compressed_size</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> flights.parq… 0 5<span style='text-decoration: underline;'>732</span>430 <span style='text-decoration: underline;'>336</span>776 <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: ordinal &lt;int&gt;</span></span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; $column_chunks</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 19 × 19</span></span></span> <span><span class='c'>#&gt; file_name row_group column file_path file_offset offset_index_offset</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> flights.parquet 0 0 <span style='color: #BB0000;'>NA</span> 23 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> flights.parquet 0 1 <span style='color: #BB0000;'>NA</span> 111 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> flights.parquet 0 2 <span style='color: #BB0000;'>NA</span> 323 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> flights.parquet 0 3 <span style='color: #BB0000;'>NA</span> <span 
style='text-decoration: underline;'>6</span>738 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> flights.parquet 0 4 <span style='color: #BB0000;'>NA</span> <span style='text-decoration: underline;'>468</span>008 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> flights.parquet 0 5 <span style='color: #BB0000;'>NA</span> <span style='text-decoration: underline;'>893</span>557 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> flights.parquet 0 6 <span style='color: #BB0000;'>NA</span> 1<span style='text-decoration: underline;'>312</span>660 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> flights.parquet 0 7 <span style='color: #BB0000;'>NA</span> 1<span style='text-decoration: underline;'>771</span>896 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> flights.parquet 0 8 <span style='color: #BB0000;'>NA</span> 2<span style='text-decoration: underline;'>237</span>931 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> flights.parquet 0 9 <span style='color: #BB0000;'>NA</span> 2<span style='text-decoration: underline;'>653</span>250 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> flights.parquet 0 10 <span style='color: #BB0000;'>NA</span> 2<span style='text-decoration: underline;'>847</span>249 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> flights.parquet 0 11 <span style='color: #BB0000;'>NA</span> 3<span style='text-decoration: underline;'>374</span>563 <span style='color: 
#BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> flights.parquet 0 12 <span style='color: #BB0000;'>NA</span> 3<span style='text-decoration: underline;'>877</span>832 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>14</span> flights.parquet 0 13 <span style='color: #BB0000;'>NA</span> 3<span style='text-decoration: underline;'>966</span>418 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>15</span> flights.parquet 0 14 <span style='color: #BB0000;'>NA</span> 4<span style='text-decoration: underline;'>264</span>662 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>16</span> flights.parquet 0 15 <span style='color: #BB0000;'>NA</span> 4<span style='text-decoration: underline;'>639</span>410 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>17</span> flights.parquet 0 16 <span style='color: #BB0000;'>NA</span> 4<span style='text-decoration: underline;'>976</span>781 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>18</span> flights.parquet 0 17 <span style='color: #BB0000;'>NA</span> 5<span style='text-decoration: underline;'>120</span>936 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>19</span> flights.parquet 0 18 <span style='color: #BB0000;'>NA</span> 5<span style='text-decoration: underline;'>427</span>022 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 13 more variables: offset_index_length &lt;int&gt;, column_index_offset &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># column_index_length &lt;int&gt;, type &lt;chr&gt;, 
encodings &lt;I&lt;list&gt;&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># path_in_schema &lt;I&lt;list&gt;&gt;, codec &lt;chr&gt;, num_values &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># total_uncompressed_size &lt;dbl&gt;, total_compressed_size &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># data_page_offset &lt;dbl&gt;, index_page_offset &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># dictionary_page_offset &lt;dbl&gt;</span></span></span> <span></span></code></pre> </div> <p>The output includes the schema, as above, as well as data about the row groups (<a href="https://r-lib.github.io/nanoparquet/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> currently always writes a single row group) and the column chunks. There is one column chunk per column in each row group.</p> <p>The column chunk information also tells you whether a column chunk is dictionary encoded, which encodings it uses, its size, etc.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>cc</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://r-lib.github.io/nanoparquet/reference/parquet_metadata.html'>parquet_metadata</a></span><span class='o'>(</span><span class='s'>"flights.parquet"</span><span class='o'>)</span><span class='o'>$</span><span class='nv'>column_chunks</span></span> <span><span class='nv'>cc</span><span class='o'>[</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"column"</span>, <span class='s'>"encodings"</span>, <span class='s'>"dictionary_page_offset"</span><span class='o'>)</span><span class='o'>]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A data frame: 19 × 3</span></span></span> <span><span 
class='c'>#&gt; column encodings dictionary_page_offset</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;I&lt;list&gt;&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 0 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 1 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 48</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 2 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 181</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 3 <span style='color: #555555;'>&lt;chr [3]&gt;</span> <span style='text-decoration: underline;'>1</span>445</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 4 <span style='color: #555555;'>&lt;chr [3]&gt;</span> <span style='text-decoration: underline;'>463</span>903</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 5 <span style='color: #555555;'>&lt;chr [3]&gt;</span> <span style='text-decoration: underline;'>891</span>412</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 6 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 1<span style='text-decoration: underline;'>306</span>995</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 7 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 1<span style='text-decoration: underline;'>767</span>223</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 8 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 2<span style='text-decoration: underline;'>235</span>594</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 9 <span style='color: 
#555555;'>&lt;chr [3]&gt;</span> 2<span style='text-decoration: underline;'>653</span>154</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> 10 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 2<span style='text-decoration: underline;'>831</span>850</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> 11 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 3<span style='text-decoration: underline;'>352</span>496</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> 12 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 3<span style='text-decoration: underline;'>877</span>796</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>14</span> 13 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 3<span style='text-decoration: underline;'>965</span>856</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>15</span> 14 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 4<span style='text-decoration: underline;'>262</span>597</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>16</span> 15 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 4<span style='text-decoration: underline;'>638</span>461</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>17</span> 16 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 4<span style='text-decoration: underline;'>976</span>675</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>18</span> 17 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 5<span style='text-decoration: underline;'>120</span>660</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>19</span> 18 <span style='color: #555555;'>&lt;chr [3]&gt;</span> 5<span style='text-decoration: underline;'>379</span>476</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span 
class='nv'>cc</span><span class='o'>[[</span><span class='s'>"encodings"</span><span class='o'>]</span><span class='o'>]</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span><span class='o'>]</span></span> <span><span class='c'>#&gt; [[1]]</span></span> <span><span class='c'>#&gt; [1] "PLAIN" "RLE" "RLE_DICTIONARY"</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; [[2]]</span></span> <span><span class='c'>#&gt; [1] "PLAIN" "RLE" "RLE_DICTIONARY"</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; [[3]]</span></span> <span><span class='c'>#&gt; [1] "PLAIN" "RLE" "RLE_DICTIONARY"</span></span> <span></span></code></pre> </div> <h2 id="limitations">Limitations <a href="#limitations"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>nanoparquet 0.3.0 has a number of limitations.</p> <ul> <li> <p><strong>Only flat tables</strong>. <a href="https://r-lib.github.io/nanoparquet/reference/read_parquet.html" target="_blank" rel="noopener"><code>read_parquet()</code></a> can only read flat tables, i.e. Parquet files without nested columns. (Technically all Parquet files are nested, and nanoparquet supports exactly one level of nesting: a single meta column that contains all other columns.) Similarly, <a href="https://r-lib.github.io/nanoparquet/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> will not write list columns.</p> </li> <li> <p><strong>Unsupported Parquet types</strong>. 
<a href="https://r-lib.github.io/nanoparquet/reference/read_parquet.html" target="_blank" rel="noopener"><code>read_parquet()</code></a> currently reads some Parquet types, e.g. <code>FLOAT16</code> and <code>INTERVAL</code>, as raw vectors in a list column. See <a href="https://r-lib.github.io/nanoparquet/reference/nanoparquet-types.html" target="_blank" rel="noopener">the manual</a> for details.</p> </li> <li> <p><strong>No encryption</strong>. Encrypted Parquet files are not supported.</p> </li> <li> <p><strong>Missing compression codecs</strong>. The <code>LZO</code>, <code>BROTLI</code> and <code>LZ4</code> compression codecs are not yet supported.</p> </li> <li> <p><strong>No statistics</strong>. nanoparquet does not read or write statistics, e.g. minimum and maximum values, from or to Parquet files.</p> </li> <li> <p><strong>No checksums</strong>. nanoparquet currently does not check or write checksums.</p> </li> <li> <p><strong>No Bloom filters</strong>. nanoparquet does not currently support reading or writing Bloom filters from or to Parquet files.</p> </li> <li> <p><strong>May be slow for large files</strong>. Being single-threaded and not fully optimized, nanoparquet is probably not well suited to large data sets. It should be fine for a couple of gigabytes, and it may be fine whenever all the data fits comfortably into memory.</p> </li> <li> <p><strong>Single row group</strong>. <a href="https://r-lib.github.io/nanoparquet/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a> always creates a single row group, which is not optimal for large files.</p> </li> <li> <p><strong>Automatic encoding</strong>. It is currently not possible to choose encodings manually in <a href="https://r-lib.github.io/nanoparquet/reference/write_parquet.html" target="_blank" rel="noopener"><code>write_parquet()</code></a>.</p> </li> </ul> <p>We plan to address these limitations while keeping nanoparquet as lean as possible. 
In particular, if you find a Parquet file that nanoparquet cannot read, please report an issue in our <a href="https://github.com/r-lib/nanoparquet/issues" target="_blank" rel="noopener">issue tracker</a>!</p> <h2 id="other-tools-for-parquet-files">Other tools for Parquet files <a href="#other-tools-for-parquet-files"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>If you run into some of these limitations, chances are you are dealing with a larger data set, and you will probably benefit from using tools geared towards larger Parquet files. Luckily, you have several options.</p> <h3 id="in-r">In R <a href="#in-r"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3> <h4 id="apache-arrow">Apache Arrow <a href="#apache-arrow"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>You can usually install the <code>arrow</code> package from CRAN. 
Note, however, that some CRAN builds are suboptimal at the time of writing, e.g. the macOS builds lack Parquet support, so it is best to install arrow from <a href="https://apache.r-universe.dev/arrow" target="_blank" rel="noopener">R-universe</a> on these platforms.</p> <p>Call <a href="https://arrow.apache.org/docs/r/reference/read_parquet.html" target="_blank" rel="noopener"><code>arrow::read_parquet()</code></a> to read Parquet files, and <a href="https://arrow.apache.org/docs/r/reference/write_parquet.html" target="_blank" rel="noopener"><code>arrow::write_parquet()</code></a> to write them. You can also use <a href="https://arrow.apache.org/docs/r/reference/open_dataset.html" target="_blank" rel="noopener"><code>arrow::open_dataset()</code></a> to open (one or more) Parquet files and perform queries on them without loading all data into memory.</p> <h4 id="duckdb">DuckDB <a href="#duckdb"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>DuckDB is an excellent tool that handles Parquet files seamlessly. 
You can install the duckdb R package from CRAN.</p> <p>To read a Parquet file into an R data frame with DuckDB, call</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">df</span> <span class="o">&lt;-</span> <span class="n">duckdb</span><span class="o">:::</span><span class="nf">sql</span><span class="p">(</span><span class="s">&#34;FROM &#39;file.parquet&#39;&#34;</span><span class="p">)</span> </code></pre></div><p>Alternatively, you can open (one or more) Parquet files and query them as a DuckDB database, potentially without reading all data into memory at once.</p> <p>Here is an example that shows how to put an R data frame into a (temporary) DuckDB database, and how to export it to Parquet:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">drv</span> <span class="o">&lt;-</span> <span class="n">duckdb</span><span class="o">::</span><span class="nf">duckdb</span><span class="p">()</span> <span class="n">con</span> <span class="o">&lt;-</span> <span class="n">DBI</span><span class="o">::</span><span class="nf">dbConnect</span><span class="p">(</span><span class="n">drv</span><span class="p">)</span> <span class="nf">on.exit</span><span class="p">(</span><span class="n">DBI</span><span class="o">::</span><span class="nf">dbDisconnect</span><span class="p">(</span><span class="n">con</span><span class="p">),</span> <span class="n">add</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span> <span class="n">DBI</span><span class="o">::</span><span class="nf">dbWriteTable</span><span class="p">(</span><span class="n">con</span><span class="p">,</span> <span class="s">&#34;mtcars&#34;</span><span class="p">,</span> <span class="n">mtcars</span><span class="p">)</span> <span class="n">DBI</span><span class="o">::</span><span class="nf">dbExecute</span><span class="p">(</span><span class="n">con</span><span class="p">,</span> <span 
class="n">DBI</span><span class="o">::</span><span class="nf">sqlInterpolate</span><span class="p">(</span><span class="n">con</span><span class="p">,</span> <span class="s">&#34;COPY mtcars TO ?filename (FORMAT &#39;parquet&#39;, COMPRESSION &#39;snappy&#39;)&#34;</span><span class="p">,</span> <span class="n">filename</span> <span class="o">=</span> <span class="s">&#39;mtcars.parquet&#39;</span> <span class="p">))</span> </code></pre></div> <h3 id="in-python">In Python <a href="#in-python"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>There are at least three good options to handle Parquet files in Python. Just like for R, the first two are <a href="https://arrow.apache.org/docs/python/index.html" target="_blank" rel="noopener">Apache Arrow</a> and <a href="https://duckdb.org/docs/api/python/overview.html" target="_blank" rel="noopener">DuckDB</a>. 
You can also try the <a href="https://pypi.org/project/fastparquet/" target="_blank" rel="noopener">fastparquet</a> Python package for a potentially lighter solution.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>nanoparquet would not exist without the work of Hannes Mühleisen on <a href="https://github.com/hannes/miniparquet" target="_blank" rel="noopener">miniparquet</a>, which had similar goals but is now discontinued. nanoparquet is a fork of miniparquet.</p> <p>nanoparquet also contains code and test Parquet files from DuckDB, Apache Parquet, Apache Arrow and Apache Thrift, as well as libraries from Google, Facebook, etc. See the <a href="https://github.com/r-lib/nanoparquet/blob/main/inst/COPYRIGHTS" target="_blank" rel="noopener">COPYRIGHTS file</a> in the repository for the full details.</p> marquee 0.1.0 https://www.tidyverse.org/blog/2024/05/marquee-0-1-0/ Wed, 29 May 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/05/marquee-0-1-0/ <p>I am super excited to announce the initial release of <a href="https://marquee.r-lib.org" target="_blank" rel="noopener">marquee</a>, a markdown parser and renderer for R graphics that allows native rich text formatting of text in graphics created with grid (which includes ggplot2 and lattice).</p> <p>The inception of this package goes all the way back to 2017:</p> <blockquote class="twitter-tweet"> <p lang="en" dir="ltr"> <p>May I present: Text wrapping of theme elements in <a href="https://twitter.com/hashtag/ggplot2?src=hash&amp;ref_src=twsrc%5Etfw">#ggplot2</a> with the new (experimental) element_textbox in <a href="https://twitter.com/hashtag/ggforce?src=hash&amp;ref_src=twsrc%5Etfw">#ggforce</a><a href="https://twitter.com/hashtag/rstats?src=hash&amp;ref_src=twsrc%5Etfw">#rstats</a> <a href="https://twitter.com/hashtag/dataviz?src=hash&amp;ref_src=twsrc%5Etfw">#dataviz</a> <a href="https://t.co/JJMLcuTBqx">pic.twitter.com/JJMLcuTBqx</a></p> </p> <p>&mdash; Thomas Lin Pedersen (@thomasp85) <a href="https://twitter.com/thomasp85/status/816967301014634497?ref_src=twsrc%5Etfw">January 5, 2017</a></p> </blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> <p>(yeah&hellip;) where I developed an experimental feature for ggforce that allowed automatic text wrapping in <a href="https://ggplot2.tidyverse.org/reference/element.html" target="_blank" rel="noopener"><code>element_text()</code></a>. 
Years passed, and the text rendering capabilities in R slowly improved, until we are finally at a point in the toolchain where something like marquee can deliver on my initial plans.</p> <p>If this has you intrigued, you can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"marquee"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post goes through the features of marquee and discusses some of its current limitations, all of which are hopefully transient.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://marquee.r-lib.org'>marquee</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="an-example">An example <a href="#an-example"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Since markdown is second nature for most people at this point, there shouldn&rsquo;t be much surprise in what marquee is capable of, so let&rsquo;s start with an example to show the main use:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>md_text</span> <span class='o'>&lt;-</span> </span> <span><span class='s'>"# Intro</span></span> <span><span class='s'>markdown has been *quite* succesful in creating a unified 
way of specifying </span></span> <span><span class='s'>_semantic_ rich text. While limited, it provides both &#123;.steelblue readability&#125; and</span></span> <span><span class='s'>just enough ~power~ features.</span></span> <span><span class='s'></span></span> <span><span class='s'> text &lt;- \"markdown **text**\"</span></span> <span><span class='s'> marquee_grob(text)</span></span> <span><span class='s'></span></span> <span><span class='s'>It features, among others:</span></span> <span><span class='s'></span></span> <span><span class='s'>1. lists</span></span> <span><span class='s'></span></span> <span><span class='s'>2. code blocks</span></span> <span><span class='s'></span></span> <span><span class='s'> * Indented lists</span></span> <span><span class='s'> </span></span> <span><span class='s'>3. and more...</span></span> <span><span class='s'>"</span></span> <span></span> <span><span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/grid.draw.html'>grid.draw</a></span><span class='o'>(</span><span class='nf'><a href='https://marquee.r-lib.org/reference/marquee_grob.html'>marquee_grob</a></span><span class='o'>(</span><span class='nv'>md_text</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-3-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>The above illustrates a couple of things. First and foremost, that markdown works in very unsurprising ways and you get what you type. In fact, the full CommonMark syntax is supported along with extensions for underline and strikethrough. Further, it shows that marquee provides its own extension for specifying custom span elements in the form of the <code>{.class &lt;text&gt;}</code> syntax. The renderer is clever in interpreting the class so that if it corresponds to a colour name, the colour is automatically applied to the text. 
Lastly, it shows that the default styling of markdown closely follows the look you&rsquo;ve come to expect from markdown rendered to HTML.</p> <h2 id="use-in-ggplot2">Use in ggplot2 <a href="#use-in-ggplot2"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The number of people using this directly in grid is probably small. It is more likely that you access the functionality of marquee through higher level functions. Marquee provides two such functions aimed at making it easy to use marquee in ggplot2. The aim is to eventually move these into ggplot2 proper, but while we are in the initial phase of development they will stay in this package.</p> <h3 id="geom_marquee"><code>geom_marquee()</code> <a href="#geom_marquee"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The first function is (obviously) a geom. It is intended as a stand-in replacement for both <a href="https://ggplot2.tidyverse.org/reference/geom_text.html" target="_blank" rel="noopener"><code>geom_text()</code></a> and <a href="https://ggplot2.tidyverse.org/reference/geom_text.html" target="_blank" rel="noopener"><code>geom_label()</code></a>. 
As with <a href="https://marquee.r-lib.org/reference/marquee_grob.html" target="_blank" rel="noopener"><code>marquee_grob()</code></a> it works very unsurprisingly:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span></span> <span><span class='c'># Add styling around the first word</span></span> <span><span class='nv'>red_bold_names</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/grep.html'>sub</a></span><span class='o'>(</span><span class='s'>"(\\w+)"</span>, <span class='s'>"&#123;.red **\\1**&#125;"</span>, <span class='nf'><a href='https://rdrr.io/r/base/colnames.html'>rownames</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://marquee.r-lib.org/reference/geom_marquee.html'>geom_marquee</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>mpg</span>, y <span class='o'>=</span> <span class='nv'>disp</span>, label <span class='o'>=</span> <span class='nv'>red_bold_names</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-4-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Apart from standard, but markdown-aware, <a href="https://ggplot2.tidyverse.org/reference/geom_text.html" target="_blank" rel="noopener"><code>geom_text()</code></a> 
behaviour, the geom also gains a <code>width</code> aesthetic that allows you to turn on automatic soft wrapping of the text. In addition to this it gains a <code>style</code> aesthetic to finely control the style (more about styling below)</p> <h3 id="element_marquee"><code>element_marquee()</code> <a href="#element_marquee"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The second obvious use for marquee in ggplot2 is in formatting text elements. <a href="https://marquee.r-lib.org/reference/element_marquee.html" target="_blank" rel="noopener"><code>element_marquee()</code></a> is a replacement for <a href="https://ggplot2.tidyverse.org/reference/element.html" target="_blank" rel="noopener"><code>element_text()</code></a> that does just that.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>mpg</span>, y <span class='o'>=</span> <span class='nv'>disp</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span 
class='o'>(</span><span class='nv'>md_text</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>plot.title <span class='o'>=</span> <span class='nf'><a href='https://marquee.r-lib.org/reference/element_marquee.html'>element_marquee</a></span><span class='o'>(</span>size <span class='o'>=</span> <span class='m'>8</span>, width <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>16</span>, <span class='s'>"cm"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-5-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="styling">Styling <a href="#styling"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As alluded to above, marquee comes with a styling API that is reminiscent of CSS but completely its own. In some sense it takes the &ldquo;simplicity over power&rdquo; approach from markdown and applies it to styling.</p> <p>In marquee, each element type (e.g. a code block) has its own style. This style can be incomplete in which case it inherits the remaining specifications from the parent element in the document. 
As an example, the <code>em</code> element has the following default style <code>style(italic = TRUE)</code>, that is, take whatever style is currently in effect but also make the text italic.</p> <p>Apart from the direct inheritance of the marquee styling, it is also possible to use relative inheritance for numeric specifications (e.g. <code>lineheight = relative(2)</code> to double the current lineheight) or set sizes based on the current or root element font size (using <a href="https://marquee.r-lib.org/reference/style_helpers.html" target="_blank" rel="noopener"><code>em()</code></a> and <a href="https://marquee.r-lib.org/reference/style_helpers.html" target="_blank" rel="noopener"><code>rem()</code></a> respectively). Lastly, you can also mark a specification as &ldquo;non-inheritable&rdquo; using <a href="https://marquee.r-lib.org/reference/style_helpers.html" target="_blank" rel="noopener"><code>skip_inherit()</code></a>. This essentially instructs any children to not inherit the value but instead inherit the value from the grand-parent element.</p> <h2 id="images">Images <a href="#images"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Markdown (famously) supports adding images through the <code>![alt text](path/to/image)</code> syntax. Since marquee supports the full CommonMark spec, this is of course also supported. 
The only limitation is that the &ldquo;alt text&rdquo; is ignored since hovering tool-tips or screen-readers are not supported for the output types that marquee renders to.</p> <p>If an image is placed on a line together with surrounding text it will be rendered to fit the line height of the line. If it is placed by itself on its own line it will span the width available:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>logo</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/system.file.html'>system.file</a></span><span class='o'>(</span><span class='s'>"help"</span>, <span class='s'>"figures"</span>, <span class='s'>"logo.png"</span>, package <span class='o'>=</span> <span class='s'>"marquee"</span><span class='o'>)</span></span> <span><span class='nv'>header_img</span> <span class='o'>&lt;-</span> <span class='s'>"thumbnail-wd.jpg"</span></span> <span></span> <span><span class='nv'>md_img</span> <span class='o'>&lt;-</span> </span> <span><span class='s'>"# About marquee ![](&#123;logo&#125;)</span></span> <span><span class='s'></span></span> <span><span class='s'>Both PNG (above), JPEG (below), and SVG (not shown) are supported</span></span> <span><span class='s'></span></span> <span><span class='s'>![](&#123;header_img&#125;)</span></span> <span><span class='s'></span></span> <span><span class='s'>The above image is treated like a block element</span></span> <span><span class='s'>"</span></span> <span></span> <span><span class='nv'>md_img</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://marquee.r-lib.org/reference/marquee_glue.html'>marquee_glue</a></span><span class='o'>(</span><span class='nv'>md_img</span><span class='o'>)</span></span> <span></span> <span><span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/grid.draw.html'>grid.draw</a></span><span class='o'>(</span><span class='nf'><a 
href='https://marquee.r-lib.org/reference/marquee_grob.html'>marquee_grob</a></span><span class='o'>(</span><span class='nv'>md_img</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-6-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Apart from showing support for images we also introduce a new function above, <a href="https://marquee.r-lib.org/reference/marquee_glue.html" target="_blank" rel="noopener"><code>marquee_glue()</code></a>. It is a function that works very much like <a href="https://glue.tidyverse.org/reference/glue.html" target="_blank" rel="noopener"><code>glue::glue()</code></a> and performs text interpolation. However, this variant understands the custom span syntax of marquee so that these will not be treated as interpolation sites. Further, it turns off the <code>#</code> interpretation as a comment character as this interferes with the markdown header syntax.</p> <p>All of the above is pretty standard markdown and since I prefixed this whole blog post with &ldquo;full markdown support&rdquo; it shouldn&rsquo;t come as a big surprise. However, marquee has one last trick up its sleeve: R graphics interpolation. Quite simply, if you, instead of providing a path to a file, provide the name of an R variable holding a graphic object, this will be included as an image. 
Here&rsquo;s how it works:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>plot</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nv'>disp</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nv'>disp</span><span class='o'>)</span>, <span class='nv'>mtcars</span><span class='o'>[</span><span class='m'>1</span>,<span class='o'>]</span>, colour <span class='o'>=</span> <span class='s'>"red"</span>, size <span class='o'>=</span> <span class='m'>3</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>point</span> <span class='o'>&lt;-</span> <span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/grid.points.html'>pointsGrob</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>0.5</span>, y <span class='o'>=</span> <span class='m'>0.5</span>, pch <span class='o'>=</span> <span class='m'>19</span>, gp <span class='o'>=</span> <span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>col <span class='o'>=</span> <span 
class='s'>"red"</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>md_plots</span> <span class='o'>&lt;-</span> </span> <span><span class='s'>"# Plots</span></span> <span><span class='s'>In the plot below, the red dot (![](point)) shows the Mazda RX4</span></span> <span><span class='s'></span></span> <span><span class='s'>![](plot)</span></span> <span><span class='s'>"</span></span> <span></span> <span><span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/grid.draw.html'>grid.draw</a></span><span class='o'>(</span><span class='nf'><a href='https://marquee.r-lib.org/reference/marquee_grob.html'>marquee_grob</a></span><span class='o'>(</span><span class='nv'>md_plots</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-7-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>This also means that your ggplots can contain additional ggplots (or other graphics) anywhere you are allowed to place text (using <a href="https://marquee.r-lib.org/reference/geom_marquee.html" target="_blank" rel="noopener"><code>geom_marquee()</code></a> and <a href="https://marquee.r-lib.org/reference/element_marquee.html" target="_blank" rel="noopener"><code>element_marquee()</code></a>) - for better or worse&hellip;</p> <p><img src="figs/they_didnt_stop.gif" style="display:block;margin:auto;" /></p> <h2 id="limitations">Limitations <a href="#limitations"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Marquee&rsquo;s biggest limitation is its 
reliance on very new features in the graphics engine. The rendering will <em>not</em> work on anything before R 4.3, and even then it requires the graphics device to support a range of new features, most importantly the new glyph specification introduced in R 4.3. While several graphics devices do support the required features, most notably those powered by Cairo as well as all devices in ragg, many do not. The default Windows graphics device continues to lag behind, and the default on macOS, while supporting glyphs, can crash in some situations, bringing the whole R session down with it (this is still being investigated). So we are no doubt treading the frontier here. All of this is set to resolve itself (maybe except for the default Windows device) as time passes.</p> <p>A limitation of great interest to me is the lack of support in svglite. svglite is built on a core idea of post-editability and thus wants all its text to be selectable and editable when opened in a capable program such as Adobe Illustrator. However, the graphics engine API that powers the new capabilities does not really allow this, and I&rsquo;m still figuring out how to reconcile the two. It will eventually be solved though.</p> <p>Lastly, while not really part of markdown syntax directly, many people rely on HTML inside markdown documents to solve layout and styling tasks that markdown doesn&rsquo;t support. The way it works is that markdown passes the HTML through unmodified and then the HTML is parsed by the HTML renderer (often the browser) used to display the rendered markdown document. This makes it seem like understanding HTML is part of markdown, while it&rsquo;s really not. The reason I&rsquo;m going through all this explanation is to say that marquee has no understanding of HTML and will not render it as expected. 
While some HTML tags and CSS settings have clear counterparts in markdown and the marquee styling system, it is much better to have a clear &ldquo;no-support&rdquo; over arbitrary, limited support. <a href="https://marquee.r-lib.org/reference/marquee_grob.html" target="_blank" rel="noopener"><code>marquee_grob()</code></a> and <a href="https://marquee.r-lib.org/reference/marquee_parse.html" target="_blank" rel="noopener"><code>marquee_parse()</code></a> have an argument (<code>ignore_html</code>) that controls whether HTML is removed outright from the output (the default) or included verbatim.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Marquee is the latest in a stream of advancements when it comes to text rendering and font support in R. It builds on top of my work with <a href="https://systemfonts.r-lib.org/index.html" target="_blank" rel="noopener">systemfonts</a>, <a href="https://github.com/r-lib/textshaping" target="_blank" rel="noopener">textshaping</a>, and <a href="https://ragg.r-lib.org/index.html" target="_blank" rel="noopener">ragg</a>, but also owes a great debt to Paul Murrell&rsquo;s work on adding a new, lower-level API for text rendering to grid and the graphics engine. 
Lastly, Claus Wilke&rsquo;s work on <a href="https://wilkelab.org/gridtext/" target="_blank" rel="noopener">gridtext</a> and <a href="https://wilkelab.org/ggtext/" target="_blank" rel="noopener">ggtext</a> showed the power and need for rich text support in R and filled a gap until the technical foundation for marquee was built out.</p> Q1 2024 tidymodels digest https://www.tidyverse.org/blog/2024/04/tidymodels-2024-q1/ Wed, 24 Apr 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/04/tidymodels-2024-q1/ <p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <p>Since the beginning of 2021, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a> here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. 
You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a> to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like these posts from the past couple of months:</p> <ul> <li> <a href="https://www.tidyverse.org/blog/2024/04/tidymodels-survival-analysis/" target="_blank" rel="noopener">Survival analysis for time-to-event data with tidymodels</a></li> <li> <a href="https://www.tidyverse.org/blog/2024/03/tidymodels-fairness/" target="_blank" rel="noopener">Fair machine learning with tidymodels</a></li> <li> <a href="https://www.tidyverse.org/blog/2024/04/tune-1-2-0/" target="_blank" rel="noopener">tune 1.2.0</a></li> </ul> <p>Additionally, we have published several related articles on <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels.org</a>:</p> <ul> <li> <a href="https://www.tidymodels.org/learn/statistics/survival-case-study/" target="_blank" rel="noopener">How long until building complaints are dispositioned? A survival analysis case study</a></li> <li> <a href="https://www.tidymodels.org/learn/statistics/survival-metrics/" target="_blank" rel="noopener">Dynamic Performance Metrics for Event Time Data</a></li> <li> <a href="https://www.tidymodels.org/learn/statistics/survival-metrics-details/" target="_blank" rel="noopener">Accounting for Censoring in Performance Metrics for Event Time Data</a></li> <li> <a href="https://www.tidymodels.org/learn/work/fairness-detectors/" target="_blank" rel="noopener">Are GPT detectors fair? 
A machine learning fairness case study</a></li> <li> <a href="https://www.tidymodels.org/learn/work/fairness-readmission/" target="_blank" rel="noopener">Fair prediction of hospital readmission: a machine learning fairness case study</a></li> <li> <a href="https://www.tidymodels.org/learn/models/bootstrap-metrics/" target="_blank" rel="noopener">Confidence Intervals for Performance Metrics</a></li> </ul> <p>Since <a href="https://www.tidyverse.org/blog/2024/01/tidymodels-2023-q4/" target="_blank" rel="noopener">our last roundup post</a>, there have been CRAN releases of 21 tidymodels packages. Here are links to their NEWS files:</p> <div class="highlight"> <ul> <li>baguette <a href="https://baguette.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.2)</a></li> <li>brulee <a href="https://brulee.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.0)</a></li> <li>butcher <a href="https://butcher.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.4)</a></li> <li>censored <a href="https://censored.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.0)</a></li> <li>dials <a href="https://dials.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.1)</a></li> <li>embed <a href="https://embed.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.4)</a></li> <li>finetune <a href="https://finetune.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.0)</a></li> <li>hardhat <a href="https://hardhat.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.3.1)</a></li> <li>modeldata <a href="https://modeldata.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.3.0)</a></li> <li>parsnip <a href="https://parsnip.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.1)</a></li> <li>probably <a href="https://probably.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.3)</a></li> <li>recipes <a 
href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.10)</a></li> <li>rsample <a href="https://rsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.1)</a></li> <li>shinymodels <a href="https://shinymodels.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.1.1)</a></li> <li>stacks <a href="https://stacks.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.4)</a></li> <li>tidyclust <a href="https://tidyclust.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.2.1)</a></li> <li>tidymodels <a href="https://tidymodels.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.0)</a></li> <li>tune <a href="https://tune.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.0)</a></li> <li>workflows <a href="https://workflows.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.4)</a></li> <li>workflowsets <a href="https://workflowsets.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a></li> <li>yardstick <a href="https://yardstick.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.3.1)</a></li> </ul> </div> <p>We&rsquo;ll highlight a few especially notable changes below: new prediction options in censored, consistency in augmenting parsnip models and workflows, as well as a new autoplot type for workflow sets.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/censored'>censored</a></span><span class='o'>)</span></span></code></pre> </div> <h2 
id="new-prediction-options-in-censored">New prediction options in censored <a href="#new-prediction-options-in-censored"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As part of the framework-wide integration of survival analysis, the parsnip extension package censored has received some love in the form of new prediction options.</p> <p>Random forests with the <code>&quot;aorsf&quot;</code> engine can now predict survival time, thanks to the new feature in the <a href="https://docs.ropensci.org/aorsf/" target="_blank" rel="noopener">aorsf</a> package itself. This means that all engines in censored can now predict survival time.</p> <p>Let&rsquo;s predict survival time for the first five rows of the lung cancer dataset, survival analysis&rsquo; <code>mtcars</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rf_spec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/rand_forest.html'>rand_forest</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"aorsf"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"censored regression"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>rf_fit</span> <span 
class='o'>&lt;-</span> <span class='nv'>rf_spec</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/pkg/survival/man/Surv.html'>Surv</a></span><span class='o'>(</span><span class='nv'>time</span>, <span class='nv'>status</span><span class='o'>)</span> <span class='o'>~</span> <span class='nv'>age</span> <span class='o'>+</span> <span class='nv'>sex</span>, data <span class='o'>=</span> <span class='nv'>lung</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>lung_5</span> <span class='o'>&lt;-</span> <span class='nv'>lung</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>5</span>, <span class='o'>]</span></span> <span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>rf_fit</span>, new_data <span class='o'>=</span> <span class='nv'>lung_5</span>, type <span class='o'>=</span> <span class='s'>"time"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 1</span></span></span> <span><span class='c'>#&gt; .pred_time</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 217.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 240.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 236.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 236.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 254.</span></span> <span></span></code></pre> </div> <p>Some models allow for predictions based on different values of a tuning parameter without having to refit the model. 
In parsnip, we refer to this as <a href="https://parsnip.tidymodels.org/articles/Submodels.html" target="_blank" rel="noopener">&ldquo;the submodel trick.&rdquo;</a> Some of those models are regularized models fitted with the <a href="https://glmnet.stanford.edu/" target="_blank" rel="noopener">glmnet</a> engine. In censored, the corresponding <a href="https://parsnip.tidymodels.org/reference/multi_predict.html" target="_blank" rel="noopener"><code>multi_predict()</code></a> method has now gained the prediction types <code>&quot;time&quot;</code> and <code>&quot;raw&quot;</code> in addition to the existing types <code>&quot;survival&quot;</code> and <code>&quot;linear_pred&quot;</code>.</p> <p>Let&rsquo;s fit a regularized Cox model to illustrate. Note how we set the <code>penalty</code> to a fixed value of <code>0.1</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>cox_fit</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/proportional_hazards.html'>proportional_hazards</a></span><span class='o'>(</span>penalty <span class='o'>=</span> <span class='m'>0.1</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"glmnet"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"censored regression"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/pkg/survival/man/Surv.html'>Surv</a></span><span class='o'>(</span><span class='nv'>time</span>, <span 
class='nv'>status</span><span class='o'>)</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>lung</span><span class='o'>)</span></span></code></pre> </div> <p>Predictions made with <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> use that penalty value of 0.1. With <a href="https://parsnip.tidymodels.org/reference/multi_predict.html" target="_blank" rel="noopener"><code>multi_predict()</code></a>, we can change that value to something different without having to refit. Conveniently, we can predict for multiple penalty values as well.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>cox_fit</span>, new_data <span class='o'>=</span> <span class='nv'>lung_5</span>, type <span class='o'>=</span> <span class='s'>"time"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 1</span></span></span> <span><span class='c'>#&gt; .pred_time</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 425.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 350.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> <span style='color: #BB0000;'>NA</span></span></span> <span></span><span></span> <span><span class='nv'>mpred</span> <span class='o'>&lt;-</span> <span class='nf'><a 
href='https://parsnip.tidymodels.org/reference/multi_predict.html'>multi_predict</a></span><span class='o'>(</span><span class='nv'>cox_fit</span>, new_data <span class='o'>=</span> <span class='nv'>lung_5</span>, type <span class='o'>=</span> <span class='s'>"time"</span>, </span> <span> penalty <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.01</span>, <span class='m'>0.1</span><span class='o'>)</span><span class='o'>)</span> </span> <span><span class='nv'>mpred</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 1</span></span></span> <span><span class='c'>#&gt; .pred </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span> <span></span></code></pre> </div> <p>The resulting tibble is nested by observation to follow the convention of one row per observation. 
For each observation, the predictions are stored in a tibble containing the penalty value along with the prediction.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mpred</span><span class='o'>$</span><span class='nv'>.pred</span><span class='o'>[[</span><span class='m'>2</span><span class='o'>]</span><span class='o'>]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; penalty .pred_time</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 0.01 461.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 0.1 425.</span></span> <span></span></code></pre> </div> <p>You can see that the predicted value from <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> matches the predicted value from <a href="https://parsnip.tidymodels.org/reference/multi_predict.html" target="_blank" rel="noopener"><code>multi_predict()</code></a> with a penalty of 0.1.</p> <h2 id="consistent-augment-for-workflows-and-parsnip-models">Consistent <code>augment()</code> for workflows and parsnip models <a href="#consistent-augment-for-workflows-and-parsnip-models"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>If you are interested in exploring predictions in relation to predictors, <a 
href="https://generics.r-lib.org/reference/augment.html" target="_blank" rel="noopener"><code>augment()</code></a> is your extended <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> method: it will augment the inputted dataset with its predictions. For classification, it will add hard class predictions as well as class probabilities. For regression, it will add the numeric prediction. If the outcome variable is part of the dataset, it also calculates residuals. This has already been the case for fitted parsnip models, and the <a href="https://generics.r-lib.org/reference/augment.html" target="_blank" rel="noopener"><code>augment()</code></a> method for workflows will now also calculate residuals.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>spec_fit</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nf'><a href='https://parsnip.tidymodels.org/reference/linear_reg.html'>linear_reg</a></span><span class='o'>(</span><span class='o'>)</span>, <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='nv'>wflow_fit</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nf'><a href='https://parsnip.tidymodels.org/reference/linear_reg.html'>linear_reg</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span></span> <span></span> 
<span><span class='nf'><a href='https://generics.r-lib.org/reference/augment.html'>augment</a></span><span class='o'>(</span><span class='nv'>spec_fit</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 32 × 13</span></span></span> <span><span class='c'>#&gt; .pred .resid mpg cyl disp hp drat wt qsec vs am gear</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 22.6 -<span style='color: #BB0000;'>1.60</span> 21 6 160 110 3.9 2.62 16.5 0 1 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 22.1 -<span style='color: #BB0000;'>1.11</span> 21 6 160 110 3.9 2.88 17.0 0 1 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 26.3 -<span style='color: #BB0000;'>3.45</span> 22.8 4 108 93 3.85 2.32 18.6 1 1 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 21.2 0.163 21.4 6 258 110 3.08 3.22 19.4 1 0 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 17.7 1.01 18.7 8 360 175 
3.15 3.44 17.0 0 0 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 20.4 -<span style='color: #BB0000;'>2.28</span> 18.1 6 225 105 2.76 3.46 20.2 1 0 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 14.4 -<span style='color: #BB0000;'>0.086</span><span style='color: #BB0000; text-decoration: underline;'>3</span> 14.3 8 360 245 3.21 3.57 15.8 0 0 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 22.5 1.90 24.4 4 147. 62 3.69 3.19 20 1 0 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 24.4 -<span style='color: #BB0000;'>1.62</span> 22.8 4 141. 95 3.92 3.15 22.9 1 0 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 18.7 0.501 19.2 6 168. 123 3.92 3.44 18.3 1 0 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: carb &lt;dbl&gt;</span></span></span> <span></span><span></span> <span><span class='nf'><a href='https://generics.r-lib.org/reference/augment.html'>augment</a></span><span class='o'>(</span><span class='nv'>wflow_fit</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 32 × 13</span></span></span> <span><span class='c'>#&gt; .pred .resid mpg cyl disp hp drat wt qsec vs am gear</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>*</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span 
style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 22.6 -<span style='color: #BB0000;'>1.60</span> 21 6 160 110 3.9 2.62 16.5 0 1 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 22.1 -<span style='color: #BB0000;'>1.11</span> 21 6 160 110 3.9 2.88 17.0 0 1 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 26.3 -<span style='color: #BB0000;'>3.45</span> 22.8 4 108 93 3.85 2.32 18.6 1 1 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 21.2 0.163 21.4 6 258 110 3.08 3.22 19.4 1 0 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 17.7 1.01 18.7 8 360 175 3.15 3.44 17.0 0 0 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 20.4 -<span style='color: #BB0000;'>2.28</span> 18.1 6 225 105 2.76 3.46 20.2 1 0 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 14.4 -<span style='color: #BB0000;'>0.086</span><span style='color: #BB0000; text-decoration: underline;'>3</span> 14.3 8 360 245 3.21 3.57 15.8 0 0 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 22.5 1.90 24.4 4 147. 62 3.69 3.19 20 1 0 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 24.4 -<span style='color: #BB0000;'>1.62</span> 22.8 4 141. 95 3.92 3.15 22.9 1 0 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 18.7 0.501 19.2 6 168. 
123 3.92 3.44 18.3 1 0 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: carb &lt;dbl&gt;</span></span></span> <span></span></code></pre> </div> <p>Both methods also append on the left-hand side of the data frame, rather than the right-hand side. This means that prediction columns are always visible when printed, even for data frames with many columns. As you might expect, the order of the columns is the same for both methods as well.</p> <h2 id="new-autoplot-type-for-workflow-sets">New autoplot type for workflow sets <a href="#new-autoplot-type-for-workflow-sets"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Many tidymodels objects have <a href="https://ggplot2.tidyverse.org/reference/autoplot.html" target="_blank" rel="noopener"><code>autoplot()</code></a> methods for quickly getting a sense of the most important aspects of an object. For workflow sets, the method shows the value of the calculated performance metrics, as well as the respective rank of each workflow in the set. 
Let&rsquo;s put together a workflow set on the actual <code>mtcars</code> data and take a look at the default autoplot.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mt_rec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='nv'>mt_rec2</span> <span class='o'>&lt;-</span> <span class='nv'>mt_rec</span> <span class='o'>|&gt;</span> <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>mt_rec3</span> <span class='o'>&lt;-</span> <span class='nv'>mt_rec</span> <span class='o'>|&gt;</span> <span class='nf'>step_YeoJohnson</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>wflow_set</span> <span class='o'>&lt;-</span> <span class='nf'>workflow_set</span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>plain <span class='o'>=</span> <span class='nv'>mt_rec</span>, normalize <span class='o'>=</span> <span class='nv'>mt_rec2</span>, yeo_johnson <span class='o'>=</span> <span class='nv'>mt_rec3</span><span class='o'>)</span>, </span> <span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='nf'><a href='https://parsnip.tidymodels.org/reference/linear_reg.html'>linear_reg</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a 
href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span> <span><span class='nv'>wflow_set_fit</span> <span class='o'>&lt;-</span> <span class='nf'>workflow_map</span><span class='o'>(</span></span> <span> <span class='nv'>wflow_set</span>, </span> <span> <span class='s'>"fit_resamples"</span>, </span> <span> resamples <span class='o'>=</span> <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='nv'>wflow_set_fit</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/workflowsets-autoplot-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>This lets you grasp the metric values and rank of a workflow and distinguish the type of preprocessor and model. In our case, we only have one type of model, and even just one type of preprocessor, a recipe. What we are much more interested in is which recipe corresponds to which rank. 
The new option of <code>type = &quot;wflow_id&quot;</code> lets us see which values and ranks correspond with which workflow and thus also with which recipe.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='nv'>wflow_set_fit</span>, type <span class='o'>=</span> <span class='s'>"wflow_id"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/workflowsets-autoplot-new-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>This makes it easy to spot that it&rsquo;s the Yeo-Johnson transformation that makes the difference here!</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to thank those in the community that contributed to tidymodels in the last quarter:</p> <div class="highlight"> <ul> <li>baguette: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>brulee: <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>butcher: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>.</li> <li>censored: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a 
href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/tripartio" target="_blank" rel="noopener">@tripartio</a>.</li> <li>dials: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>embed: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>.</li> <li>finetune: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, <a href="https://github.com/mfansler" target="_blank" rel="noopener">@mfansler</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>hardhat: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>modeldata: <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>parsnip: <a href="https://github.com/birbritto" target="_blank" rel="noopener">@birbritto</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jmunyoon" target="_blank" rel="noopener">@jmunyoon</a>, <a href="https://github.com/marcelglueck" target="_blank" rel="noopener">@marcelglueck</a>, <a href="https://github.com/mattheaphy" target="_blank" rel="noopener">@mattheaphy</a>, <a href="https://github.com/mesdi" target="_blank" rel="noopener">@mesdi</a>, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, <a 
href="https://github.com/pgg1309" target="_blank" rel="noopener">@pgg1309</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/wzbillings" target="_blank" rel="noopener">@wzbillings</a>.</li> <li>probably: <a href="https://github.com/brshallo" target="_blank" rel="noopener">@brshallo</a>, <a href="https://github.com/Jeffrothschild" target="_blank" rel="noopener">@Jeffrothschild</a>, <a href="https://github.com/jgaeb" target="_blank" rel="noopener">@jgaeb</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>recipes: <a href="https://github.com/DemetriPananos" target="_blank" rel="noopener">@DemetriPananos</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/jdonland" target="_blank" rel="noopener">@jdonland</a>, <a href="https://github.com/JiahuaQu" target="_blank" rel="noopener">@JiahuaQu</a>, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>, <a href="https://github.com/SantiagoD999" target="_blank" rel="noopener">@SantiagoD999</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/stufield" target="_blank" rel="noopener">@stufield</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>rsample: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a 
href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/paulcbauer" target="_blank" rel="noopener">@paulcbauer</a>, <a href="https://github.com/StevenWallaert" target="_blank" rel="noopener">@StevenWallaert</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/ZWael" target="_blank" rel="noopener">@ZWael</a>.</li> <li>shinymodels: <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>stacks: <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>tidyclust: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, and <a href="https://github.com/katieburak" target="_blank" rel="noopener">@katieburak</a>.</li> <li>tidymodels: <a href="https://github.com/jkylearmstrong" target="_blank" rel="noopener">@jkylearmstrong</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/nikosGeography" target="_blank" rel="noopener">@nikosGeography</a>, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>tune: <a href="https://github.com/AlbertoImg" target="_blank" rel="noopener">@AlbertoImg</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>, <a href="https://github.com/joshuagi" target="_blank" rel="noopener">@joshuagi</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/marcozanotti" target="_blank" rel="noopener">@marcozanotti</a>, <a 
href="https://github.com/Peter4801" target="_blank" rel="noopener">@Peter4801</a>, <a href="https://github.com/rfsaldanha" target="_blank" rel="noopener">@rfsaldanha</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/walkerjameschris" target="_blank" rel="noopener">@walkerjameschris</a>.</li> <li>workflows: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/mesdi" target="_blank" rel="noopener">@mesdi</a>, <a href="https://github.com/Milardkh" target="_blank" rel="noopener">@Milardkh</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>workflowsets: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>yardstick: <a href="https://github.com/asb2111" target="_blank" rel="noopener">@asb2111</a>, <a href="https://github.com/Dpananos" target="_blank" rel="noopener">@Dpananos</a>, <a href="https://github.com/EduMinsky" target="_blank" rel="noopener">@EduMinsky</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/tripartio" target="_blank" rel="noopener">@tripartio</a>.</li> </ul> </div> <p>We&rsquo;re grateful for all of the tidymodels community, from observers to users to contributors. 
Happy modeling!</p> tune 1.2.0 https://www.tidyverse.org/blog/2024/04/tune-1-2-0/ Thu, 18 Apr 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/04/tune-1-2-0/ <div class="highlight"> </div> <p>We&rsquo;re indubitably amped to announce the release of <a href="https://tune.tidymodels.org/" target="_blank" rel="noopener">tune</a> 1.2.0, a package for hyperparameter tuning in the <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels framework</a>.</p> <p>You can install it from CRAN, along with the rest of the core packages in tidymodels, using the tidymodels meta-package:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidymodels"</span><span class='o'>)</span></span></code></pre> </div> <p>The 1.2.0 release of tune has introduced support for two major features that we&rsquo;ve written about on the tidyverse blog already:</p> <ul> <li> <a href="https://www.tidyverse.org/blog/2024/04/tidymodels-survival-analysis/" target="_blank" rel="noopener">Survival analysis for time-to-event data with tidymodels</a></li> <li> <a href="https://www.tidyverse.org/blog/2024/03/tidymodels-fairness/" target="_blank" rel="noopener">Fair machine learning with tidymodels</a></li> </ul> <p>While those features got their own blog posts, there are several more features in this release that we thought were worth calling out. This post will highlight improvements to our support for parallel processing, the introduction of support for percentile confidence intervals for performance metrics, and a few other bits and bobs. 
You can see a full list of changes in the <a href="https://github.com/tidymodels/tune/releases/tag/v1.2.0" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span></code></pre> </div> <p>Throughout this post, I&rsquo;ll refer to the example of tuning an XGBoost model to predict the fuel efficiency of various car models. I hear this is already a well-explored modeling problem, but alas:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>2024</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>xgb_res</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'>tune_grid</span><span class='o'>(</span></span> <span> <span class='nf'>boost_tree</span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"regression"</span>, mtry <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span>, learn_rate <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>,</span> <span> <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span>,</span> <span> control <span class='o'>=</span> <span class='nf'>control_grid</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span></code></pre> </div> <p>Note that we&rsquo;ve 
used the <a href="https://tune.tidymodels.org/reference/control_grid.html" target="_blank" rel="noopener">control option</a> <code>save_pred = TRUE</code> to indicate that we want to save the predictions from our resampled models in the tuning results. Both <code>int_pctl()</code> and <code>compute_metrics()</code> below will need those predictions. The metrics for our resampled model look like so:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>collect_metrics</span><span class='o'>(</span><span class='nv'>xgb_res</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 20 × 8</span></span></span> <span><span class='c'>#&gt; mtry learn_rate .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 2 0.002<span style='text-decoration: underline;'>04</span> rmse standard 19.7 25 0.262 Preprocessor1_Model01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 0.002<span style='text-decoration: underline;'>04</span> rsq standard 0.659 25 0.031<span style='text-decoration: underline;'>4</span> Preprocessor1_Model01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 6 0.008<span style='text-decoration: underline;'>59</span> rmse standard 18.0 25 0.260 
Preprocessor1_Model02</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 6 0.008<span style='text-decoration: underline;'>59</span> rsq standard 0.607 25 0.027<span style='text-decoration: underline;'>0</span> Preprocessor1_Model02</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 0.027<span style='text-decoration: underline;'>6</span> rmse standard 14.0 25 0.267 Preprocessor1_Model03</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 0.027<span style='text-decoration: underline;'>6</span> rsq standard 0.710 25 0.023<span style='text-decoration: underline;'>7</span> Preprocessor1_Model03</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 14 more rows</span></span></span> <span></span></code></pre> </div> <h2 id="modernized-support-for-parallel-processing">Modernized support for parallel processing <a href="#modernized-support-for-parallel-processing"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The tidymodels framework has long supported evaluating models in parallel using the <a href="https://cran.r-project.org/web/packages/foreach/vignettes/foreach.html" target="_blank" rel="noopener">foreach</a> package. 
This release of tune has introduced support for parallelism using the <a href="https://www.futureverse.org/" target="_blank" rel="noopener">futureverse</a> framework, and we will begin deprecating our support for foreach in a coming release.</p> <p>To tune a model in parallel with foreach, a user would load a <em>parallel backend</em> package (usually with a name like <a href="https://rdrr.io/r/base/library.html" target="_blank" rel="noopener"><code>library(doBackend)</code></a>) and then <em>register</em> it with foreach (with a function call like <code>registerDoBackend()</code>). The tune package would then detect that registered backend and take it from there. For example, the code to distribute the above tuning process across 10 cores with foreach would look like:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'>doMC</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://rdrr.io/pkg/doMC/man/registerDoMC.html'>registerDoMC</a></span><span class='o'>(</span>cores <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>2024</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>xgb_res</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'>tune_grid</span><span class='o'>(</span></span> <span> <span class='nf'>boost_tree</span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"regression"</span>, mtry <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span>, learn_rate <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> <span 
class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>,</span> <span> <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span>,</span> <span> control <span class='o'>=</span> <span class='nf'>control_grid</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span></code></pre> </div> <p>The code to do so with future is similarly simple. Users first load the <a href="https://future.futureverse.org/index.html" target="_blank" rel="noopener">future</a> package, and then specify a <a href="https://future.futureverse.org/reference/plan.html" target="_blank" rel="noopener"><code>plan()</code></a> which dictates how computations will be distributed. For example, the code to distribute the above tuning process across 10 cores with future looks like:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://future.futureverse.org'>future</a></span><span class='o'>)</span></span> <span><span class='nf'><a href='https://future.futureverse.org/reference/plan.html'>plan</a></span><span class='o'>(</span><span class='nv'>multisession</span>, workers <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>2024</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>xgb_res</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'>tune_grid</span><span class='o'>(</span></span> <span> <span class='nf'>boost_tree</span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"regression"</span>, mtry <span class='o'>=</span> <span 
class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span>, learn_rate <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>,</span> <span> <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span>,</span> <span> control <span class='o'>=</span> <span class='nf'>control_grid</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span></code></pre> </div> <p>For users, the transition to parallelism with future has several benefits:</p> <ul> <li>The futureverse presently supports a greater number of parallelism technologies and has been more likely to receive implementations for new ones.</li> <li>Once foreach is fully deprecated, users will be able to use the <a href="https://www.tidyverse.org/blog/2023/04/tuning-delights/#interactive-issue-logging" target="_blank" rel="noopener">interactive logger</a> when tuning in parallel.</li> </ul> <p>From our perspective, transitioning our parallelism support to future makes our packages much more maintainable, reducing complexity in random number generation, error handling, and progress reporting.</p> <p>In an upcoming release of the package, you&rsquo;ll see a deprecation warning when a foreach parallel backend is registered but no future plan has been specified, so start transitioning your code sooner rather than later!</p> <h2 id="percentile-confidence-intervals">Percentile confidence intervals <a href="#percentile-confidence-intervals"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5
5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Following up on changes in the <a href="https://github.com/tidymodels/rsample/releases/tag/v1.2.0" target="_blank" rel="noopener">most recent rsample release</a>, tune has introduced a <a href="https://tune.tidymodels.org/reference/int_pctl.tune_results.html" target="_blank" rel="noopener">method for <code>int_pctl()</code></a> that calculates percentile confidence intervals for performance metrics. To calculate a 90% confidence interval for the values of each performance metric returned in <code>collect_metrics()</code>, we&rsquo;d write:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>2024</span><span class='o'>)</span></span> <span></span> <span><span class='nf'>int_pctl</span><span class='o'>(</span><span class='nv'>xgb_res</span>, alpha <span class='o'>=</span> <span class='m'>.1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 20 × 8</span></span></span> <span><span class='c'>#&gt; .metric .estimator .lower .estimate .upper .config mtry learn_rate</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'>1</span> rmse bootstrap 18.1 19.9 22.0 Preprocessor1_Mod… 2 0.002<span style='text-decoration: underline;'>04</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> rsq bootstrap 0.570 0.679 0.778 Preprocessor1_Mod… 2 0.002<span style='text-decoration: underline;'>04</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> rmse bootstrap 16.6 18.3 19.9 Preprocessor1_Mod… 6 0.008<span style='text-decoration: underline;'>59</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> rsq bootstrap 0.548 0.665 0.765 Preprocessor1_Mod… 6 0.008<span style='text-decoration: underline;'>59</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> rmse bootstrap 12.5 14.1 15.9 Preprocessor1_Mod… 3 0.027<span style='text-decoration: underline;'>6</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> rsq bootstrap 0.622 0.720 0.818 Preprocessor1_Mod… 3 0.027<span style='text-decoration: underline;'>6</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 14 more rows</span></span></span> <span></span></code></pre> </div> <p>Note that the output has the same number of rows as the <code>collect_metrics()</code> output: one for each unique pair of metric and workflow.</p> <p>This is very helpful for validation sets. Other resampling methods generate replicated performance statistics. We can compute simple interval estimates using the mean and standard error for those. 
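</p> <p>As a rough sketch of that mean-and-standard-error approach (this is not what <code>int_pctl()</code> does), the snippet below builds a normal-approximation 90% interval from the first two rows of the <code>collect_metrics()</code> output shown earlier. The <code>resampled</code> tibble, the <code>.lower</code> and <code>.upper</code> column names, and the choice of a normal quantile are assumptions made for illustration; a t quantile with <code>n - 1</code> degrees of freedom would be a reasonable alternative.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>library(dplyr)

# Hypothetical stand-in for collect_metrics(xgb_res): the first two rows
# of the output shown earlier in this post.
resampled &lt;- tibble(
  .metric = c("rmse", "rsq"),
  mean    = c(19.7, 0.659),
  n       = c(25L, 25L),
  std_err = c(0.262, 0.0314)
)

# A two-sided 90% interval uses the 95th percentile of the standard normal.
z &lt;- qnorm(0.95)

# Add lower and upper bounds as mean -/+ z * standard error.
resampled |&gt;
  mutate(
    .lower = mean - z * std_err,
    .upper = mean + z * std_err
  )
</code></pre> </div> <p>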
Validation sets produce only one estimate, and these bootstrap methods are probably the best option for obtaining interval estimates.</p> <h2 id="breaking-change-relocation-of-ellipses">Breaking change: relocation of ellipses <a href="#breaking-change-relocation-of-ellipses"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;ve made a <strong>breaking change</strong> in argument order for several functions in the package (and downstream packages like finetune and workflowsets). Ellipses (&hellip;) are now used consistently in the package to require that optional arguments be named. In functions that previously had unused ellipses at the end of their signature, the ellipses now follow the last argument without a default value, and several other functions that previously had no ellipses in their signatures gained them. 
This applies to methods for <code>augment()</code>, <code>collect_predictions()</code>, <code>collect_metrics()</code>, <code>select_best()</code>, <code>show_best()</code>, and <code>conf_mat_resampled()</code>.</p> <h2 id="compute-new-metrics-without-re-fitting">Compute new metrics without re-fitting <a href="#compute-new-metrics-without-re-fitting"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;ve also added a new function, <a href="https://tune.tidymodels.org/reference/compute_metrics.html" target="_blank" rel="noopener"><code>compute_metrics()</code></a>, that allows for calculating metrics that were not used when evaluating against resamples. For example, consider our <code>xgb_res</code> object. 
Since we didn&rsquo;t supply any metrics to evaluate, and this model is a regression model, tidymodels selected RMSE and R<sup>2</sup> as defaults:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>collect_metrics</span><span class='o'>(</span><span class='nv'>xgb_res</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 20 × 8</span></span></span> <span><span class='c'>#&gt; mtry learn_rate .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 2 0.002<span style='text-decoration: underline;'>04</span> rmse standard 19.7 25 0.262 Preprocessor1_Model01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 0.002<span style='text-decoration: underline;'>04</span> rsq standard 0.659 25 0.031<span style='text-decoration: underline;'>4</span> Preprocessor1_Model01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 6 0.008<span style='text-decoration: underline;'>59</span> rmse standard 18.0 25 0.260 Preprocessor1_Model02</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 6 0.008<span style='text-decoration: underline;'>59</span> rsq standard 0.607 25 0.027<span style='text-decoration: underline;'>0</span> 
Preprocessor1_Model02</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 0.027<span style='text-decoration: underline;'>6</span> rmse standard 14.0 25 0.267 Preprocessor1_Model03</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 0.027<span style='text-decoration: underline;'>6</span> rsq standard 0.710 25 0.023<span style='text-decoration: underline;'>7</span> Preprocessor1_Model03</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 14 more rows</span></span></span> <span></span></code></pre> </div> <p>In the past, if you wanted to evaluate that workflow against a performance metric that you hadn&rsquo;t included in your <code>tune_grid()</code> run, you&rsquo;d need to re-run <code>tune_grid()</code>, fitting models and predicting new values all over again. Now, using the <code>compute_metrics()</code> function, you can use the <code>tune_grid()</code> output you&rsquo;ve already generated and compute any number of new metrics without having to fit any more models as long as you use the control option <code>save_pred = TRUE</code> when tuning.</p> <p>So, say I want to additionally calculate Huber Loss and Mean Absolute Percent Error. 
I just pass those metrics along with the tuning result to <code>compute_metrics()</code>, and the result looks just like <code>collect_metrics()</code> output for the metrics originally calculated:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>compute_metrics</span><span class='o'>(</span><span class='nv'>xgb_res</span>, <span class='nf'>metric_set</span><span class='o'>(</span><span class='nv'>huber_loss</span>, <span class='nv'>mape</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 20 × 8</span></span></span> <span><span class='c'>#&gt; mtry learn_rate .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 2 0.002<span style='text-decoration: underline;'>04</span> huber_loss standard 18.3 25 0.232 Preprocessor1_Mode…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 0.002<span style='text-decoration: underline;'>04</span> mape standard 94.4 25 0.068<span style='text-decoration: underline;'>5</span> Preprocessor1_Mode…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 6 0.008<span style='text-decoration: underline;'>59</span> huber_loss standard 16.7 25 0.229 Preprocessor1_Mode…</span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'>4</span> 6 0.008<span style='text-decoration: underline;'>59</span> mape standard 85.7 25 0.178 Preprocessor1_Mode…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 0.027<span style='text-decoration: underline;'>6</span> huber_loss standard 12.6 25 0.230 Preprocessor1_Mode…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 0.027<span style='text-decoration: underline;'>6</span> mape standard 64.4 25 0.435 Preprocessor1_Mode…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 14 more rows</span></span></span> <span></span></code></pre> </div> <h2 id="easily-pivot-resampled-metrics">Easily pivot resampled metrics <a href="#easily-pivot-resampled-metrics"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Finally, the <code>collect_metrics()</code> method for tune results recently <a href="https://tune.tidymodels.org/reference/collect_predictions.html#arguments" target="_blank" rel="noopener">gained a new argument</a>, <code>type</code>, indicating the shape of the returned metrics. The default, <code>type = &quot;long&quot;</code>, is the same shape as before. 
The argument value <code>type = &quot;wide&quot;</code> will allot each metric its own column, making it easier to compare metrics across different models.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>collect_metrics</span><span class='o'>(</span><span class='nv'>xgb_res</span>, type <span class='o'>=</span> <span class='s'>"wide"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 10 × 5</span></span></span> <span><span class='c'>#&gt; mtry learn_rate .config rmse rsq</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 2 0.002<span style='text-decoration: underline;'>04</span> Preprocessor1_Model01 19.7 0.659</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 6 0.008<span style='text-decoration: underline;'>59</span> Preprocessor1_Model02 18.0 0.607</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 0.027<span style='text-decoration: underline;'>6</span> Preprocessor1_Model03 14.0 0.710</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 0.037<span style='text-decoration: underline;'>1</span> Preprocessor1_Model04 12.3 0.728</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 5 0.005<span style='text-decoration: underline;'>39</span> Preprocessor1_Model05 18.8 0.595</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 9 0.011<span style='text-decoration: underline;'>0</span> 
Preprocessor1_Model06 17.4 0.577</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 4 more rows</span></span></span> <span></span></code></pre> </div> <p>Under the hood, this is indeed just a <code>pivot_wider()</code> call. We&rsquo;ve found that it&rsquo;s time-consuming and error-prone to programmatically determine identifying columns when pivoting resampled metrics, so we&rsquo;ve localized and thoroughly tested the code that we use to do so with this feature.</p> <h2 id="more-love-for-the-brier-score">More love for the Brier score <a href="#more-love-for-the-brier-score"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Tuning and resampling functions use default metrics when the user does not specify a custom metric set. For regression models, these are RMSE and R<sup>2</sup>. For classification, accuracy and the area under the ROC curve <em>were</em> the default. 
We&rsquo;ve also added the <a href="https://en.wikipedia.org/wiki/Brier_score" target="_blank" rel="noopener">Brier score</a> to the default classification metric list.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As always, we&rsquo;re appreciative of the community contributors who helped make this release happen: <a href="https://github.com/AlbertoImg" target="_blank" rel="noopener">@AlbertoImg</a>, <a href="https://github.com/dramanica" target="_blank" rel="noopener">@dramanica</a>, <a href="https://github.com/epiheather" target="_blank" rel="noopener">@epiheather</a>, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, <a href="https://github.com/jxu" target="_blank" rel="noopener">@jxu</a>, <a href="https://github.com/kbodwin" target="_blank" rel="noopener">@kbodwin</a>, <a href="https://github.com/kenraywilliams" target="_blank" rel="noopener">@kenraywilliams</a>, <a href="https://github.com/KJT-Habitat" target="_blank" rel="noopener">@KJT-Habitat</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/marcozanotti" target="_blank" rel="noopener">@marcozanotti</a>, <a href="https://github.com/MasterLuke84" target="_blank" rel="noopener">@MasterLuke84</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/PathosEthosLogos" target="_blank" 
rel="noopener">@PathosEthosLogos</a>, and <a href="https://github.com/Peter4801" target="_blank" rel="noopener">@Peter4801</a>.</p> <div class="highlight"> </div> Tidyverse developer day 2024 https://www.tidyverse.org/blog/2024/04/tdd-2024/ Tue, 09 Apr 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/04/tdd-2024/ <p>It&rsquo;s been a hot minute since the last one, but we are very excited to announce that the next tidyverse developer day will be after <a href="https://posit.co/conference/" target="_blank" rel="noopener">posit::conf</a> in Seattle on August 15, 2024. A big thanks goes to <a href="https://www.fredhutch.org/en.html" target="_blank" rel="noopener">Fred Hutch Cancer Center</a> for donating the space!</p> <p><strong>What is the tidyverse developer day?</strong> TDD is a day of learning and coding to nurture regular contributors to the tidyverse. We&rsquo;ll provide food; you&rsquo;ll bring your laptop and enthusiasm. The tidyverse team and other community helpers will be on hand to help you hit the ground running and/or get over any stumbling blocks that you encounter. Don&rsquo;t have any ideas for something to work on? No problem! We&rsquo;ll be tagging issues in advance to make sure there&rsquo;s lots to do for any- and everyone, regardless of level of expertise.</p> <p><strong>Who should attend?</strong> Anyone who would like to get better at contributing to the tidyverse! 
Everyone is welcome, whether you&rsquo;ve never done a PR before or you&rsquo;ve already made your 10th package. But you do need a ticket; to provide a fulfilling experience for all attendees we need to carefully manage the ratio of attendees to helpers.</p> <p><strong>How much does it cost?</strong> $10. This doesn&rsquo;t cover the costs of the event because we don&rsquo;t want to make attendance contingent on your ability to pay, but we&rsquo;ve found some monetary commitment discourages people from taking tickets that they don&rsquo;t end up using. But if the cost would prevent you from attending, please email <a href="mailto:[email protected]">[email protected]</a> and we can figure something out.</p> <p> <a href="https://www.eventbrite.com/e/tidyverse-developer-day-2024-tickets-876018203027?aff=oddtdtcreator" target="_blank" rel="noopener">Buy your ticket now!</a></p> dbplyr 2.5.0 https://www.tidyverse.org/blog/2024/04/dbplyr-2-5-0/ Mon, 08 Apr 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/04/dbplyr-2-5-0/ <p>We&rsquo;re most pleased to announce the release of <a href="http://dbplyr.tidyverse.org/" target="_blank" rel="noopener">dbplyr</a> 2.5.0. 
dbplyr is a database backend for dplyr that allows you to use a remote database as if it were a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dbplyr"</span><span class='o'>)</span></span></code></pre> </div> <p>This post focuses on the biggest change in dbplyr 2.5.0: improved syntax for tables nested inside schemas and catalogs. As usual, this release also contains a ton of minor improvements to SQL generation, and I&rsquo;d highly recommend skimming the <a href="https://github.com/tidyverse/dbplyr/releases/tag/v2.5.0" target="_blank" rel="noopener">release notes</a> to learn the details.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dbplyr.tidyverse.org/'>dbplyr</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="referring-to-tables-in-a-schema">Referring to tables in a schema <a href="#referring-to-tables-in-a-schema"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Historically, dbplyr has provided a bewildering array of options to specify a table inside a schema inside a catalog:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' 
data-lang='r'><span><span class='nv'>con</span> <span class='o'>|&gt;</span> <span class='nf'>tbl</span><span class='o'>(</span><span class='nf'><a href='https://dbplyr.tidyverse.org/reference/ident_q.html'>ident_q</a></span><span class='o'>(</span><span class='s'>"catalog_name.schema_name.table_name"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>con</span> <span class='o'>|&gt;</span> <span class='nf'>tbl</span><span class='o'>(</span><span class='nf'><a href='https://dbplyr.tidyverse.org/reference/sql.html'>sql</a></span><span class='o'>(</span><span class='s'>"SELECT * FROM catalog_name.schema_name.table_name"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>con</span> <span class='o'>|&gt;</span> <span class='nf'>tbl</span><span class='o'>(</span><span class='nf'><a href='https://dbplyr.tidyverse.org/reference/in_schema.html'>in_catalog</a></span><span class='o'>(</span><span class='s'>"catalog_name"</span>, <span class='s'>"schema_name"</span>, <span class='s'>"table_name"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>con</span> <span class='o'>|&gt;</span> <span class='nf'>tbl</span><span class='o'>(</span><span class='nf'><a href='https://dbplyr.tidyverse.org/reference/ident_q.html'>ident_q</a></span><span class='o'>(</span><span class='s'>"catalog_name.schema_name"</span><span class='o'>)</span>, <span class='s'>"table_name"</span><span class='o'>)</span></span> <span><span class='nv'>con</span> <span class='o'>|&gt;</span> <span class='nf'>tbl</span><span class='o'>(</span><span class='nf'><a href='https://dbplyr.tidyverse.org/reference/sql.html'>sql</a></span><span class='o'>(</span><span class='s'>"catalog_name.schema_name"</span><span class='o'>)</span>, <span class='s'>"table_name"</span><span class='o'>)</span></span></code></pre> </div> <p>You can also use <a href="https://dbi.r-dbi.org/reference/Id.html" target="_blank" 
rel="noopener"><code>DBI::Id()</code></a>, whose syntax has also evolved over time:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>con</span> <span class='o'>|&gt;</span> <span class='nf'>tbl</span><span class='o'>(</span><span class='nf'>DBI</span><span class='nf'>::</span><span class='nf'><a href='https://dbi.r-dbi.org/reference/Id.html'>Id</a></span><span class='o'>(</span>database <span class='o'>=</span> <span class='s'>"catalog_name"</span>, schema <span class='o'>=</span> <span class='s'>"schema_name"</span>, table <span class='o'>=</span> <span class='s'>"table_name"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>con</span> <span class='o'>|&gt;</span> <span class='nf'>tbl</span><span class='o'>(</span><span class='nf'>DBI</span><span class='nf'>::</span><span class='nf'><a href='https://dbi.r-dbi.org/reference/Id.html'>Id</a></span><span class='o'>(</span>catalog <span class='o'>=</span> <span class='s'>"catalog_name"</span>, schema <span class='o'>=</span> <span class='s'>"schema_name"</span>, table <span class='o'>=</span> <span class='s'>"table_name"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>con</span> <span class='o'>|&gt;</span> <span class='nf'>tbl</span><span class='o'>(</span><span class='nf'>DBI</span><span class='nf'>::</span><span class='nf'><a href='https://dbi.r-dbi.org/reference/Id.html'>Id</a></span><span class='o'>(</span><span class='s'>"catalog_name"</span>, <span class='s'>"schema_name"</span>, <span class='s'>"table_name"</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <p>Many of these options were poorly supported (i.e. we would accidentally break them from time-to-time) and suffered from the lack of a holistic vision. 
This release aims to bring order to the chaos by providing a succinct new syntax for literal table identifiers: <a href="https://rdrr.io/r/base/AsIs.html" target="_blank" rel="noopener"><code>I()</code></a>. This allows you to concisely identify a table nested inside a schema or catalog:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>con</span> <span class='o'>|&gt;</span> <span class='nf'>tbl</span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='s'>"catalog_name.schema_name.table_name"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>con</span> <span class='o'>|&gt;</span> <span class='nf'>tbl</span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='s'>"schema_name.table_name"</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <p> <a href="https://rdrr.io/r/base/AsIs.html" target="_blank" rel="noopener"><code>I()</code></a> is a base function, and you may be familiar with it from modelling, e.g. <code>lm(y ~ x + I(y * z))</code>. It performs a similar role for both dbplyr and modelling functions: it tells the function to treat the argument as is, rather than quoting it in the case of dbplyr, or interpreting it as an interaction in the case of <a href="https://rdrr.io/r/stats/lm.html" target="_blank" rel="noopener"><code>lm()</code></a>.</p> <p> <a href="https://rdrr.io/r/base/AsIs.html" target="_blank" rel="noopener"><code>I()</code></a> is dbplyr&rsquo;s preferred way of specifying nested table identifiers and we will eventually formally supersede and then one day deprecate the other options. 
However, because their usage is widespread, this process will be slow and gradual, and play out over multiple years; there&rsquo;s no need to make changes now.</p> <p>(If you&rsquo;re the author of a dbplyr backend, you can take advantage of this new syntax by using the <code>dbplyr_table_path</code> class. dbplyr now provides a <a href="https://dbplyr.tidyverse.org/reference/is_table_path.html" target="_blank" rel="noopener">few helper functions</a> to make this easier.)</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all 46 folks who helped to make this release possible with their thoughtful comments and code contributions! 
<a href="https://github.com/aarizvi" target="_blank" rel="noopener">@aarizvi</a>, <a href="https://github.com/abalter" target="_blank" rel="noopener">@abalter</a>, <a href="https://github.com/andreassoteriadesmoj" target="_blank" rel="noopener">@andreassoteriadesmoj</a>, <a href="https://github.com/andrew-schulman" target="_blank" rel="noopener">@andrew-schulman</a>, <a href="https://github.com/apalacio9502" target="_blank" rel="noopener">@apalacio9502</a>, <a href="https://github.com/carlinstarrs" target="_blank" rel="noopener">@carlinstarrs</a>, <a href="https://github.com/catalamarti" target="_blank" rel="noopener">@catalamarti</a>, <a href="https://github.com/chicotobi" target="_blank" rel="noopener">@chicotobi</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dmenne" target="_blank" rel="noopener">@dmenne</a>, <a href="https://github.com/edgararuiz" target="_blank" rel="noopener">@edgararuiz</a>, <a href="https://github.com/edonnachie" target="_blank" rel="noopener">@edonnachie</a>, <a href="https://github.com/eitsupi" target="_blank" rel="noopener">@eitsupi</a>, <a href="https://github.com/ejneer" target="_blank" rel="noopener">@ejneer</a>, <a href="https://github.com/erydit" target="_blank" rel="noopener">@erydit</a>, <a href="https://github.com/espinielli" target="_blank" rel="noopener">@espinielli</a>, <a href="https://github.com/fh-afrachioni" target="_blank" rel="noopener">@fh-afrachioni</a>, <a href="https://github.com/ghost" target="_blank" rel="noopener">@ghost</a>, <a href="https://github.com/godislobster" target="_blank" rel="noopener">@godislobster</a>, <a href="https://github.com/gorcha" target="_blank" rel="noopener">@gorcha</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hild0146" target="_blank" rel="noopener">@hild0146</a>, <a href="https://github.com/JakeHurlbut" target="_blank" 
rel="noopener">@JakeHurlbut</a>, <a href="https://github.com/jarodmeng" target="_blank" rel="noopener">@jarodmeng</a>, <a href="https://github.com/Jiefei-Wang" target="_blank" rel="noopener">@Jiefei-Wang</a>, <a href="https://github.com/joshbal" target="_blank" rel="noopener">@joshbal</a>, <a href="https://github.com/kelseyroberts" target="_blank" rel="noopener">@kelseyroberts</a>, <a href="https://github.com/kmishra9" target="_blank" rel="noopener">@kmishra9</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/m-muecke" target="_blank" rel="noopener">@m-muecke</a>, <a href="https://github.com/maciekbanas" target="_blank" rel="noopener">@maciekbanas</a>, <a href="https://github.com/marcusmunch" target="_blank" rel="noopener">@marcusmunch</a>, <a href="https://github.com/mgarbuzov" target="_blank" rel="noopener">@mgarbuzov</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/misea" target="_blank" rel="noopener">@misea</a>, <a href="https://github.com/MKatz-DHSC" target="_blank" rel="noopener">@MKatz-DHSC</a>, <a href="https://github.com/Mkranj" target="_blank" rel="noopener">@Mkranj</a>, <a href="https://github.com/multimeric" target="_blank" rel="noopener">@multimeric</a>, <a href="https://github.com/nathanhaigh" target="_blank" rel="noopener">@nathanhaigh</a>, <a href="https://github.com/nilescbn" target="_blank" rel="noopener">@nilescbn</a>, <a href="https://github.com/talegari" target="_blank" rel="noopener">@talegari</a>, <a href="https://github.com/Tazinho" target="_blank" rel="noopener">@Tazinho</a>, <a href="https://github.com/thomashulst" target="_blank" rel="noopener">@thomashulst</a>, <a href="https://github.com/Thranholm" target="_blank" rel="noopener">@Thranholm</a>, <a href="https://github.com/tomshafer" target="_blank" rel="noopener">@tomshafer</a>, and <a href="https://github.com/wstvcg" target="_blank" 
rel="noopener">@wstvcg</a>.</p> Survival analysis for time-to-event data with tidymodels https://www.tidyverse.org/blog/2024/04/tidymodels-survival-analysis/ Wed, 03 Apr 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/04/tidymodels-survival-analysis/ <p>We&rsquo;re tickled pink to announce the support of survival analysis for time-to-event data across tidymodels. The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles. This new support makes survival analysis a first-class citizen in tidymodels and gives censored regression modeling the same flexibility and ease as classification or regression.</p> <p>The functionality resides in multiple tidymodels packages. 
The easiest way to install them all is to install the tidymodels meta-package:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidymodels"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will highlight why this is useful, explain which additions we&rsquo;ve made to the framework, and point to several places to learn more.</p> <p>You can see a full list of changes in the release notes:</p> <ul> <li> <a href="https://parsnip.tidymodels.org/news/index.html#parsnip-120" target="_blank" rel="noopener">parsnip</a></li> <li> <a href="https://censored.tidymodels.org/news/index.html#censored-030" target="_blank" rel="noopener">censored</a></li> <li> <a href="https://yardstick.tidymodels.org/news/index.html#yardstick-130" target="_blank" rel="noopener">yardstick</a></li> <li> <a href="https://workflows.tidymodels.org/news/index.html#workflows-114" target="_blank" rel="noopener">workflows</a></li> <li> <a href="https://tune.tidymodels.org/news/index.html#tune-120" target="_blank" rel="noopener">tune</a></li> <li> <a href="https://finetune.tidymodels.org/news/index.html#finetune-120" target="_blank" rel="noopener">finetune</a></li> <li> <a href="https://workflowsets.tidymodels.org/news/index.html#workflowsets-110" target="_blank" rel="noopener">workflowsets</a></li> </ul> <h2 id="increasing-usefulness-two-perspectives">Increasing usefulness: Two perspectives <a href="#increasing-usefulness-two-perspectives"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 
3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to situate the changes from two different perspectives: How this is useful for people already familiar with survival analysis as well as for people already familiar with tidymodels.</p> <p>If you are already familiar with both: Excellent, this is very much for you! Read on for more details on how these two things come together.</p> <h3 id="adding-tidymodels-to-your-tool-kit">Adding tidymodels to your tool kit <a href="#adding-tidymodels-to-your-tool-kit"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>If you are already familiar with survival analysis but maybe not tidymodels, these changes now unlock a whole framework for predictive modeling for you. It applies tidyverse principles to modeling, meaning it strives to be consistent, composable, and human-centered. The framework covers the modeling process from the initial test/train split of the data all the way to tuning various models. Along the way it offers a rich selection of preprocessing techniques, resampling schemes, and performance metrics along with safeguards against accidental overfitting. 
We make the full case for tidymodels at <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels.org</a>.</p> <h3 id="adding-survival-analysis-to-your-tool-kit">Adding survival analysis to your tool kit <a href="#adding-survival-analysis-to-your-tool-kit"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>If you are already familiar with tidymodels but maybe not survival analysis, these changes let you leverage the familiar framework for an additional type of modeling problem. Survival analysis offers methods for modeling time-to-event data. While it has its roots in medical research, it has broad applications as that event of interest can be so much more than a medical outcome. Take customer churn as an example: We are interested in how long someone is a customer for and when they churn. For customers who churned, we have the complete time for which they were customers. For existing customers, we only know how long they&rsquo;ve been customers for <em>so far</em>. Such observations are called censored. So what are our modeling choices here?</p> <p>We could look at the time and model that as a regression problem. We could look at the event status and model that as a classification problem. Both options might get us somewhere close to an answer to our original modeling question but not quite there. Censored regression models let us model an outcome that includes both aspects, the time and the event status. And with that, it can deal with both censored and uncensored observations appropriately. 
With this type of model, we can predict the survival time, or in more applied terms, how long someone will stay as a customer. We can also predict the probability of survival at a given time point. This lets us answer questions like &ldquo;How likely is it that this customer will churn after 3 months?&rdquo; See which prediction types are available for which models at <a href="https://censored.tidymodels.org/" target="_blank" rel="noopener">censored.tidymodels.org</a>.</p> <h2 id="ch-ch-changes-whats-new-for-censored-regression">Ch-ch-changes: What&rsquo;s new for censored regression? <a href="#ch-ch-changes-whats-new-for-censored-regression"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The main components needed for this full-fledged integration of survival analysis into tidymodels were</p> <ul> <li>Survival analysis models that can take censoring into account</li> <li>Survival analysis performance metrics that can take censoring into account</li> <li>Integrating changes required by these models and metrics into the framework</li> </ul> <p>For the models, parsnip gained a new mode, <code>&quot;censored regression&quot;</code>, for existing models as well as new model types such as <code>proportional_hazards()</code>. Engines for these reside in censored, the parsnip extension package for survival models. 
The <code>&quot;censored regression&quot;</code> mode has been around for a while and we&rsquo;ve previously shared posts on <a href="https://www.tidyverse.org/blog/2021/11/survival-analysis-parsnip-adjacent/" target="_blank" rel="noopener">our initial thoughts</a> and the <a href="https://www.tidyverse.org/blog/2022/08/censored-0-1-0/" target="_blank" rel="noopener">release of censored</a>.</p> <p>Now we&rsquo;ve added the metrics: <a href="https://yardstick.tidymodels.org/news/index.html#yardstick-130" target="_blank" rel="noopener">yardstick v1.3.0</a> includes new metrics for assessing censored regression models. Somewhat similar to how metrics for classification models can take class predictions or probability predictions as input, these survival metrics can take predicted survival times or predictions of survival probabilities as input.</p> <p>The new metrics are</p> <ul> <li>Concordance index on the survival time via <code>concordance_survival()</code></li> <li>Brier score on the survival probability and its integrated version via <code>brier_survival()</code> and <code>brier_survival_integrated()</code></li> <li>ROC curve and the area under the ROC curve on the survival probabilities via <code>roc_curve_survival()</code> and <code>roc_auc_survival()</code> respectively</li> </ul> <p>The probability of survival is always defined <em>at a certain point in time</em>. We call that time point the <em>evaluation time</em> because it is then also the time point at which we want to evaluate model performance. 
Metrics that work on the survival probabilities are also called <em>dynamic metrics</em> and you can read more about them here:</p> <ul> <li> <a href="https://www.tidymodels.org/learn/statistics/survival-metrics/" target="_blank" rel="noopener">Dynamic Performance Metrics for Event Time Data</a></li> <li> <a href="https://www.tidymodels.org/learn/statistics/survival-metrics-details/" target="_blank" rel="noopener">Accounting for Censoring in Performance Metrics for Event Time Data</a></li> </ul> <p>The evaluation time is also the best example to illustrate the changes necessary to the framework. Most of them were under the hood but the evaluation time is user-facing. Let&rsquo;s take a look at that.</p> <p>While the need for evaluation times depends on the type of metric, it is not actually specified as an argument to the metric functions. Like yardstick&rsquo;s other metrics, those take pre-made predictions as the input. So where do you specify it then?</p> <ul> <li>You need to specify it to directly predict survival probabilities, via <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> or <code>augment()</code>. We introduced the corresponding <code>eval_time</code> argument first for fitted models in <a href="https://www.tidyverse.org/blog/2023/04/censored-0-2-0/#introducing-eval_time" target="_blank" rel="noopener">parsnip and censored</a> and have added it now for workflows.</li> <li>You also need to specify it for the tuning functions <code>tune_*()</code> from tune and finetune as they will predict survival probabilities as part of the tuning process.</li> <li>Lastly, the <code>eval_time</code> argument now shows up when working with tuning/resampling results such as in <code>show_best()</code> or <code>autoplot()</code>. 
Those changes span the packages generating and working with resampling results: tune, finetune, and workflowsets.</li> </ul> <p>As we said, plenty of changes under the hood but you shouldn&rsquo;t need to notice them. Everything else should work &ldquo;as usual,&rdquo; allowing the same ease and flexibility in combining tidymodels functionality for censored regression as for classification and regression.</p> <h2 id="the-pieces-come-together-a-case-study">The pieces come together: A case study <a href="#the-pieces-come-together-a-case-study"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>To see it all in action, check out the case study <a href="https://www.tidymodels.org/learn/statistics/survival-case-study/" target="_blank" rel="noopener">How long until building complaints are dispositioned?</a> on the tidymodels website!</p> <p>The city of New York publishes data on complaints received by the Department of Buildings that include how long it takes for a complaint to be dealt with (&ldquo;dispositioned&rdquo;) as well as several characteristics of the complaint. The case study covers a full analysis. We start with splitting the data into test and training sets, explore different preprocessing strategies and model types via tuning, and predict with a final model. 
It should give you a good first impression of how to use tidymodels for predictive survival analysis.</p> <p>We hope you&rsquo;ll find this new capability of tidymodels useful!</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Many thanks to the people who contributed to our packages since their last release:</p> <p><strong>parsnip:</strong> <a href="https://github.com/AlbanOtt2" target="_blank" rel="noopener">@AlbanOtt2</a>, <a href="https://github.com/birbritto" target="_blank" rel="noopener">@birbritto</a>, <a href="https://github.com/christophscheuch" target="_blank" rel="noopener">@christophscheuch</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/Freestyleyang" target="_blank" rel="noopener">@Freestyleyang</a>, <a href="https://github.com/gmcmacran" target="_blank" rel="noopener">@gmcmacran</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jmunyoon" target="_blank" rel="noopener">@jmunyoon</a>, <a href="https://github.com/joscani" target="_blank" rel="noopener">@joscani</a>, <a href="https://github.com/jxu" target="_blank" rel="noopener">@jxu</a>, <a href="https://github.com/marcelglueck" target="_blank" rel="noopener">@marcelglueck</a>, <a href="https://github.com/mattheaphy" target="_blank" rel="noopener">@mattheaphy</a>, <a href="https://github.com/mesdi" target="_blank" rel="noopener">@mesdi</a>, <a href="https://github.com/millermc38" target="_blank" 
rel="noopener">@millermc38</a>, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, <a href="https://github.com/pgg1309" target="_blank" rel="noopener">@pgg1309</a>, <a href="https://github.com/rdavis120" target="_blank" rel="noopener">@rdavis120</a>, <a href="https://github.com/seb-mueller" target="_blank" rel="noopener">@seb-mueller</a>, <a href="https://github.com/SHo-JANG" target="_blank" rel="noopener">@SHo-JANG</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/vidarsumo" target="_blank" rel="noopener">@vidarsumo</a>, and <a href="https://github.com/wzbillings" target="_blank" rel="noopener">@wzbillings</a>.</p> <p><strong>censored:</strong> <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">@bcjaeger</a>, <a href="https://github.com/brunocarlin" target="_blank" rel="noopener">@brunocarlin</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/noahtsao" target="_blank" rel="noopener">@noahtsao</a>, and <a href="https://github.com/tripartio" target="_blank" rel="noopener">@tripartio</a>.</p> <p><strong>yardstick:</strong> <a href="https://github.com/aecoleman" target="_blank" rel="noopener">@aecoleman</a>, <a href="https://github.com/asb2111" target="_blank" rel="noopener">@asb2111</a>, <a href="https://github.com/atsyplenkov" target="_blank" rel="noopener">@atsyplenkov</a>, <a href="https://github.com/bgreenwell" target="_blank" rel="noopener">@bgreenwell</a>, <a href="https://github.com/Dpananos" target="_blank" rel="noopener">@Dpananos</a>, <a href="https://github.com/EduMinsky" target="_blank" rel="noopener">@EduMinsky</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" 
rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/heidekrueger" target="_blank" rel="noopener">@heidekrueger</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/iacrowe" target="_blank" rel="noopener">@iacrowe</a>, <a href="https://github.com/jarbet" target="_blank" rel="noopener">@jarbet</a>, <a href="https://github.com/jxu" target="_blank" rel="noopener">@jxu</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/maxwell-geospatial" target="_blank" rel="noopener">@maxwell-geospatial</a>, <a href="https://github.com/moloscripts" target="_blank" rel="noopener">@moloscripts</a>, <a href="https://github.com/rdavis120" target="_blank" rel="noopener">@rdavis120</a>, <a href="https://github.com/ruddnr" target="_blank" rel="noopener">@ruddnr</a>, <a href="https://github.com/SimonCoulombe" target="_blank" rel="noopener">@SimonCoulombe</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/tbrittoborges" target="_blank" rel="noopener">@tbrittoborges</a>, <a href="https://github.com/tonyelhabr" target="_blank" rel="noopener">@tonyelhabr</a>, <a href="https://github.com/tripartio" target="_blank" rel="noopener">@tripartio</a>, <a href="https://github.com/TSI-PTG" target="_blank" rel="noopener">@TSI-PTG</a>, <a href="https://github.com/vnijs" target="_blank" rel="noopener">@vnijs</a>, <a href="https://github.com/wbuchanan" target="_blank" rel="noopener">@wbuchanan</a>, and <a href="https://github.com/zkrog" target="_blank" rel="noopener">@zkrog</a>.</p> <p><strong>workflows:</strong> <a href="https://github.com/Milardkh" target="_blank" rel="noopener">@Milardkh</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> <p><strong>tune:</strong> 
<a href="https://github.com/AlbertoImg" target="_blank" rel="noopener">@AlbertoImg</a>, <a href="https://github.com/dramanica" target="_blank" rel="noopener">@dramanica</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/epiheather" target="_blank" rel="noopener">@epiheather</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, <a href="https://github.com/jxu" target="_blank" rel="noopener">@jxu</a>, <a href="https://github.com/kbodwin" target="_blank" rel="noopener">@kbodwin</a>, <a href="https://github.com/kenraywilliams" target="_blank" rel="noopener">@kenraywilliams</a>, <a href="https://github.com/KJT-Habitat" target="_blank" rel="noopener">@KJT-Habitat</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/marcozanotti" target="_blank" rel="noopener">@marcozanotti</a>, <a href="https://github.com/MasterLuke84" target="_blank" rel="noopener">@MasterLuke84</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/PathosEthosLogos" target="_blank" rel="noopener">@PathosEthosLogos</a>, <a href="https://github.com/Peter4801" target="_blank" rel="noopener">@Peter4801</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/walkerjameschris" target="_blank" rel="noopener">@walkerjameschris</a>.</p> <p><strong>finetune:</strong> <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a 
href="https://github.com/jdberson" target="_blank" rel="noopener">@jdberson</a>, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, <a href="https://github.com/mfansler" target="_blank" rel="noopener">@mfansler</a>, <a href="https://github.com/ruddnr" target="_blank" rel="noopener">@ruddnr</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> <p><strong>workflowsets:</strong> <a href="https://github.com/dchiu911" target="_blank" rel="noopener">@dchiu911</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jkylearmstrong" target="_blank" rel="noopener">@jkylearmstrong</a>, <a href="https://github.com/PathosEthosLogos" target="_blank" rel="noopener">@PathosEthosLogos</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</p> webR 0.3.1 https://www.tidyverse.org/blog/2024/04/webr-0-3-1/ Tue, 02 Apr 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/04/webr-0-3-1/ 
<!-- Initialise webR in the page --> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/codemirror.min.css"> <style> .CodeMirror pre { background-color: unset !important; } .btn-webr { background-color: #EEEEEE; border-bottom-left-radius: 0; border-bottom-right-radius: 0; } </style> <script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/codemirror.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/mode/r/r.js"></script> <script type="module"> import { WebR } from 'https://webr.r-wasm.org/v0.4.2/webr.mjs'; globalThis.webR = new WebR(); await globalThis.webR.init(); await webR.FS.mkdir('/persist'); await webR.FS.mount('IDBFS', {}, '/persist'); await webR.FS.syncfs(true); await webR.evalRVoid("webr::shim_install()"); await webR.evalRVoid("webr::global_prompt_install()", { withHandlers: false }); globalThis.webRCodeShelter = await new globalThis.webR.Shelter(); document.querySelectorAll(".btn-webr").forEach((btn) => { btn.innerText = 'Run code'; btn.disabled = false; }); </script> <!-- Add webr engine for knit --> <div class="highlight"> </div> <!-- Custom styles for output --> <div class="highlight"> <style type="text/css"> .output > pre, .output code { background-color: #ffffff !important; margin-top: -17px; border-top-left-radius: 0px; border-top-right-radius: 0px; } .error > pre, .error code { background-color: #fcebeb !important; color: #410E0E !important; } </style> </div> <p>We&rsquo;re delighted to announce the release of <a href="https://docs.r-wasm.org/webr/latest/" target="_blank" rel="noopener">webR</a> 0.3.1. This release brings bug fixes, infrastructure upgrades, and exciting improvements to webR&rsquo;s API for creating R objects and evaluating R code from JavaScript. 
These new features make integrating webR with existing JavaScript frameworks such as <a href="https://observablehq.com" target="_blank" rel="noopener">Observable</a> a breeze.</p> <p>You can install the latest release from <a href="https://www.npmjs.com/package/webr" target="_blank" rel="noopener">npm</a> with the command:</p> <pre><code>npm i webr@0.3.1 </code></pre> <p>or if you&rsquo;re using JavaScript modules to import webR directly from CDN:</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="kr">import</span> <span class="p">{</span> <span class="nx">WebR</span> <span class="p">}</span> <span class="nx">from</span> <span class="s1">&#39;https://webr.r-wasm.org/v0.3.1/webr.mjs&#39;</span><span class="p">;</span> </code></pre></div><p>A summary of changes is described below, with the full <a href="https://github.com/r-wasm/webr/releases" target="_blank" rel="noopener">release notes</a> on GitHub.</p> <h2 id="evaluating-r-code-from-javascript">Evaluating R code from JavaScript <a href="#evaluating-r-code-from-javascript"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The underlying interpreter powering webR is built from the same source code as R itself, with patches applied so that it can run in the WebAssembly environment. With this release, we have rebased our patches on the latest stable version of R<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. 
By keeping our source in sync, improvements and bug fixes made by the R Core Team also benefit any project making use of webR.</p> <p>WebR&rsquo;s core functionality is to evaluate R code from a JavaScript environment. As such, it is imperative that this works well, even with large and complex scripts. The <a href="https://webr.r-wasm.org/v0.3.1/" target="_blank" rel="noopener">webR app</a> has been updated to better handle large R scripts, and scripts longer than 4096 characters should no longer cause strange issues in the R console.</p> <h3 id="loading-webassembly-packages">Loading WebAssembly packages <a href="#loading-webassembly-packages"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The package management functions provided by webR have been expanded and improved. We set up webR with shims (interceptors) for <a href="https://rdrr.io/r/utils/install.packages.html" target="_blank" rel="noopener"><code>install.packages()</code></a>, <a href="https://rdrr.io/r/base/library.html" target="_blank" rel="noopener"><code>library()</code></a>, and <a href="https://rdrr.io/r/base/library.html" target="_blank" rel="noopener"><code>require()</code></a> so that installing or loading R packages automatically downloads WebAssembly binaries from the <a href="https://repo.r-wasm.org" target="_blank" rel="noopener">webR package repository</a>. 
Also, it is no longer required to run the <a href="https://rdrr.io/r/base/library.html" target="_blank" rel="noopener"><code>library()</code></a> command a second time to subsequently load the package.</p> <p>In this interactive example, webR is configured to automatically install WebAssembly packages. Click &ldquo;Run code&rdquo; to download the packages listed in the R script.</p> <div class="highlight"> <button class="btn btn-default btn-webr" disabled type="button" id="webr-run-button-1">Loading webR...</button> <div id="webr-editor-1"></div> <div id="webr-code-output-1"><pre style="visibility: hidden"></pre></div> <script type="module"> const runButton = document.getElementById('webr-run-button-1'); const outputDiv = document.getElementById('webr-code-output-1'); const editorDiv = document.getElementById('webr-editor-1'); const editor = CodeMirror((elt) => { elt.style.border = '1px solid #eee'; elt.style.height = 'auto'; editorDiv.append(elt); },{ value: `# Explicitly install wasm packages\ninstall.packages("cli")\n\n# Automatically install wasm packages\nlibrary(vctrs)\nrequire(jsonlite)\n\n# Confirm the packages installed successfully\nrownames(installed.packages())`, lineNumbers: true, mode: 'r', theme: 'light default', viewportMargin: Infinity, }); runButton.onclick = async () => { runButton.disabled = true; let canvas = undefined; await webR.init(); await webR.evalRVoid('webr::canvas(width=504, height=311.472)'); await webR.FS.syncfs(false); const result = await webRCodeShelter.captureR(editor.getValue(), { withAutoprint: true, captureStreams: true, captureConditions: false, captureGraphics: false, env: {}, }); try { await webR.evalRVoid("dev.off()"); const out = result.output.filter( evt => evt.type == 'stdout' || evt.type == 'stderr' ).map((evt) => evt.data).join('\n'); outputDiv.innerHTML = ''; const pre = document.createElement("pre"); if (/\S/.test(out)) { const code = document.createElement("code"); code.innerText = out; pre.appendChild(code); } 
else { pre.style.visibility = 'hidden'; } outputDiv.appendChild(pre); const msgs = await webR.flush(); msgs.forEach(msg => { if (msg.type === 'canvas'){ if (msg.data.event === 'canvasImage') { canvas.getContext('2d').drawImage(msg.data.image, 0, 0); } else if (msg.data.event === 'canvasNewPage') { canvas = document.createElement('canvas'); canvas.setAttribute('width', 2 * 504); canvas.setAttribute('height', 2 * 311.472); canvas.style.width="700px"; canvas.style.display="block"; canvas.style.margin="auto"; const p = document.createElement("p"); p.appendChild(canvas); outputDiv.appendChild(p); } } }); } finally { webRCodeShelter.purge(); runButton.disabled = false; } } </script> </div> <p>See the <a href="https://docs.r-wasm.org/webr/latest/packages.html" target="_blank" rel="noopener">documentation</a> for more details on how to control this behaviour in your own webR-powered applications, including optionally showing an interactive download menu to the user.</p> <h3 id="error-handling-and-reporting">Error handling and reporting <a href="#error-handling-and-reporting"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Improvements have been made to how webR raises R conditions as JavaScript exceptions. 
Exceptions now include the offending source R call in the error message text, better matching what is shown in a traditional R console.</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="s2">&#34;sin(&#39;abc&#39;)&#34;</span><span class="p">);</span> </code></pre></div><div class="output error"> <pre><code>Uncaught Error: Error in sin(&quot;abc&quot;): non-numeric argument to mathematical function </code></pre> </div> <p>Conditions raised when invoking function objects are now also re-thrown as JavaScript exceptions, rather than a generic <code>UnwindProtectException</code> error. Compare the error messages shown below from the previous and latest versions of webR to see the useful context added by this change.</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="c1">// webR 0.2.2 </span><span class="c1"></span><span class="kr">const</span> <span class="nx">do_calc</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="sb">`function (n) { rnorm(n) }`</span><span class="p">)</span> <span class="nx">do_calc</span><span class="p">(</span><span class="o">-</span><span class="mi">10</span><span class="p">)</span> </code></pre></div><div class="output error"> <pre><code>Uncaught (in promise) UnwindProtectException: A non-local transfer of control occured during evaluation </code></pre> </div> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="c1">// webR 0.3.1 </span><span class="c1"></span><span class="kr">const</span> <span class="nx">do_calc</span> <span class="o">=</span> <span class="nx">await</span> <span 
class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="sb">`function (n) { rnorm(n) }`</span><span class="p">)</span> <span class="nx">do_calc</span><span class="p">(</span><span class="o">-</span><span class="mi">10</span><span class="p">)</span> </code></pre></div><div class="output error"> <pre><code>Uncaught (in promise) Error: Error in rnorm(n): invalid arguments </code></pre> </div> <p>Some base R features can be problematic when running R under WebAssembly. For example, in the constrained WebAssembly sandbox the base R function <a href="https://rdrr.io/r/base/system.html" target="_blank" rel="noopener"><code>system()</code></a> does not work. The latest release of webR now handles these cases more consistently, raising R <a href="https://rdrr.io/r/base/stop.html" target="_blank" rel="noopener"><code>stop()</code></a> conditions rather than incorrectly returning an empty result.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># webR 0.3.1</span> <span class="nf">system</span><span class="p">()</span> </code></pre></div><div class="output error"> <pre><code>Error in webr_hook_system(command) : The &quot;system()&quot; function is unsupported under Emscripten. 
</code></pre> </div> <h2 id="capturing-html-canvas-graphics-output">Capturing HTML canvas graphics output <a href="#capturing-html-canvas-graphics-output"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The <a href="https://docs.r-wasm.org/webr/latest/evaluating.html#evaluating-r-code-and-capturing-output-with-capturer" target="_blank" rel="noopener"><code>captureR()</code></a> function is designed to capture output generated when evaluating R code. In addition to capturing standard text output, details about errors and other R conditions are also captured. With this release, plots drawn using webR&rsquo;s HTML canvas graphics device, <a href="https://rdrr.io/pkg/webr/man/canvas.html" target="_blank" rel="noopener"><code>webr::canvas()</code></a>, are also captured and returned by default.</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="c1">// Evaluate R code, capturing all output </span><span class="c1"></span><span class="kr">const</span> <span class="nx">capture</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">globalShelter</span><span class="p">.</span><span class="nx">captureR</span><span class="p">(</span><span class="sb">` </span><span class="sb"> x &lt;- rnorm(10000) </span><span class="sb"> print(x[1]) </span><span class="sb"> hist(x) </span><span class="sb">`</span><span class="p">);</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">capture</span><span 
class="p">);</span> </code></pre></div><div class="output"> <pre><code>{ result: Proxy(Object), output: [ { type: 'stdout', data: '[1] 0.7612882' }, ], images: [ ImageBitmap ], } </code></pre> </div> <p>Captured plots are returned as an array of <a href="https://developer.mozilla.org/en-US/docs/Web/API/ImageBitmap" target="_blank" rel="noopener"><code>ImageBitmap</code></a> JavaScript objects in the <code>images</code> property. This interface represents a bitmap image in a way that can be efficiently drawn to a HTML <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/canvas" target="_blank" rel="noopener"><code>&lt;canvas&gt;</code></a> element.</p> <p>This change makes plotting consistent with other forms of R output and simplifies the process when working with multiple independent R code blocks and output images. See the webR documentation on <a href="https://docs.r-wasm.org/webr/latest/evaluating.html#evaluating-r-code-and-capturing-output-with-capturer" target="_blank" rel="noopener">evaluating R code</a> for further details, and this <a href="https://observablehq.com/d/ec99bb89a4c646ab" target="_blank" rel="noopener">Observable notebook</a> for an example of capturing R plots from JavaScript.</p> <h3 id="graphics-device-bug-fixes">Graphics device bug fixes <a href="#graphics-device-bug-fixes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>In addition to adding the ability to capture graphics output, the <a href="https://rdrr.io/pkg/webr/man/canvas.html" target="_blank" rel="noopener"><code>webr::canvas()</code></a> graphics device has also had various bug fixes made to 
better implement R base graphics. The easiest way to demonstrate is probably by example:</p> <div class="highlight"> <button class="btn btn-default btn-webr" disabled type="button" id="webr-run-button-2">Loading webR...</button> <div id="webr-editor-2"></div> <div id="webr-code-output-2"><pre style="visibility: hidden"></pre></div> <script type="module"> const runButton = document.getElementById('webr-run-button-2'); const outputDiv = document.getElementById('webr-code-output-2'); const editorDiv = document.getElementById('webr-editor-2'); const editor = CodeMirror((elt) => { elt.style.border = '1px solid #eee'; elt.style.height = 'auto'; editorDiv.append(elt); },{ value: `# The lty and lwd graphical properties now work correctly\nplot(1:10, type = "l", lty = 2, lwd = 3)\npoints(1:10, cex = 3, lwd = 2)`, lineNumbers: true, mode: 'r', theme: 'light default', viewportMargin: Infinity, }); runButton.onclick = async () => { runButton.disabled = true; let canvas = undefined; await webR.init(); await webR.evalRVoid('webr::canvas(width=504, height=311.472)'); await webR.FS.syncfs(false); const result = await webRCodeShelter.captureR(editor.getValue(), { withAutoprint: true, captureStreams: true, captureConditions: false, captureGraphics: false, env: {}, }); try { await webR.evalRVoid("dev.off()"); const out = result.output.filter( evt => evt.type == 'stdout' || evt.type == 'stderr' ).map((evt) => evt.data).join('\n'); outputDiv.innerHTML = ''; const pre = document.createElement("pre"); if (/\S/.test(out)) { const code = document.createElement("code"); code.innerText = out; pre.appendChild(code); } else { pre.style.visibility = 'hidden'; } outputDiv.appendChild(pre); const msgs = await webR.flush(); msgs.forEach(msg => { if (msg.type === 'canvas'){ if (msg.data.event === 'canvasImage') { canvas.getContext('2d').drawImage(msg.data.image, 0, 0); } else if (msg.data.event === 'canvasNewPage') { canvas = document.createElement('canvas'); canvas.setAttribute('width', 2 * 504); 
canvas.setAttribute('height', 2 * 311.472); canvas.style.width="700px"; canvas.style.display="block"; canvas.style.margin="auto"; const p = document.createElement("p"); p.appendChild(canvas); outputDiv.appendChild(p); } } }); } finally { webRCodeShelter.purge(); runButton.disabled = false; } } </script> </div> <div class="highlight"> <button class="btn btn-default btn-webr" disabled type="button" id="webr-run-button-3">Loading webR...</button> <div id="webr-editor-3"></div> <div id="webr-code-output-3"><pre style="visibility: hidden"></pre></div> <script type="module"> const runButton = document.getElementById('webr-run-button-3'); const outputDiv = document.getElementById('webr-code-output-3'); const editorDiv = document.getElementById('webr-editor-3'); const editor = CodeMirror((elt) => { elt.style.border = '1px solid #eee'; elt.style.height = 'auto'; editorDiv.append(elt); },{ value: `# The cex graphical property is now taken into account\n# when calculating font sizes\nplot(1, main = "This is a large title", cex.main = 3)`, lineNumbers: true, mode: 'r', theme: 'light default', viewportMargin: Infinity, }); runButton.onclick = async () => { runButton.disabled = true; let canvas = undefined; await webR.init(); await webR.evalRVoid('webr::canvas(width=504, height=311.472)'); await webR.FS.syncfs(false); const result = await webRCodeShelter.captureR(editor.getValue(), { withAutoprint: true, captureStreams: true, captureConditions: false, captureGraphics: false, env: {}, }); try { await webR.evalRVoid("dev.off()"); const out = result.output.filter( evt => evt.type == 'stdout' || evt.type == 'stderr' ).map((evt) => evt.data).join('\n'); outputDiv.innerHTML = ''; const pre = document.createElement("pre"); if (/\S/.test(out)) { const code = document.createElement("code"); code.innerText = out; pre.appendChild(code); } else { pre.style.visibility = 'hidden'; } outputDiv.appendChild(pre); const msgs = await webR.flush(); msgs.forEach(msg => { if (msg.type === 'canvas'){ 
if (msg.data.event === 'canvasImage') { canvas.getContext('2d').drawImage(msg.data.image, 0, 0); } else if (msg.data.event === 'canvasNewPage') { canvas = document.createElement('canvas'); canvas.setAttribute('width', 2 * 504); canvas.setAttribute('height', 2 * 311.472); canvas.style.width="700px"; canvas.style.display="block"; canvas.style.margin="auto"; const p = document.createElement("p"); p.appendChild(canvas); outputDiv.appendChild(p); } } }); } finally { webRCodeShelter.purge(); runButton.disabled = false; } } </script> </div> <div class="highlight"> <button class="btn btn-default btn-webr" disabled type="button" id="webr-run-button-4">Loading webR...</button> <div id="webr-editor-4"></div> <div id="webr-code-output-4"><pre style="visibility: hidden"></pre></div> <script type="module"> const runButton = document.getElementById('webr-run-button-4'); const outputDiv = document.getElementById('webr-code-output-4'); const editorDiv = document.getElementById('webr-editor-4'); const editor = CodeMirror((elt) => { elt.style.border = '1px solid #eee'; elt.style.height = 'auto'; editorDiv.append(elt); },{ value: `# Rasters with negative width or height are now correctly\n# drawn mirrored and flipped.\ninstall.packages("jpeg")\nlogo = jpeg::readJPEG(system.file(package = "jpeg", "img", "Rlogo.jpg"))\nplot(NULL, xlab = "", ylab = "", xlim = c(0, 1), ylim = c(0, 1))\n\nrasterImage(logo, xleft = 0.2, xright = 0.5, ybottom = 0.5, ytop = 1)\nrasterImage(logo, xleft = 0.8, xright = 0.5, ybottom = 0.5, ytop = 1)\nrasterImage(logo, xleft = 0.2, xright = 0.5, ybottom = 0.5, ytop = 0)\nrasterImage(logo, xleft = 0.8, xright = 0.5, ybottom = 0.5, ytop = 0)`, lineNumbers: true, mode: 'r', theme: 'light default', viewportMargin: Infinity, }); runButton.onclick = async () => { runButton.disabled = true; let canvas = undefined; await webR.init(); await webR.evalRVoid('webr::canvas(width=504, height=311.472)'); await webR.FS.syncfs(false); const result = await 
webRCodeShelter.captureR(editor.getValue(), { withAutoprint: true, captureStreams: true, captureConditions: false, captureGraphics: false, env: {}, }); try { await webR.evalRVoid("dev.off()"); const out = result.output.filter( evt => evt.type == 'stdout' || evt.type == 'stderr' ).map((evt) => evt.data).join('\n'); outputDiv.innerHTML = ''; const pre = document.createElement("pre"); if (/\S/.test(out)) { const code = document.createElement("code"); code.innerText = out; pre.appendChild(code); } else { pre.style.visibility = 'hidden'; } outputDiv.appendChild(pre); const msgs = await webR.flush(); msgs.forEach(msg => { if (msg.type === 'canvas'){ if (msg.data.event === 'canvasImage') { canvas.getContext('2d').drawImage(msg.data.image, 0, 0); } else if (msg.data.event === 'canvasNewPage') { canvas = document.createElement('canvas'); canvas.setAttribute('width', 2 * 504); canvas.setAttribute('height', 2 * 311.472); canvas.style.width="700px"; canvas.style.display="block"; canvas.style.margin="auto"; const p = document.createElement("p"); p.appendChild(canvas); outputDiv.appendChild(p); } } }); } finally { webRCodeShelter.purge(); runButton.disabled = false; } } </script> </div> <h2 id="the-r-object-interface">The R object interface <a href="#the-r-object-interface"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The R object interface provided by webR has been expanded to support the conversion of more types of JavaScript objects into R objects. 
Such conversions are automatically applied when interacting with the R environment from JavaScript.</p> <h3 id="raw-vectors">Raw vectors <a href="#raw-vectors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>JavaScript objects of type <code>TypedArray</code>, <code>ArrayBuffer</code>, and <code>ArrayBufferView</code> (e.g.  <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array" target="_blank" rel="noopener"><code>Uint8Array</code></a>) may now be used to construct R objects. By default, objects of this type are converted to R raw atomic vectors. This simplifies the transfer of binary data to R.</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="kr">const</span> <span class="nx">data</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">Uint8Array</span><span class="p">([</span><span class="mi">4</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">24</span><span class="p">,</span> <span class="mi">15</span><span class="p">,</span> <span class="mi">12</span><span class="p">]);</span> <span class="c1">// Print data&#39;s R object class and an example byte </span><span class="c1"></span><span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="sb">` </span><span class="sb"> class(data) </span><span class="sb"> data[2] </span><span class="sb">`</span><span class="p">,</span> <span 
class="p">{</span> <span class="nx">withAutoprint</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span> <span class="nx">env</span><span class="o">:</span> <span class="p">{</span> <span class="nx">data</span> <span class="p">}</span> <span class="p">});</span> </code></pre></div><div class="output"> <pre><code>[1] &quot;raw&quot; [1] 0c </code></pre> </div> <h3 id="r-dataframe">R <code>data.frame</code> <a href="#r-dataframe"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>JavaScript objects of shape <code>{ x: [...], y: [...] }</code>, with data in a &ldquo;long&rdquo; column-based form, can now be used to construct R objects. In previous versions of webR, this object shape was reserved for future use. However, with this release webR now constructs an R <code>data.frame</code> by taking the source object&rsquo;s properties as column vectors. 
The resulting <code>data.frame</code> can then be manipulated from R in the usual way:</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="kr">const</span> <span class="nx">data</span> <span class="o">=</span> <span class="p">{</span> <span class="nx">column_x</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;foo&#34;</span><span class="p">,</span> <span class="s2">&#34;bar&#34;</span><span class="p">,</span> <span class="s2">&#34;baz&#34;</span><span class="p">],</span> <span class="nx">column_y</span><span class="o">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">7</span><span class="p">]</span> <span class="p">}</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="sb">` </span><span class="sb"> class(data) </span><span class="sb"> colnames(data) </span><span class="sb"> data[2:3,] </span><span class="sb">`</span><span class="p">,</span> <span class="p">{</span> <span class="nx">withAutoprint</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span> <span class="nx">env</span><span class="o">:</span> <span class="p">{</span> <span class="nx">data</span> <span class="p">}</span> <span class="p">});</span> </code></pre></div><div class="output"> <pre><code>[1] &quot;data.frame&quot; [1] &quot;column_x&quot; &quot;column_y&quot; column_x column_y 2 bar 3 3 baz 7 </code></pre> </div> <p>Similarly, an R <code>data.frame</code> can be converted back into a JavaScript object of this form:</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="kr">const</span> <span class="nx">cars</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span 
class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="sb">`mtcars`</span><span class="p">);</span> <span class="nx">await</span> <span class="nx">cars</span><span class="p">.</span><span class="nx">toObject</span><span class="p">();</span> </code></pre></div><div class="output"> <pre><code>{ am: [1, 1, 1, ..., 1], carb: [4, 4, 1, ..., 2], cyl: [6, 6, 4, ..., 4] ..., wt: [2.62, 2.875, 2.32, ..., 2.78], } </code></pre> </div> <h3 id="d3-wide-format">D3 &ldquo;wide&rdquo; format <a href="#d3-wide-format"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>In JavaScript, particularly when using frameworks built upon <a href="https://d3js.org" target="_blank" rel="noopener">D3</a>, it is typical to work with data in a &ldquo;wide&rdquo; form: an array of objects per row, each including all the column names and values. 
With this release, webR can also convert JavaScript objects in this form into an R <code>data.frame</code>.</p> <p>The following example loads the same data as shown in the previous example but expressed in the &ldquo;wide&rdquo; form.</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="kr">const</span> <span class="nx">data</span> <span class="o">=</span> <span class="p">[</span> <span class="p">{</span> <span class="nx">column_x</span><span class="o">:</span> <span class="s2">&#34;foo&#34;</span><span class="p">,</span> <span class="nx">column_y</span><span class="o">:</span> <span class="mi">1</span> <span class="p">},</span> <span class="p">{</span> <span class="nx">column_x</span><span class="o">:</span> <span class="s2">&#34;bar&#34;</span><span class="p">,</span> <span class="nx">column_y</span><span class="o">:</span> <span class="mi">3</span> <span class="p">},</span> <span class="p">{</span> <span class="nx">column_x</span><span class="o">:</span> <span class="s2">&#34;baz&#34;</span><span class="p">,</span> <span class="nx">column_y</span><span class="o">:</span> <span class="mi">7</span> <span class="p">},</span> <span class="p">];</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="sb">` </span><span class="sb"> class(data) </span><span class="sb"> colnames(data) </span><span class="sb"> data[2:3,] </span><span class="sb">`</span><span class="p">,</span> <span class="p">{</span> <span class="nx">withAutoprint</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span> <span class="nx">env</span><span class="o">:</span> <span class="p">{</span> <span class="nx">data</span> <span class="p">}</span> <span class="p">});</span> </code></pre></div><div class="output"> <pre><code>[1] &quot;data.frame&quot; [1] &quot;column_x&quot; &quot;column_y&quot; column_x column_y 2 
bar 3 3 baz 7 </code></pre> </div> <p>An R <code>data.frame</code> can also be converted into a D3 compatible JavaScript object:</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="kr">const</span> <span class="nx">cars</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="sb">`mtcars`</span><span class="p">);</span> <span class="nx">await</span> <span class="nx">cars</span><span class="p">.</span><span class="nx">toD3</span><span class="p">();</span> </code></pre></div><div class="output"> <pre><code>[ { mpg: 21, cyl: 6, disp: 160, ... }, { mpg: 21, cyl: 6, disp: 160, ... }, { mpg: 22.8, cyl: 4, disp: 108, ...}, ... { mpg: 21.4, cyl: 4, disp: 121, ...}, ] </code></pre> </div> <h2 id="webassembly-toolchain-upgrades">WebAssembly toolchain upgrades <a href="#webassembly-toolchain-upgrades"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We have updated our WebAssembly build system, upgrading the <a href="https://emscripten.org" target="_blank" rel="noopener">Emscripten</a> C/C++ compiler to version 3.1.47 and the <a href="https://flang.llvm.org/docs/" target="_blank" rel="noopener">LLVM Flang</a> Fortran compiler to be based on LLVM 18.1.1. 
As part of the work, webR now supports building under Nix using <a href="https://nixos.wiki/wiki/Flakes" target="_blank" rel="noopener">flakes</a>, suggested and largely implemented by <a href="https://github.com/wch" target="_blank" rel="noopener">@wch</a>.</p> <p>With this, source-code level reproducible builds of the webR WebAssembly binaries can be made, strengthening the argument for webR as a potential future platform for reproducible data science.</p> <h3 id="llvm-flang">LLVM Flang <a href="#llvm-flang"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>To compile Fortran sources in the R source code<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> for webR, we require a Fortran compiler that supports outputting WebAssembly objects. This is a surprisingly tricky business, and our current solution is to maintain a patched version of LLVM&rsquo;s <code>flang-new</code> compiler frontend.</p> <p>In recent months, the patches we must make to LLVM Flang have become smaller and easier to manage as the LLVM team continues to improve the Flang frontend. 
While too long for this post, for those interested in exactly what changes we make to enable WebAssembly output, I have written a deep-dive blog post, <a href="https://gws.phd/posts/fortran_wasm/" target="_blank" rel="noopener">Fortran on WebAssembly</a>.</p> <h2 id="additional-system-libraries-and-rust-support">Additional system libraries and Rust support <a href="#additional-system-libraries-and-rust-support"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thanks to some great work by <a href="https://github.com/jeroen" target="_blank" rel="noopener">@jeroen</a> and <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>, this release of webR includes some additional WebAssembly system libraries and software in the webR Docker container. 
This includes numerical libraries such as <a href="https://www.gnu.org/software/gsl/" target="_blank" rel="noopener">GSL</a> and <a href="https://gmplib.org" target="_blank" rel="noopener">GMP</a>, image manipulation tools such as <a href="https://imagemagick.org/" target="_blank" rel="noopener">ImageMagick</a>, and a Rust compiler configured to build WebAssembly R packages containing Rust source code.</p> <p>A demonstration R package containing Rust code, compatible with webR, can be found at <a href="https://github.com/yutannihilation/savvy-webr-test/">https://github.com/yutannihilation/savvy-webr-test/</a>.</p> <p>An example Shiny app making use of the WebAssembly-compiled ImageMagick library is shown below, with the source code at <a href="https://github.com/jeroen/shinymagick">https://github.com/jeroen/shinymagick</a>.</p> <iframe style="border: 1px solid black;" width="100%" height="550px" src="https://georgestagg.github.io/shinymagick/"> </iframe> <h2 id="webassembly-r-package-binaries">WebAssembly R package binaries <a href="#webassembly-r-package-binaries"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>With the introduction of additional system libraries and changes to the WebAssembly toolchain, the default webR package repository has also been refreshed. The repository tends to follow CRAN package releases, though it is updated less frequently. 
<strong>19452</strong> WebAssembly R packages have been recompiled from source for this release, with <strong>12969</strong> packages, about 63% of CRAN, fully available<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup> for use in webR.</p> <p>As my usual caveat goes, we have not been able to test all the available packages. Feel free to try your favourite package in the <a href="https://webr.r-wasm.org/v0.3.1/" target="_blank" rel="noopener">webR app</a> and let us know in a <a href="https://github.com/r-wasm/webr/issues" target="_blank" rel="noopener">GitHub issue</a> if there is a problem.</p> <p>The <a href="https://repo.r-wasm.org" target="_blank" rel="noopener">package repository index</a> contains further information and a searchable list of WebAssembly R packages. In addition, <a href="https://r-universe.dev" target="_blank" rel="noopener">R-Universe</a> also builds webR-compatible binaries and so can be used as an alternative repository for access to even more R packages.</p> <h3 id="building-custom-webassembly-r-packages">Building custom WebAssembly R packages <a href="#building-custom-webassembly-r-packages"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>If you&rsquo;d like to build your own R packages for webR, the <a href="https://r-wasm.github.io/rwasm/" target="_blank" rel="noopener">rwasm</a> package provides functions to help compile R packages for WebAssembly, manage repositories, and prepare webR-compatible filesystem images.</p> <p>We&rsquo;ve also started building <a href="https://github.com/r-wasm/actions/" target="_blank" 
rel="noopener">reusable workflows for GitHub Actions</a>. If you have an R package with source code hosted on GitHub, an action can be added to your repository such that a WebAssembly version of your package will be built automatically by a GitHub runner on package release.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thank you, as always, to the users and developers contributing to webR in the form of discussion in issues, bug reports, and pull requests.</p> <p> <a href="https://github.com/adrianolszewski" target="_blank" rel="noopener">@adrianolszewski</a>, <a href="https://github.com/christianp" target="_blank" rel="noopener">@christianp</a>, <a href="https://github.com/coatless" target="_blank" rel="noopener">@coatless</a>, <a href="https://github.com/ColinFay" target="_blank" rel="noopener">@ColinFay</a>, <a href="https://github.com/drgomulka" target="_blank" rel="noopener">@drgomulka</a>, <a href="https://github.com/erex" target="_blank" rel="noopener">@erex</a>, <a href="https://github.com/gitdemont" target="_blank" rel="noopener">@gitdemont</a>, <a href="https://github.com/gorkang" target="_blank" rel="noopener">@gorkang</a>, <a href="https://github.com/isbool" target="_blank" rel="noopener">@isbool</a>, <a href="https://github.com/JeremyPasco" target="_blank" rel="noopener">@JeremyPasco</a>, <a href="https://github.com/jeroen" target="_blank" rel="noopener">@jeroen</a>, <a href="https://github.com/JosiahParry" target="_blank" rel="noopener">@JosiahParry</a>, <a href="https://github.com/Luke-Symes-Tsy" 
target="_blank" rel="noopener">@Luke-Symes-Tsy</a>, <a href="https://github.com/maek-ies" target="_blank" rel="noopener">@maek-ies</a>, <a href="https://github.com/MaybeJustJames" target="_blank" rel="noopener">@MaybeJustJames</a>, <a href="https://github.com/ravinder387" target="_blank" rel="noopener">@ravinder387</a>, <a href="https://github.com/StaffanBetner" target="_blank" rel="noopener">@StaffanBetner</a>, <a href="https://github.com/SugarRayLua" target="_blank" rel="noopener">@SugarRayLua</a>, <a href="https://github.com/takahser" target="_blank" rel="noopener">@takahser</a>, <a href="https://github.com/tim-newans" target="_blank" rel="noopener">@tim-newans</a>, <a href="https://github.com/timelyportfolio" target="_blank" rel="noopener">@timelyportfolio</a>, <a href="https://github.com/tstubbs-evolution" target="_blank" rel="noopener">@tstubbs-evolution</a>, <a href="https://github.com/yhm-amber" target="_blank" rel="noopener">@yhm-amber</a>, <a href="https://github.com/yii-iiy" target="_blank" rel="noopener">@yii-iiy</a>, <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>, and <a href="https://github.com/zhangwenda0518" target="_blank" rel="noopener">@zhangwenda0518</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>The latest stable release at the time of writing: <a href="https://cran.rstudio.com/doc/manuals/r-release/NEWS.html" target="_blank" rel="noopener">R 4.3.3 &mdash; &ldquo;Angel Food Cake&rdquo;</a> <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>There are also many R packages containing Fortran source code. 
<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>Here &ldquo;available&rdquo; means that both a binary build of an R package and all of its dependencies can be downloaded from the repository. <a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> Fair machine learning with tidymodels https://www.tidyverse.org/blog/2024/03/tidymodels-fairness/ Thu, 21 Mar 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/03/tidymodels-fairness/ <p>We&rsquo;re very, very excited to announce the introduction of tools for assessing model fairness in tidymodels. This effort involved coordination from various groups at Posit over the course of over a year and resulted in a toolkit that we believe is both principled and impactful.</p> <p>Fairness assessment features for tidymodels extend across a number of packages; to install each, use the tidymodels meta-package:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidymodels"</span><span class='o'>)</span></span></code></pre> </div> <h2 id="machine-learning-fairness">Machine learning fairness <a href="#machine-learning-fairness"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In recent years, high-profile analyses have called attention to many contexts where the use of machine learning deepened inequities in our communities. 
In late 2022, a group of Posit employees across teams, roles, and technical backgrounds formed a reading group to engage with literature on machine learning fairness, a research field that aims to define what it means for a statistical model to act unfairly and take measures to address that unfairness. We then designed new software functionality and learning resources to help data scientists measure and critique the ways in which the machine learning models they&rsquo;ve built might disparately impact people affected by that model.</p> <p>Perhaps the core question that fairness as a research field has tried to address is exactly what a machine learning model acting fairly entails. As a recent primer notes, &ldquo;[t]he rapid growth of this new field has led to wildly inconsistent motivations, terminology, and notation, presenting a serious challenge for cataloging and comparing definitions&rdquo; (Mitchell et al. 2021).</p> <p>Broadly, approaches to fairness provide tooling&mdash;whether social or algorithmic&mdash;to understand the social implications of utilizing a machine learning model. Different researchers categorize approaches to fairness differently, but work in this area can be loosely summarized as falling into one or more of the following categories: assessment, mitigation, and critique.</p> <ul> <li> <p><em>Assessment</em>: Fairness assessment tooling allows practitioners to measure the degree to which a machine learning model acts unfairly given some definition of fairness. The chosen definition of fairness greatly impacts whether a model&rsquo;s predictions are regarded as fair. While there have been many, many definitions of fairness proposed&mdash;a popular tutorial on these approaches compares 21 canonical definitions&mdash;almost all of them involve simple inequalities based on a small set of conditional probabilities (Narayanan 2018; Mitchell et al. 
2021).</p> </li> <li> <p><em>Mitigation</em>: Given a fairness assessment, mitigation approaches reduce the degree to which a machine learning model acts unfairly given some definition of fairness. Making a model more fair according to one metric may make that model less fair according to another. Approaches to mitigation are subject to impossibility theorems, which show that &ldquo;definitions are not mathematically or morally compatible in general&rdquo; (Mitchell et al. 2021). That is, there is no way to satisfy many fairness constraints at once unless we live in a world with no inequality to start with. However, more recent studies have shown that near-fairness with respect to several definitions is quite possible (Bell et al. 2023).</p> </li> <li> <p><em>Critique</em>: While approaches to assessment and mitigation seek to reduce complexity and situate notions of fairness in mathematical formalism, sociotechnical critique provides tooling to better understand how mathematical notions of fairness may fail to account for the real-world complexity of social phenomena. Work in this discipline often reveals that, in the process of measuring or addressing unfairness by some definition, methods for fairness assessment and mitigation may actually ignore, necessitate, or introduce unfairness by some other definition.</p> </li> </ul> <p>The work of scoping Posit&rsquo;s resources for fair machine learning, in large part, involved striking the right balance between tools in these categories and integrating them thoughtfully among our existing functionality. 
Rather than supporting as many fairness-oriented tools as possible, our goal is to best enable users of our tools to reason well about the fairness-relevant decisions they make throughout the modeling process.</p> <h2 id="additions-to-tidymodels">Additions to tidymodels <a href="#additions-to-tidymodels"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The most recent set of tidymodels releases includes changes that provide support for assessment and critique using the tidymodels framework.</p> <p>The most recent yardstick release introduces <a href="https://yardstick.tidymodels.org/reference/new_groupwise_metric.html" target="_blank" rel="noopener">a tool to create fairness metrics</a> with the problem context in mind, as well as <a href="https://yardstick.tidymodels.org/reference/index.html#fairness-metrics" target="_blank" rel="noopener">some outputs of that tool</a> implementing common fairness metrics. For a higher-level introduction to the concept of a groupwise metric, we&rsquo;ve also introduced a <a href="https://yardstick.tidymodels.org/articles/grouping.html" target="_blank" rel="noopener">new package vignette</a>. To see those fairness metrics in action, see <a href="https://www.tidymodels.org/learn/work/fairness-detectors/" target="_blank" rel="noopener">this new article on tidymodels.org</a>, a case study using data about GPT detectors.</p> <p>The most recent tune release integrates support for those fairness metrics from yardstick, allowing users to evaluate fairness criteria across resamples. 
To demonstrate those features in context, we&rsquo;ve added <a href="https://www.tidymodels.org/learn/work/fairness-readmission/" target="_blank" rel="noopener">another new article on tidymodels.org</a>, modeling hospital readmission for patients with Type I diabetes.</p> <p>Notably, we haven&rsquo;t introduced functionality to support mitigation. While a number of methods have proliferated over the years to fine-tune models to act more fairly with respect to some fairness criteria, each applies only in relatively niche applications with modest experimental results (Agarwal et al. 2018; Mittelstadt, Wachter, and Russell 2023). For now, we believe that, in practice, the efforts of practitioners&mdash;and thus our efforts to support them&mdash;are better spent engaging with the sociotechnical context of a given modeling problem (Holstein et al. 2019).</p> <p>We&rsquo;re excited to support modeling practitioners in fairness-oriented analysis of models and look forward to seeing how these methods are put to work.</p> <h2 id="references">References <a href="#references"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><div id="refs" class="references csl-bib-body hanging-indent" entry-spacing="0"> <div id="ref-agarwal2018" class="csl-entry"> <p>Agarwal, Alekh, Alina Beygelzimer, Miroslav Dudík, John Langford, and Hanna Wallach. 2018. &ldquo;A Reductions Approach to Fair Classification.&rdquo; In <em>International Conference on Machine Learning</em>, 60&ndash;69. 
PMLR.</p> </div> <div id="ref-bell2023" class="csl-entry"> <p>Bell, Andrew, Lucius Bynum, Nazarii Drushchak, Tetiana Zakharchenko, Lucas Rosenblatt, and Julia Stoyanovich. 2023. &ldquo;The Possibility of Fairness: Revisiting the Impossibility Theorem in Practice.&rdquo; In <em>Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency</em>, 400&ndash;422. FAccT &lsquo;23. New York, NY, USA: Association for Computing Machinery. <a href="https://doi.org/10.1145/3593013.3594007">https://doi.org/10.1145/3593013.3594007</a>.</p> </div> <div id="ref-holstein2019" class="csl-entry"> <p>Holstein, Kenneth, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudik, and Hanna Wallach. 2019. &ldquo;Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need?&rdquo; In <em>Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems</em>, 1&ndash;16.</p> </div> <div id="ref-mitchell2021" class="csl-entry"> <p>Mitchell, Shira, Eric Potash, Solon Barocas, Alexander D&rsquo;Amour, and Kristian Lum. 2021. &ldquo;Algorithmic Fairness: Choices, Assumptions, and Definitions.&rdquo; <em>Annual Review of Statistics and Its Application</em> 8 (1): 141&ndash;63. <a href="https://doi.org/10.1146/annurev-statistics-042720-125902">https://doi.org/10.1146/annurev-statistics-042720-125902</a>.</p> </div> <div id="ref-mittelstadt2023" class="csl-entry"> <p>Mittelstadt, Brent, Sandra Wachter, and Chris Russell. 2023. &ldquo;The Unfairness of Fair Machine Learning: Levelling down and Strict Egalitarianism by Default.&rdquo; <em>arXiv Preprint arXiv:2302.02404</em>.</p> </div> <div id="ref-narayanan2018" class="csl-entry"> <p>Narayanan, Arvind. 2018. &ldquo;Translation Tutorial: 21 Fairness Definitions and Their Politics.&rdquo; In <em>Proc. Conf. 
Fairness Accountability Transp., New York, USA</em>, 1170:3.</p> </div> </div> ggplot2 3.5.0: Introducing: coord_radial() https://www.tidyverse.org/blog/2024/03/ggplot2-3-5-0-coord-radial/ Fri, 01 Mar 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/03/ggplot2-3-5-0-coord-radial/ <p>We are happy to announce the release of <a href="https://ggplot2.tidyverse.org" target="_blank" rel="noopener">ggplot2</a> 3.5.0. This post is one of several about the release, and it outlines a new polar coordinate system. Please find the <a href="https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0/">main release post</a> to read about other exciting changes.</p> <p>Polar coordinates are a good reminder of the flexibility of the Grammar of Graphics: pie charts are just bar charts with polar coordinates. While the tried and tested <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_polar()</code></a> has served well in the past to fulfill your pie chart needs, we felt it was due some modernisation. 
We realised we could not adapt <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_polar()</code></a> to fit with the <a href="https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0/#guide-rewrite">new guide system</a> without severely breaking existing plots, so <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a> was born to give a facelift to the polar coordinate system in ggplot2.</p> <p>Relative to <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_polar()</code></a>, <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a> can:</p> <ol> <li>Draw circle sectors instead of only full circles.</li> <li>Avoid data vanishing in the centre of the plot.</li> <li>Adjust text angles on the fly.</li> <li>Use the new guide system.</li> </ol> <h2 id="an-updated-look">An updated look <a href="#an-updated-look"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The first noticeable contrast with <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_polar()</code></a> is that <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a> is not particularly suited to building pie charts. 
Instead, it uses the scale expansion conventions like <a href="https://ggplot2.tidyverse.org/reference/coord_cartesian.html" target="_blank" rel="noopener"><code>coord_cartesian()</code></a>. This makes sense for most chart types, but not pie charts. Nonetheless, you can use the <code>expand = FALSE</code> setting to use <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a> for pie charts.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://patchwork.data-imaginist.com'>patchwork</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://scales.r-lib.org'>scales</a></span><span class='o'>)</span></span> <span></span> <span><span class='nv'>pie</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span>, fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span 
class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span>width <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_discrete.html'>scale_y_discrete</a></span><span class='o'>(</span>guide <span class='o'>=</span> <span class='s'>"none"</span>, name <span class='o'>=</span> <span class='kc'>NULL</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='s'>"none"</span><span class='o'>)</span></span> <span><span class='nv'>default</span> <span class='o'>&lt;-</span> <span class='nv'>pie</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_radial</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='s'>"default"</span><span class='o'>)</span></span> <span><span class='nv'>no_expand</span> <span class='o'>&lt;-</span> <span class='nv'>pie</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_radial</a></span><span class='o'>(</span>expand <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='s'>"expand = FALSE"</span><span class='o'>)</span></span> <span><span class='nv'>polar</span> <span class='o'>&lt;-</span> <span class='nv'>pie</span> <span class='o'>+</span> <span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_polar</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='s'>"coord_polar()"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>default</span> <span class='o'>|</span> <span class='nv'>no_expand</span> <span class='o'>|</span> <span class='nv'>polar</span></span> </code></pre> <p><img src="figs/compare_polar-1.png" alt="Three pie charts showing the proportion of each cylinder number. The first has a gap in the middle and at the top with a grey circle in the background and is titled 'default'. The second is titled 'expand = FALSE' and shows a full pie chart with tick marks labelling the angle positions. The last plot is a full pie chart with a gray rectangular background without tick marks and a white line around the pie." width="700px" style="display: block; margin: auto;" /></p> </div> <p>Some visual differences stand out in the plots above. In <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a>, the panel background covers the data area of the plot, not a rectangle. It also does not have a grid-line encircling the plot and instead uses tick marks to indicate values along the theta (angle) coordinate. You may also notice that <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_polar()</code></a> still draws the radius axis, despite instructions to use <code>guide = &quot;none&quot;</code>. 
That is the integration with the guide system that birthed <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a>.</p> <h2 id="partial-polar-plots">Partial polar plots <a href="#partial-polar-plots"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Another important difference is that <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a> does not necessarily need to display a full circle. By setting the <code>start</code> and <code>end</code> arguments separately, you can now make a partial polar plot. 
This makes it much easier to make semi- or quarter-circle plots.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>half</span> <span class='o'>&lt;-</span> <span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_radial</a></span><span class='o'>(</span>start <span class='o'>=</span> <span class='o'>-</span><span class='m'>0.5</span> <span class='o'>*</span> <span class='nv'>pi</span>, end <span class='o'>=</span> <span class='m'>0.5</span> <span class='o'>*</span> <span class='nv'>pi</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='s'>"−0.5π to +0.5π"</span><span class='o'>)</span></span> <span><span class='nv'>quarter</span> <span class='o'>&lt;-</span> <span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_radial</a></span><span class='o'>(</span>start <span class='o'>=</span> <span class='m'>0</span>, end <span class='o'>=</span> <span class='m'>0.5</span> <span class='o'>*</span> <span class='nv'>pi</span><span class='o'>)</span> <span 
class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='s'>"0 to +0.5π"</span><span class='o'>)</span></span> <span><span class='nv'>half</span> <span class='o'>|</span> <span class='nv'>quarter</span></span> </code></pre> <p><img src="figs/partial_polar-1.png" alt="Two polar scatterplots of the 'mpg' dataset. The left plot is shaped like a semicircle and the right plot like a quarter circle." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="donuts">Donuts <a href="#donuts"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>It was already possible to turn a pie chart into a donut chart with <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_polar()</code></a>. This is made even easier in <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a> by setting the <code>inner.radius</code> argument to make a donut hole. 
For most plots, this avoids crowding data points in the centre of the plot: points with widely different <code>theta</code> coordinates but similarly small <code>r</code> coordinates are placed further apart.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_radial</a></span><span class='o'>(</span>inner.radius <span class='o'>=</span> <span class='m'>0.3</span>, r_axis_inside <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/open_radial_plot-1.png" alt="A donut-shaped scatterplot of the 'mpg' dataset." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="text-annotations">Text annotations <a href="#text-annotations"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A common grievance about polar coordinates is that it is cumbersome to rotate text annotations along with the <code>theta</code> coordinate. Calculating the correct angles for labels is pretty involved and usually changes from plot to plot depending on how many items need to be displayed. To remove some of this hassle, <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a> has a <code>rotate_angle</code> switch that lines up the text&rsquo;s <code>angle</code> aesthetic with the theta coordinate. 
For text angles of 0 degrees, this will place text in a tangent orientation to the circle and for angles of 90 degrees, this places text along the radius, as in the plot below.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq_along</a></span><span class='o'>(</span><span class='nv'>mpg</span><span class='o'>)</span>, <span class='nv'>mpg</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_col</a></span><span class='o'>(</span>width <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_text.html'>geom_text</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='m'>32</span>, label <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/colnames.html'>rownames</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> angle <span class='o'>=</span> <span class='m'>90</span>, hjust <span class='o'>=</span> <span class='m'>1</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_radial</a></span><span class='o'>(</span>rotate_angle <span class='o'>=</span> <span class='kc'>TRUE</span>, expand <span 
class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/text_angles-1.png" alt="A wind rose plot showing miles per gallon for different cars. The car names skirt the outer edge of the plot and are oriented towards the centre." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="axes">Axes <a href="#axes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Because the logic of drawing axes for polar coordinates is not the same as when axes are perfectly vertical or horizontal, we used the new guide system to build an axis specific to <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a>: the <a href="https://ggplot2.tidyverse.org/reference/guide_axis_theta.html" target="_blank" rel="noopener"><code>guide_axis_theta()</code></a> axis. Guides for <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a> can be set using the <code>theta</code> and <code>r</code> names in the <a href="https://ggplot2.tidyverse.org/reference/guides.html" target="_blank" rel="noopener"><code>guides()</code></a> function. While the <code>r</code> axis can be the regular <a href="https://ggplot2.tidyverse.org/reference/guide_axis.html" target="_blank" rel="noopener"><code>guide_axis()</code></a>, the <code>theta</code> axis uses the highly specialised <a href="https://ggplot2.tidyverse.org/reference/guide_axis_theta.html" target="_blank" rel="noopener"><code>guide_axis_theta()</code></a>. 
The theta axis shares many features with typical axes, like setting the text angle or the new <code>minor.ticks</code> and <code>cap</code> settings. More on these settings in the <a href="https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0-axes/">axis blog</a>. As seen in previous plots, the default is to place text horizontally. One neat trick we&rsquo;ve put into <a href="https://ggplot2.tidyverse.org/reference/coord_polar.html" target="_blank" rel="noopener"><code>coord_radial()</code></a> is that we can set a <em>relative</em> text angle in the guides, such as in the plot below.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>class</span>, <span class='nv'>displ</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_boxplot.html'>geom_boxplot</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_radial</a></span><span class='o'>(</span>start <span class='o'>=</span> <span class='m'>0.25</span> <span class='o'>*</span> <span class='nv'>pi</span>, end <span class='o'>=</span> <span class='m'>1.75</span> <span class='o'>*</span> <span class='nv'>pi</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span></span> <span> theta <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_axis_theta.html'>guide_axis_theta</a></span><span 
class='o'>(</span>angle <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span>,</span> <span> r <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_axis.html'>guide_axis</a></span><span class='o'>(</span>angle <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/axis_angles-1.png" alt="Boxplot of the 'mpg' dataset displayed in partial polar coordinates. The theta labels are placed tangential to the circle. The radius labels line up with the tick mark direction." width="700px" style="display: block; margin: auto;" /></p> </div> <p>The theme elements that style these axes carry a <code>theta</code> or <code>r</code> position suffix, so to change the axis line, you use the <code>axis.line.theta</code> and <code>axis.line.r</code> arguments. The theme settings can also be used to set the <em>absolute</em> angle of text.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>class</span>, <span class='nv'>displ</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_boxplot.html'>geom_boxplot</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_radial</a></span><span class='o'>(</span>start <span class='o'>=</span> <span class='m'>0.25</span> <span class='o'>*</span> <span class='nv'>pi</span>, end <span class='o'>=</span> <span class='m'>1.75</span> 
<span class='o'>*</span> <span class='nv'>pi</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span></span> <span> axis.line.theta <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_line</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"red"</span><span class='o'>)</span>,</span> <span> axis.text.theta <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>angle <span class='o'>=</span> <span class='m'>90</span><span class='o'>)</span>,</span> <span> axis.text.r <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"blue"</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/axis_styling-1.png" alt="Boxplot of the 'mpg' dataset displayed in partial polar coordinates. The theta labels are placed vertically and a red line traces the outer circle. The radius labels are displayed in blue." width="700px" style="display: block; margin: auto;" /></p> </div> <p>Lastly, there can also be secondary axes. We anticipate that this is practically never needed, as grid lines follow the primary axes and without them, it is very hard to read from axes in polar coordinates. However, if there is some reason for using secondary axes on polar coordinates, you can use the <code>theta.sec</code> and <code>r.sec</code> names in the <a href="https://ggplot2.tidyverse.org/reference/guides.html" target="_blank" rel="noopener"><code>guides()</code></a> function to control the guides. 
Please note that a secondary theta axis is entirely useless when <code>inner.radius = 0</code> (the default). There are no separate theme options for secondary r/theta axes, but to style them separately from the primary axes, you can use the <code>theme</code> argument in the guide instead.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>pressure</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>temperature</span>, <span class='nv'>pressure</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"blue"</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_continuous</a></span><span class='o'>(</span></span> <span> labels <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/label_number.html'>label_number</a></span><span class='o'>(</span>suffix <span class='o'>=</span> <span class='s'>"°C"</span><span class='o'>)</span>,</span> <span> sec.axis <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/sec_axis.html'>sec_axis</a></span><span class='o'>(</span><span class='o'>~</span> <span class='nv'>.x</span> <span class='o'>*</span> <span class='m'>9</span><span class='o'>/</span><span class='m'>5</span> <span class='o'>+</span> <span class='m'>32</span>, labels <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/label_number.html'>label_number</a></span><span 
class='o'>(</span>suffix <span class='o'>=</span> <span class='s'>"°F"</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_y_continuous</a></span><span class='o'>(</span></span> <span> labels <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/label_number.html'>label_number</a></span><span class='o'>(</span>suffix <span class='o'>=</span> <span class='s'>" mmHg"</span><span class='o'>)</span>,</span> <span> sec.axis <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/sec_axis.html'>sec_axis</a></span><span class='o'>(</span><span class='o'>~</span> <span class='nv'>.x</span> <span class='o'>*</span> <span class='m'>0.133322</span>, labels <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/label_number.html'>label_number</a></span><span class='o'>(</span>suffix <span class='o'>=</span> <span class='s'>" kPa"</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span></span> <span> theta.sec <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_axis_theta.html'>guide_axis_theta</a></span><span class='o'>(</span>theme <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>axis.line.theta <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_line</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> r.sec <span class='o'>=</span> <span 
class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_axis.html'>guide_axis</a></span><span class='o'>(</span>theme <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>axis.text.r <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"red"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/coord_polar.html'>coord_radial</a></span><span class='o'>(</span></span> <span> start <span class='o'>=</span> <span class='m'>0.25</span> <span class='o'>*</span> <span class='nv'>pi</span>, end <span class='o'>=</span> <span class='m'>1.75</span> <span class='o'>*</span> <span class='nv'>pi</span>,</span> <span> inner.radius <span class='o'>=</span> <span class='m'>0.3</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/secondary_axes-1.png" alt="A lineplot of the 'pressure' dataset in partial polar coordinates that is shaped like a donut with a bite taken out on top. The primary, outer theta axis displays temperature in degrees Celsius. The secondary, inner theta axis displays temperature in degrees Fahrenheit and has an axis line. The primary radius axis on the right displays pressure in millimetres of mercury. The secondary radius axis on the left displays pressure in kilopascals in red text." 
width="700px" style="display: block; margin: auto;" /></p> </div> ggplot2 3.5.0: Axes https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0-axes/ Wed, 28 Feb 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0-axes/ <p>We are pleased to release <a href="https://ggplot2.tidyverse.org" target="_blank" rel="noopener">ggplot2</a> 3.5.0. This release is a large one, so we have split the updates into multiple posts. This post outlines changes to axes; see the <a href="https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0/">main release post</a> to learn about other changes.</p> <p>Axes, alongside <a href="https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0-legends/">legends</a>, are visual representations of scales and allow observers to read values off a plot. The innards of axes, like other guides, underwent a major overhaul with the guide system rewrite. Axes specifically are guides for positions and classically display labelled tick marks. In Cartesian coordinates, these are the x- and y-positions, but in non-Cartesian systems they may reflect a theta, radius, longitude or latitude. 
In ggplot2, an axis is usually represented by the <a href="https://ggplot2.tidyverse.org/reference/guide_axis.html" target="_blank" rel="noopener"><code>guide_axis()</code></a> function. We outline the following changes to axes:</p> <ul> <li> <a href="#minor-ticks">Minor ticks</a></li> <li> <a href="#capping">Capping the axis line</a></li> <li> <a href="#logarithmic-axes">Logarithmic axes</a></li> <li> <a href="#stacked-axes">Stacking axes</a></li> <li> <a href="#display-in-facets">Display in facets</a></li> </ul> <div class="highlight"> </div> <h2 id="minor-ticks">Minor ticks <a href="#minor-ticks"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A much-requested expansion of axis capabilities is the ability to draw minor ticks. 
To draw minor ticks, you can use the <code>minor.ticks</code> argument of <a href="https://ggplot2.tidyverse.org/reference/guide_axis.html" target="_blank" rel="noopener"><code>guide_axis()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span></span> <span></span> <span><span class='nv'>p</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span></span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_axis.html'>guide_axis</a></span><span class='o'>(</span>minor.ticks <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span>,</span> <span> y <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_axis.html'>guide_axis</a></span><span class='o'>(</span>minor.ticks <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='nv'>p</span></span> </code></pre> <p><img src="figs/minor_ticks-1.png" alt="Scatterplot of engine displacement 
versus highway miles per gallon. Both the x and y axes have smaller ticks in between normal ticks." width="700px" style="display: block; margin: auto;" /></p> </div> <p>The minor ticks are unlabelled ticks and follow the <code>minor_breaks</code> provided to the scale. Their length is determined by the <code>axis.minor.ticks.length</code> theme setting and its positional children. The rest of their appearance is inherited from the major ticks, as can be seen in the plot below where the minor ticks on the y-axis are also blue. To tweak their style separately from the major ticks, the <code>axis.minor.ticks.{x.bottom/x.top/y.left/y.right}</code> settings can be used. Please note that there is <em>no</em> <code>axis.minor.ticks</code> setting without the position suffixes, because the minor ticks inherit directly from the major ticks.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_continuous</a></span><span class='o'>(</span>minor_breaks <span class='o'>=</span> <span class='nf'>scales</span><span class='nf'>::</span><span class='nf'><a href='https://scales.r-lib.org/reference/breaks_width.html'>breaks_width</a></span><span class='o'>(</span><span class='m'>0.2</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span></span> <span> axis.ticks.length <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>5</span>, <span class='s'>"pt"</span><span class='o'>)</span>,</span> <span> axis.minor.ticks.length <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>rel</a></span><span class='o'>(</span><span class='m'>0.5</span><span 
class='o'>)</span>,</span> <span> axis.minor.ticks.x.bottom <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_line</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>'red'</span><span class='o'>)</span>,</span> <span> axis.ticks.y <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_line</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"blue"</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/minor_ticks_theming-1.png" alt="Scatterplot of engine displacement versus highway miles per gallon. The y-axis has blue larger and smaller tick marks, whereas the x-axis has the larger ticks in black and the smaller ticks in red. The x-axis has 4 smaller ticks in between large ones and the smaller ticks are half the size of larger ticks." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="capping">Capping <a href="#capping"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Axes can now also be &lsquo;capped&rsquo; at the upper and lower end. We hesitate to call this improvement &lsquo;new&rsquo;, as it has been a part of base R plotting since time immemorial. When axes are capped, the axis line will not be drawn up to the panel edge, but up to the first and last breaks. 
Unsurprisingly, this only affects plots where the axis line is not blank.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span></span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_axis.html'>guide_axis</a></span><span class='o'>(</span>cap <span class='o'>=</span> <span class='s'>"both"</span><span class='o'>)</span>, <span class='c'># Cap both ends</span></span> <span> y <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_axis.html'>guide_axis</a></span><span class='o'>(</span>cap <span class='o'>=</span> <span class='s'>"upper"</span><span class='o'>)</span> <span class='c'># Cap the upper end</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>axis.line <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_line</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/capped_axes-1.png" alt="Scatterplot of engine displacement versus highway 
miles per gallon. The y-axis line starts at the bottom of the panel and continues to the top break. The x-axis line starts at the leftmost break and ends at the rightmost break." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="logarithmic-axes">Logarithmic axes <a href="#logarithmic-axes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A new axis for displaying logarithmic (and related) scales has been added: <a href="https://ggplot2.tidyverse.org/reference/guide_axis_logticks.html" target="_blank" rel="noopener"><code>guide_axis_logticks()</code></a>. This axis draws three types of tick marks at log10-spaced positions. The tick positions are placed in the original, untransformed data space, so the axis plays well with scale- and coord-transformations. 
To accommodate a series of logarithmic-like transformations, such as <a href="https://scales.r-lib.org/reference/transform_log.html" target="_blank" rel="noopener"><code>scales::transform_pseudo_log()</code></a> or <a href="https://scales.r-lib.org/reference/transform_asinh.html" target="_blank" rel="noopener"><code>scales::transform_asinh()</code></a>, scales that include 0 in their limits have the ticks mirrored around 0.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>r</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='m'>0.001</span>, <span class='m'>0.999</span>, length.out <span class='o'>=</span> <span class='m'>100</span><span class='o'>)</span></span> <span><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span></span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Cauchy.html'>qcauchy</a></span><span class='o'>(</span><span class='nv'>r</span><span class='o'>)</span>,</span> <span> y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Lognormal.html'>qlnorm</a></span><span class='o'>(</span><span class='nv'>r</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>p</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>df</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/coord_trans.html'>coord_trans</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='s'>"reverse"</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_y_continuous</a></span><span class='o'>(</span></span> <span> transform <span class='o'>=</span> <span class='s'>"log10"</span>,</span> <span> breaks <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.1</span>, <span class='m'>1</span>, <span class='m'>10</span><span class='o'>)</span>,</span> <span> guide <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_axis_logticks.html'>guide_axis_logticks</a></span><span class='o'>(</span>long <span class='o'>=</span> <span class='m'>2</span>, mid <span class='o'>=</span> <span class='m'>1</span>, short <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_continuous</a></span><span class='o'>(</span></span> <span> transform <span class='o'>=</span> <span class='s'>"asinh"</span>,</span> <span> breaks <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='o'>-</span><span class='m'>100</span>, <span class='o'>-</span><span class='m'>10</span>, <span class='o'>-</span><span class='m'>1</span>, <span class='m'>0</span>, <span class='m'>1</span>, <span class='m'>10</span>, <span class='m'>100</span><span 
class='o'>)</span>,</span> <span> guide <span class='o'>=</span> <span class='s'>"axis_logticks"</span></span> <span> <span class='o'>)</span></span> <span><span class='nv'>p</span></span> </code></pre> <p><img src="figs/log_axes-1.png" alt="A line plot showing a negatively sloped line with a reversed log10-transformation on the y-axis and an inverse hyperbolic sine transformation on the x-axis. Large ticks appear at multiples of 10, medium ticks at multiples of 5 and small ticks at multiples of 1." width="700px" style="display: block; margin: auto;" /></p> </div> <p>The log-ticks axis supersedes the earlier <a href="https://ggplot2.tidyverse.org/reference/annotation_logticks.html" target="_blank" rel="noopener"><code>annotation_logticks()</code></a> function. Because it is implemented as an axis, it causes minimal fuss with the placement of labels and is immune to the clipping options in the coord. To mirror <a href="https://ggplot2.tidyverse.org/reference/annotation_logticks.html" target="_blank" rel="noopener"><code>annotation_logticks()</code></a> more closely, you can set a negative tick length in the theme.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>axis.ticks.length <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='o'>-</span><span class='m'>2.25</span>, <span class='s'>"pt"</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/log_ticks_inward-1.png" alt="The same plot as above, but the tick marks now point inwards." 
width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="stacked-axes">Stacked axes <a href="#stacked-axes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The last new axis is technically not an axis, but a way to combine axes. <a href="https://ggplot2.tidyverse.org/reference/guide_axis_stack.html" target="_blank" rel="noopener"><code>guide_axis_stack()</code></a> can take multiple other axes and combine them by placing them next to one another. On its own, the usefulness of stacking axes is pretty limited. However, when extensions start defining custom position guides, it is an easy way to mix-and-match axes from different extensions. The first axis is placed next to the panel and subsequent axes are placed further away from the panel. Axes, like legends, have acquired a <code>theme</code> argument that can be used to customise the display of individual axes. 
Currently, there is not a compelling case to use <a href="https://ggplot2.tidyverse.org/reference/guide_axis_stack.html" target="_blank" rel="noopener"><code>guide_axis_stack()</code></a>, but it is an important building block for when axis extensions arrive.</p> <h2 id="display-in-facets">Display in facets <a href="#display-in-facets"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A more indirect improvement to axes is the ability of facets to tweak the appearance of inner axes when scales are fixed. This helps meet the requirement of some journals that every panel has labelled axes. <a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html" target="_blank" rel="noopener"><code>facet_wrap()</code></a> and <a href="https://ggplot2.tidyverse.org/reference/facet_grid.html" target="_blank" rel="noopener"><code>facet_grid()</code></a> would previously only display axes in between panels when <code>scales = &quot;free&quot;</code> was set. This is still the case, but there are more options available for <a href="https://ggplot2.tidyverse.org/reference/facet_grid.html" target="_blank" rel="noopener"><code>facet_grid()</code></a> and fixed scales. Using the <code>axes = &quot;all&quot;</code> option, all axes are displayed, including those in between panels. 
When using <code>axes = &quot;all_x&quot;</code> or <code>axes = &quot;all_y&quot;</code>, you can narrow down which axes are displayed.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/facet_grid.html'>facet_grid</a></span><span class='o'>(</span><span class='nv'>year</span> <span class='o'>~</span> <span class='nv'>drv</span>, axes <span class='o'>=</span> <span class='s'>"all_y"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/facet_axes_display-1.png" alt="A scatterplot facetted by the 'drv' and 'year' variables. The x-axes appear only at the bottom panels, whereas y-axes are displayed for every panel." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>In addition, you can choose to selectively suppress labels and only show tick marks by using the <code>axis.labels</code> argument.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/facet_grid.html'>facet_grid</a></span><span class='o'>(</span><span class='nv'>year</span> <span class='o'>~</span> <span class='nv'>drv</span>, axes <span class='o'>=</span> <span class='s'>"all"</span>, axis.labels <span class='o'>=</span> <span class='s'>"all_y"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/facet_axes_label_display-1.png" alt="A scatterplot facetted by the 'drv' and 'year' variables. The x-axes appear in full only at the bottom panels, and as tick marks in the first row of panels. The y-axes are displayed in full at every panel." width="700px" style="display: block; margin: auto;" /></p> </div> <p>That wraps up the visible changes to axes for this post. To read about general changes, see the <a href="https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0/">main post</a>. 
The changes to legends are covered in a <a href="https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0-legends/">separate post</a>, and the new polar coordinate system (and its axes) will be covered in a future post.</p> Take the tidymodels survey for 2024 priorities https://www.tidyverse.org/blog/2024/02/tidymodels-2024-survey/ Wed, 28 Feb 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/02/tidymodels-2024-survey/ <p>At the end of 2021, we created a survey to get community input on how we prioritize our projects. <a href="https://colorado.posit.co/rsc/tidymodels-priorities-2022/" target="_blank" rel="noopener">The results</a> gave us a good sense of which items people were most interested in. 
Since then we have completed a number of projects:</p> <ul> <li><strong>Model fairness metrics</strong> were included in <a href="https://yardstick.tidymodels.org/news/index.html#yardstick-130" target="_blank" rel="noopener">yardstick 1.3.0</a> with <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels.org</a> posts coming soon.</li> <li><strong>Spatial analysis models and methods</strong> led to the creation of <a href="https://spatialsample.tidymodels.org/" target="_blank" rel="noopener">spatialsample</a>.</li> <li><strong>H2O.ai support</strong> was achieved with the creation of <a href="https://agua.tidymodels.org/" target="_blank" rel="noopener">agua</a>.</li> <li><strong>Better serialization tools</strong> are now provided in the <a href="https://github.com/rstudio/bundle" target="_blank" rel="noopener">bundle</a> package.</li> </ul> <p>Almost everything that respondents prioritized highly last year has either been completed or is currently in progress. Our main focus right now is wrapping up survival analysis, which is happening through a series of CRAN releases for the affected packages. Immediately following these releases, we will be working on postprocessing and supervised feature selection. 
Beyond that, we&rsquo;d like to once again ask the community for feedback to help us better prioritize features in the coming year.</p> <h2 id="looking-toward-2024">Looking toward 2024 <a href="#looking-toward-2024"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p><strong>Take a look at <a href="https://conjoint.qualtrics.com/jfe/form/SV_aWw8ocGN5aPgeZE" target="_blank" rel="noopener">our survey for next priorities</a></strong> and let us know what you think. There are some items we&rsquo;ve put &ldquo;on the menu&rdquo;, but you can write in other items that you are interested in.</p> <p>Our current slate of possible priorities includes:</p> <h3 id="sparse-tibbles">Sparse tibbles <a href="#sparse-tibbles"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Many models benefit from having sparse data, both in execution time and memory usage. We can&rsquo;t take full advantage of this because recipes use tibbles. This project would make it possible for the tibbles used <em>inside of a recipe</em> to hold sparse data. 
This would not be intended as a general substitute for regular tibbles.</p> <h3 id="causal-inference-interface">Causal inference interface <a href="#causal-inference-interface"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>While many common causal inference workflows are already possible with tidymodels, a small set of helper functions could greatly ease the experience of causal modeling in the framework. Specifically, these changes would better accommodate a two-stage modeling approach, using predictions from a propensity model to set case weights for an outcome model.</p> <h3 id="improve-chattr">Improve chattr <a href="#improve-chattr"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://github.com/mlverse/chattr" target="_blank" rel="noopener">chattr</a> is an interface to large language models (LLMs). It enables interaction with the model directly from the RStudio IDE. 
This task would involve fine-tuning it to give better results when used for tidymodels tasks.</p> <h3 id="cost-sensitive-learning-api">Cost-sensitive learning API <a href="#cost-sensitive-learning-api"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>This feature is another solution for severe class imbalances. The main part of this task is making our approaches to this uniform across models.</p> <h3 id="expand-models-for-stacking-ensembles">Expand models for stacking ensembles <a href="#expand-models-for-stacking-ensembles"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>As of now, the stacks package only supports combining the predictions of member models using a regularized linear model. 
We could extend the package to allow for combining predictions using any modeling <a href="https://workflows.tidymodels.org" target="_blank" rel="noopener">workflow</a>.</p> <h3 id="extend-support-for-spatial-ml">Extend support for spatial ML <a href="#extend-support-for-spatial-ml"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://spatialsample.tidymodels.org/" target="_blank" rel="noopener">spatialsample</a> introduced a number of spatial resampling methods to tidymodels. More comprehensive support for spatial ML would involve better integrating <a href="https://www.mm218.dev/posts/2022-08-11-waywiser-010-is-now-on-cran/" target="_blank" rel="noopener">spatial metrics</a> into the framework and introducing support for new spatial model types.</p> <h3 id="ordinal-regression-extension-package">Ordinal regression extension package <a href="#ordinal-regression-extension-package"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Ordinal regression models are specific to classification tasks with a natural ordering to the outcome categories (e.g., low, medium, high, etc.). 
We could add support for modeling this type of data in a parsnip extension package.</p> <p> <a href="https://conjoint.qualtrics.com/jfe/form/SV_aWw8ocGN5aPgeZE" target="_blank" rel="noopener">Check out our survey</a> and tell us what your priorities are!</p> ggplot2 3.5.0: Legends https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0-legends/ Mon, 26 Feb 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0-legends/ <p>We are pleased to release <a href="https://ggplot2.tidyverse.org" target="_blank" rel="noopener">ggplot2</a> 3.5.0. This is one of several blog posts about the release, and it outlines changes to legend guides. Please see the <a href="https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0/">main release post</a> to read about other changes.</p> <p>Legends, alongside axes, are visual representations of scales and allow observers to translate graphical properties of a plot into information. 
To no surprise, legends in ggplot2 comprise the guides called <a href="https://ggplot2.tidyverse.org/reference/guide_legend.html" target="_blank" rel="noopener"><code>guide_legend()</code></a>, but also <a href="https://ggplot2.tidyverse.org/reference/guide_colourbar.html" target="_blank" rel="noopener"><code>guide_colourbar()</code></a>, <a href="https://ggplot2.tidyverse.org/reference/guide_coloursteps.html" target="_blank" rel="noopener"><code>guide_coloursteps()</code></a> and <a href="https://ggplot2.tidyverse.org/reference/guide_bins.html" target="_blank" rel="noopener"><code>guide_bins()</code></a>.</p> <h2 id="styling">Styling <a href="#styling"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>One of the more user-visible changes is that these guides no longer have styling options. Or at least, they have been soft-deprecated: they continue to work for now, but are scheduled for removal. Gone are the days where there were 4 possible ways to set the horizontal justification of legend text in 5 different functions. There is only one way to style guides now, and that is by using <a href="https://ggplot2.tidyverse.org/reference/theme.html" target="_blank" rel="noopener"><code>theme()</code></a>. The <a href="https://ggplot2.tidyverse.org/reference/theme.html" target="_blank" rel="noopener"><code>theme()</code></a> function has new arguments to control the appearance of legends, which makes it easier to globally control the appearance of legends. 
For example: <code>theme(legend.frame)</code> replaces <code>guide_colourbar(frame.colour, frame.linewidth, frame.linetype)</code> and <code>theme(legend.axis.line)</code> replaces <code>guide_bins(axis, axis.colour, axis.linewidth, axis.arrow)</code>. To allow for tweaking the style of any individual guide, the guide functions now have a <code>theme</code> argument that can accept a theme specific to that guide.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span>, shape <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span>, colour <span class='o'>=</span> <span class='nv'>cty</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='c'># Styling individual guides</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span></span> <span> shape <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_legend.html'>guide_legend</a></span><span class='o'>(</span>theme <span class='o'>=</span> <span 
class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>legend.text <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"red"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> colour <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_colourbar.html'>guide_colorbar</a></span><span class='o'>(</span>theme <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>legend.frame <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_rect</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"red"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='c'># Styling guides globally</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span></span> <span> legend.title.position <span class='o'>=</span> <span class='s'>"left"</span>,</span> <span> <span class='c'># Title justification is controlled by hjust/vjust in the element</span></span> <span> legend.title <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>angle <span class='o'>=</span> <span class='m'>90</span>, hjust <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/guide_theming-1.png" alt="Scatterplot of engine displacement versus highway miles per gallon. 
The legend indicating shapes for the number of cylinders has red text. The colour bar indicating city miles per gallon has a red rectangle around the bar. Both the legend and colour bar titles are rotated, centered and on the left of the guide." width="700px" style="display: block; margin: auto;" /></p> </div> <p>In the plot above, notice how the legend title settings affect both the colour bar and the legend, whereas the local options, like red legend text, only apply to a single guide.</p> <h2 id="awareness">Awareness <a href="#awareness"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Legends are now more aware of which discrete values should be placed in which keys. By default, they now only draw keys for the layers that contain the relevant value. This saves you the hassle of using the <code>guide_legend(override.aes)</code> argument to get the keys to display just right. 
In the plot below, notice how the points and line have separate keys.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_manual.html'>scale_alpha_manual</a></span><span class='o'>(</span>values <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.5</span>, <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>p</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"points"</span>, alpha <span class='o'>=</span> <span class='s'>"points"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"line"</span>, alpha <span class='o'>=</span> <span class='s'>"line"</span><span class='o'>)</span>,</span> <span> stat <span class='o'>=</span> <span 
class='s'>"smooth"</span>, formula <span class='o'>=</span> <span class='nv'>y</span> <span class='o'>~</span> <span class='nv'>x</span>, method <span class='o'>=</span> <span class='s'>"lm"</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/legend_awareness-1.png" alt="A scatterplot with trendline showing engine displacement versus highway miles per gallon. There are two legends for colour and alpha. Both legends show points and lines separately." width="700px" style="display: block; margin: auto;" /></p> </div> <p>To revert to the old behaviour, you can set <code>show.legend = TRUE</code> in the layers. As before, the <code>show.legend</code> argument can still be set in an aesthetic-specific way. Setting it to <code>TRUE</code> means &lsquo;always show&rsquo;, <code>FALSE</code> means &lsquo;never show&rsquo; and <code>NA</code> means &lsquo;show if found&rsquo;.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"points"</span>, alpha <span class='o'>=</span> <span class='s'>"points"</span><span class='o'>)</span>,</span> <span> show.legend <span class='o'>=</span> <span class='kc'>TRUE</span> <span class='c'># always show</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span 
class='s'>"line"</span>, alpha <span class='o'>=</span> <span class='s'>"line"</span><span class='o'>)</span>,</span> <span> stat <span class='o'>=</span> <span class='s'>"smooth"</span>, formula <span class='o'>=</span> <span class='nv'>y</span> <span class='o'>~</span> <span class='nv'>x</span>, method <span class='o'>=</span> <span class='s'>"lm"</span>,</span> <span> show.legend <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='kc'>NA</span>, alpha <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span> <span class='c'># always show in alpha</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/show_key_setting-1.png" alt="The same plot as before, but every legend key displays points. Lines are shown in every 'alpha' legend key, but in only one 'colour' key." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="placement">Placement <a href="#placement"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Legend positions are no longer restricted to just a single side of the plot. By setting the <code>position</code> argument of guides, you can tailor which guides appear where in the plot. Guides that do not have a position set, like the &lsquo;drv&rsquo; shape legend below, follow the global theme&rsquo;s <code>legend.position</code> setting. 
If we suspend our belief in good data visualisation practice, we can showcase this as follows:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span>, shape <span class='o'>=</span> <span class='nv'>drv</span>, colour <span class='o'>=</span> <span class='nv'>cty</span>, size <span class='o'>=</span> <span class='nv'>year</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>alpha <span class='o'>=</span> <span class='nv'>cyl</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span></span> <span> colour <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_colourbar.html'>guide_colourbar</a></span><span class='o'>(</span>position <span class='o'>=</span> <span class='s'>"bottom"</span><span class='o'>)</span>,</span> <span> size <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_legend.html'>guide_legend</a></span><span class='o'>(</span>position <span class='o'>=</span> <span class='s'>"top"</span><span class='o'>)</span>,</span> <span> alpha <span class='o'>=</span> <span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/guide_legend.html'>guide_legend</a></span><span class='o'>(</span>position <span class='o'>=</span> <span class='s'>"inside"</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>legend.position <span class='o'>=</span> <span class='s'>"left"</span><span class='o'>)</span></span> <span><span class='nv'>p</span></span> </code></pre> <p><img src="figs/legend_positions-1.png" alt="A scatterplot showing engine displacement versus highway miles per gallon. It has four legends: three placed at the top, left and bottom of the panel, and one inside the panel." width="700px" style="display: block; margin: auto;" /></p> </div> <p>In the plot above, the legend for the &lsquo;cyl&rsquo; variable is in the middle of the plot. In previous versions of ggplot2, you could set the <code>legend.position</code> to a coordinate to control the placement. However, doing this would change the default legend position, which is not always desirable. 
To cover such cases, there is now a specialised <code>legend.position.inside</code> argument that controls the positioning of legends with <code>position = &quot;inside&quot;</code> regardless of whether the position was specified in the theme or in the guide.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>legend.position.inside <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.7</span>, <span class='m'>0.7</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/legend_inside-1.png" alt="The same plot as before, but the legend for the 'cyl' variable is to the top-right of the centre." width="700px" style="display: block; margin: auto;" /></p> </div> <p>The justification of legends is controllable by using the <code>legend.justification.{position}</code> theme setting. Moreover, the top and bottom guides can be aligned to the plot rather than the panel by setting the <code>legend.location</code> argument. The main reason behind this is that you can then align the legends with the plot&rsquo;s title. By default, when <code>plot.title.position = &quot;plot&quot;</code>, left legends are already aligned. For this reason, the top and bottom guides are prioritised for the <code>legend.location</code> setting. 
Moreover, it avoids overlapping of legends in the corners if the justifications would dictate it.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>title <span class='o'>=</span> <span class='s'>"Plot-aligned title"</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span></span> <span> legend.margin <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>margin</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>0</span>, <span class='m'>0</span>, <span class='m'>0</span><span class='o'>)</span>, <span class='c'># turned off for alignment</span></span> <span> legend.justification.top <span class='o'>=</span> <span class='s'>"left"</span>,</span> <span> legend.justification.left <span class='o'>=</span> <span class='s'>"top"</span>,</span> <span> legend.justification.bottom <span class='o'>=</span> <span class='s'>"right"</span>,</span> <span> legend.justification.inside <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>1</span><span class='o'>)</span>,</span> <span> legend.location <span class='o'>=</span> <span class='s'>"plot"</span>,</span> <span> plot.title.position <span class='o'>=</span> <span class='s'>"plot"</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/legend_alignments-1.png" alt="The same plot as before, but with a plot-aligned title and different alignments of the legends. The left and top legends are left-aligned with the title." 
width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="spacing-and-margins">Spacing and margins <a href="#spacing-and-margins"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In this release, spacing in legends has been reworked.</p> <ul> <li>The <code>legend.spacing{.x/.y}</code> theme setting is now used to space different guides apart. Previously, it was also used to space legend keys apart; that is no longer the case.</li> <li>Spacing legend key-label pairs apart is now controlled by the <code>legend.key.spacing{.x/.y}</code> theme setting.</li> <li>Spacing the labels from the keys is now controlled by the label element&rsquo;s <code>margin</code> argument.</li> </ul> <p>Because the legend spacing and margin options can be a bit bewildering, a small overview is provided below. One setting not included in the overview is <code>legend.spacing.x</code>, which only applies when <code>legend.box = &quot;horizontal&quot;</code>. Which exact text margin is relevant for spacing apart keys and labels, or titles and the rest of the guide, depends on the <code>legend.text.position</code> and <code>legend.title.position</code> theme elements.</p> <div class="highlight"> <p><img src="figs/spacing_overview-1.png" alt="Overview of legend spacing and margin options. Two abstract legends are placed above one another to the right of an area called 'plot'. Various arrows with labels point out different theme settings." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>When the titles and keys don&rsquo;t have explicit margins, appropriate margins are added automatically depending on the text or title position. However, if you override the margins, they will be interpreted literally.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span>, colour <span class='o'>=</span> <span class='nv'>class</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_legend.html'>guide_legend</a></span><span class='o'>(</span>ncol <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span></span> <span> legend.key.spacing.x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>10</span>, <span class='s'>"pt"</span><span class='o'>)</span>,</span> <span> legend.key.spacing.y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span 
class='m'>20</span>, <span class='s'>"pt"</span><span class='o'>)</span>,</span> <span> legend.text <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>margin <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>margin</a></span><span class='o'>(</span>l <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> legend.title <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>margin <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>margin</a></span><span class='o'>(</span>b <span class='o'>=</span> <span class='m'>20</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/legend_spacing-1.png" alt="A scatterplot showing engine displacement versus highway miles per gallon. The legend for the 'class' variable shows a key layout with two columns. Keys are widely spaced in the vertical direction and more narrowly in the horizontal direction. There is no space between the keys and their labels, but plenty of space between the legend and its title." width="700px" style="display: block; margin: auto;" /></p> </div> <p>For all intents and purposes, colour bar/step and bins guides are treated as legend guides with just a single key-label pair. 
While the <code>legend.key.spacing</code> setting does not apply due to it being one single key, the other spacings and margins do apply equally.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span>, colour <span class='o'>=</span> <span class='nv'>cty</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span></span> <span> legend.text <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>margin <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>margin</a></span><span class='o'>(</span>l <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> legend.title <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>margin <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>margin</a></span><span class='o'>(</span>b <span class='o'>=</span> <span class='m'>20</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/legend_spacing_bar-1.png" alt="The 
same plot as before, but with a colourbar indicating the 'cty' variable. Again, there is no space between the bar and the labels and ample space between the bar and the title." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="stretching">Stretching <a href="#stretching"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Another experimental tweak to legends is that they can now have stretching keys (or bars). The option is still considered &lsquo;experimental&rsquo; because there are some things that may go wrong. By setting the <code>legend.key{.height/.width}</code> theme argument as a <code>&quot;null&quot;</code> unit, legends can now expand to fill the available space.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='nv'>cty</span>, size <span class='o'>=</span> <span 
class='nv'>cyl</span><span class='o'>)</span>, shape <span class='o'>=</span> <span class='m'>21</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>legend.key.height <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='s'>"null"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>p</span></span> </code></pre> <p><img src="figs/stretch_keys-1.png" alt="Scatterplot of engine displacement versus highway miles per gallon. There is a legend guide showing the point's size and a colour. Both the legend and the bar take up an approximately equal amount of space on the right-hand side of the panel." width="700px" style="display: block; margin: auto;" /></p> </div> <p>The term &lsquo;available space&rsquo; is a tricky one. For starters, other legends placed in the same position take up space, as can be seen in the plot above. If your legend is the only legend in a position, more space is available and it stretches more. As you can see in the plot below, the legends are not aligned with the panel even when stretched. 
This is because the titles, margins and various spacings all take up space that is <em>not</em> available to stretch into.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_colourbar.html'>guide_colourbar</a></span><span class='o'>(</span>position <span class='o'>=</span> <span class='s'>"left"</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/isolated_stretch-1.png" alt="Same plot as before, but the colour bar is placed on the left. Both the colour bar and legend take up a lot of vertical space." width="700px" style="display: block; margin: auto;" /></p> </div> <p>On the other hand, if one position is packed with legends, the keys may shrink instead of stretch. The keys can become too small to show the aesthetics properly. You can see in the example below that the size legend is cut off because the keys are too small, and the text is spaced too closely to read comfortably.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nv'>model</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/shrinking_keys-1.png" alt="Same plot as before, but all legends are on the right, including a new legend for the 'model' variable. All legends have keys that are too small to read the text comfortably, and the points indicating size are clipped." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>Another issue that may come up is that the &lsquo;available space&rsquo; might be 0. Because the plot itself is also space-filling, setting null-heights for top/bottom positions or null-widths for left/right positions means there is no available space. This may result in the keys or bars becoming invisible. For the plot below, recall that we&rsquo;ve set the <code>legend.key.height</code> setting to a null unit.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>legend.position <span class='o'>=</span> <span class='s'>"top"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/disappearing_keys-1.png" alt="Still the same scatterplot but without the fill variable. Legends are placed at the top of the panel, but the bar and key backgrounds have disappeared. The text labels are still present." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="other-improvements">Other improvements <a href="#other-improvements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We welcome a new type of legend: <a href="https://ggplot2.tidyverse.org/reference/guide_custom.html" target="_blank" rel="noopener"><code>guide_custom()</code></a>. 
It can be used to add any graphical object (grob) to a plot, like <a href="https://ggplot2.tidyverse.org/reference/annotation_custom.html" target="_blank" rel="noopener"><code>annotation_custom()</code></a>. There are a few differences though: it is positioned just like a legend and adds titles and margins. In some sense, this guide is &lsquo;special&rsquo;, as it is the only guide that does not directly reflect a scale. The downside is that it cannot read properties from the plot, but the upside is that it is very flexible. Be careful: when your grob does not have an absolute size, you should set the <code>width</code> and <code>height</code> arguments.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.5</span>, <span class='m'>1</span>, <span class='m'>1.5</span>, <span class='m'>1.2</span>, <span class='m'>1.5</span>, <span class='m'>1</span>, <span class='m'>0.5</span>, <span class='m'>0.8</span>, <span class='m'>1</span>, <span class='m'>1.15</span>, <span class='m'>2</span>, <span class='m'>1.15</span>, <span class='m'>1</span>, <span class='m'>0.85</span>, <span class='m'>0</span>, <span class='m'>0.85</span><span class='o'>)</span></span> <span><span class='nv'>y</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1.5</span>, <span class='m'>1.2</span>, <span class='m'>1.5</span>, <span class='m'>1</span>, <span class='m'>0.5</span>, <span class='m'>0.8</span>, <span class='m'>0.5</span>, <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>1.15</span>, <span class='m'>1</span>, <span class='m'>0.85</span>, <span class='m'>0</span>, <span class='m'>0.85</span>, <span class='m'>1</span>, <span class='m'>1.15</span><span class='o'>)</span></span> 
<span></span> <span><span class='nv'>compass_rose</span> <span class='o'>&lt;-</span> <span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/grid.polygon.html'>polygonGrob</a></span><span class='o'>(</span></span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='s'>"cm"</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='nv'>y</span>, <span class='s'>"cm"</span><span class='o'>)</span>, id.lengths <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>8</span>, <span class='m'>8</span><span class='o'>)</span>,</span> <span> gp <span class='o'>=</span> <span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"grey50"</span>, <span class='s'>"grey25"</span><span class='o'>)</span>, col <span class='o'>=</span> <span class='kc'>NA</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>nc</span> <span class='o'>&lt;-</span> <span class='nf'>sf</span><span class='nf'>::</span><span class='nf'><a href='https://r-spatial.github.io/sf/reference/st_read.html'>st_read</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/system.file.html'>system.file</a></span><span class='o'>(</span><span class='s'>"shape/nc.shp"</span>, package <span class='o'>=</span> <span class='s'>"sf"</span><span class='o'>)</span>, quiet <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> 
<span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>nc</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggsf.html'>geom_sf</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nv'>AREA</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span>custom <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_custom.html'>guide_custom</a></span><span class='o'>(</span><span class='nv'>compass_rose</span>, title <span class='o'>=</span> <span class='s'>"compass"</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/custom_guide-1.png" alt="A map of the US state North Carolina, where fill colour indicates the area of counties. To the right of the panel, underneath the colour bar for the fill, there is an eight-pointed star with the title 'compass'." width="700px" style="display: block; margin: auto;" /></p> </div> <p>In previous versions of ggplot2, when legend titles were wider than the legends, the title was always left-aligned. 
Now, the justification setting of the legend text determines the alignment: 1 is right or top aligned and 0 is left or bottom aligned.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span>, shape <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span>, colour <span class='o'>=</span> <span class='nv'>drv</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guides.html'>guides</a></span><span class='o'>(</span></span> <span> shape <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_legend.html'>guide_legend</a></span><span class='o'>(</span></span> <span> title <span class='o'>=</span> <span class='s'>"A title that is pretty long"</span>,</span> <span> theme <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>legend.title <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>hjust <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> order <span class='o'>=</span> <span class='m'>1</span></span> <span> <span 
class='o'>)</span>,</span> <span> colour <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/guide_legend.html'>guide_legend</a></span><span class='o'>(</span></span> <span> title <span class='o'>=</span> <span class='s'>"Another long title"</span>,</span> <span> theme <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>legend.title <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>hjust <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/title_justification-1.png" alt="Scatterplot of engine displacement versus highway miles per gallon. The 'drv' variable has a legend that is left-aligned, whereas the 'cyl' variable has a legend that is right-aligned." width="700px" style="display: block; margin: auto;" /></p> </div> ggplot2 3.5.0 https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0/ Fri, 23 Feb 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/02/ggplot2-3-5-0/ 
<p>We&rsquo;re tickled pink to announce the release of <a href="https://ggplot2.tidyverse.org" target="_blank" rel="noopener">ggplot2</a> 3.5.0. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"ggplot2"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will cover a bunch of new features included in the latest release. In addition to rewriting the guide system, we made progress supporting newer R graphics capabilities, re-purposed the use of <a href="https://rdrr.io/r/base/AsIs.html" target="_blank" rel="noopener"><code>I()</code></a>, and introduced an improved polar coordinate system, along with other improvements. 
As the release is quite large, we are making a <a href="https://www.tidyverse.org/tags/ggplot2-3-5-0/" target="_blank" rel="noopener">series of blog posts</a> covering the major changes.</p> <p>You can see a full list of changes in the <a href="https://ggplot2.tidyverse.org/news/index.html" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://patchwork.data-imaginist.com'>patchwork</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'>grid</span><span class='o'>)</span></span></code></pre> </div> <h2 id="guide-rewrite">Guide rewrite <a href="#guide-rewrite"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Axes and legends, collectively called guides, are an important component of plots, as they allow the translation of visual information back to data qualities. The extension mechanism of ggplot2 allows others to develop their own layers, facets, coords and scales through the ggproto object-oriented system. Finally, after years of being the only major system in ggplot2 still clinging to the S3 system, guides have been rewritten to use ggproto. 
With this rewrite, guides officially become an extension point that lets developers implement their own guides. We have added a section to the <a href="https://ggplot2.tidyverse.org/articles/extending-ggplot2.html#creating-new-guides" target="_blank" rel="noopener">Extending ggplot2</a> vignette on how to develop a new guide.</p> <p>Alongside the rewrite, we made a slew of improvements to guides. As these are somewhat meaty and focused topics, we are going to cover them in separate blog posts about axes and legends.</p> <h2 id="patterns-and-gradients">Patterns and gradients <a href="#patterns-and-gradients"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Patterns and gradients are provided by the grid package, which ggplot2 builds on top of. They were first introduced in R 4.1.0 and were refined in R 4.2.0 to support multiple patterns and gradients. If your graphics device supported it, theme elements could already be set to patterns or gradients, even before this release.</p> <blockquote> <p>Note: On Windows machines, the default device in RStudio and in the knitr package is <a href="https://rdrr.io/r/grDevices/png.html" target="_blank" rel="noopener"><code>png()</code></a>, which does not support patterns. 
In RStudio, you can go to &lsquo;Tools &gt; Global Options &gt; General &gt; Graphics&rsquo; and choose the &lsquo;ragg&rsquo; or &lsquo;Cairo PNG&rsquo; device from the dropdown menu to display patterns.</p> </blockquote> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>gray_gradient</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>linearGradient</a></span><span class='o'>(</span><span class='nf'>scales</span><span class='nf'>::</span><span class='nf'><a href='https://scales.r-lib.org/reference/pal_grey.html'>pal_grey</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>(</span><span class='m'>10</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>panel.background <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_rect</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nv'>gray_gradient</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/theme_gradient-1.png" alt="Scatterplot of engine displacement versus highway miles per gallon. 
The panel background is a colour gradient starting from dark grey in the bottom-left corner ending at light grey in the upper-right corner." width="700px" style="display: block; margin: auto;" /></p> </div> <p>We are pleased to report that as of this release, patterns can be used as the <code>fill</code> aesthetic in most layers. To use a pattern, first build a gradient using {grid}&rsquo;s <a href="https://rdrr.io/r/grid/patterns.html" target="_blank" rel="noopener"><code>linearGradient()</code></a> or <a href="https://rdrr.io/r/grid/patterns.html" target="_blank" rel="noopener"><code>radialGradient()</code></a> function, or a pattern using the <a href="https://rdrr.io/r/grid/patterns.html" target="_blank" rel="noopener"><code>pattern()</code></a> function. Because handling patterns and gradients is very similar, we will treat gradients as if they were patterns: when we say &lsquo;pattern&rsquo; in the text below, please keep in mind that we mean patterns and gradients alike. These patterns can be passed to a layer as the <code>fill</code> aesthetic. Below, you can see two behaviours of the <a href="https://rdrr.io/r/grid/patterns.html" target="_blank" rel="noopener"><code>linearGradient()</code></a> pattern, depending on its <code>group</code> argument. 
The pattern with <code>group = FALSE</code> will display the gradient in every rectangle and <code>group = TRUE</code> will apply the gradient to all rectangles together.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>colours</span> <span class='o'>&lt;-</span> <span class='nf'>scales</span><span class='nf'>::</span><span class='nf'><a href='https://scales.r-lib.org/reference/pal_viridis.html'>viridis_pal</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>(</span><span class='m'>10</span><span class='o'>)</span></span> <span><span class='nv'>grad_ungroup</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>linearGradient</a></span><span class='o'>(</span><span class='nv'>colours</span>, group <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span> <span><span class='nv'>grad_grouped</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>linearGradient</a></span><span class='o'>(</span><span class='nv'>colours</span>, group <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>ungroup</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span 
class='nv'>grad_ungroup</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>title <span class='o'>=</span> <span class='s'>"Ungrouped gradient"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>grouped</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nv'>grad_grouped</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>title <span class='o'>=</span> <span class='s'>"Grouped gradient"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>ungroup</span> <span class='o'>|</span> <span class='nv'>grouped</span></span> </code></pre> <p><img src="figs/grouping_gradient-1.png" alt="Two barplots showing the counts of number of cylinders. The first plot is titled 'Ungrouped gradient' and shows individual gradients in the bars. The second is titled 'Grouped gradient' and shows a single gradient along all bars." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>Besides passing a static pattern as the <code>fill</code> aesthetic, it is also possible to map values to patterns using <a href="https://ggplot2.tidyverse.org/reference/scale_manual.html" target="_blank" rel="noopener"><code>scale_fill_manual()</code></a>. To map values to patterns, pass a list of patterns to the <code>values</code> argument of the scale. When providing patterns as a list, the list can be a mix of patterns and plain colours, like <code>&quot;limegreen&quot;</code> in the plot below. We are excited that people may come up with nice pattern palettes that can be used in similar fashion.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>patterns</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>linearGradient</a></span><span class='o'>(</span><span class='nv'>colours</span>, group <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span>,</span> <span> <span class='s'>"limegreen"</span>,</span> <span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>radialGradient</a></span><span class='o'>(</span><span class='nv'>colours</span>, group <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span>,</span> <span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>pattern</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.rect.html'>rectGrob</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.25</span>, <span class='m'>0.75</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a 
href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.25</span>, <span class='m'>0.75</span><span class='o'>)</span>, width <span class='o'>=</span> <span class='m'>0.5</span>, height <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span>,</span> <span> width <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>5</span>, <span class='s'>"mm"</span><span class='o'>)</span>, height <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>5</span>, <span class='s'>"mm"</span><span class='o'>)</span>, extend <span class='o'>=</span> <span class='s'>"repeat"</span>,</span> <span> gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='s'>"limegreen"</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span>, fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span><span class='o'>)</span> <span 
class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_manual.html'>scale_fill_manual</a></span><span class='o'>(</span>values <span class='o'>=</span> <span class='nv'>patterns</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/pattern_scale-1.png" alt="Barplot showing counts of number of cylinders with the bars filled by a linear gradient, a plain green colour, a radial gradient and a green checkerboard pattern." width="700px" style="display: block; margin: auto;" /></p> </div> <p>The largest obstacle we had to overcome to support gradients in ggplot2 was to apply the <code>alpha</code> aesthetic consistently to the patterns. The regular <a href="https://scales.r-lib.org/reference/alpha.html" target="_blank" rel="noopener"><code>scales::alpha()</code></a> function does not work with patterns, so we implemented a new <a href="https://ggplot2.tidyverse.org/reference/fill_alpha.html" target="_blank" rel="noopener"><code>fill_alpha()</code></a> function that applies the <code>alpha</code> aesthetic to the patterns. By switching out <code>fill = alpha(fill, alpha)</code> with <code>fill = fill_alpha(fill, alpha)</code> in the <a href="https://rdrr.io/r/grid/gpar.html" target="_blank" rel="noopener"><code>grid::gpar()</code></a> function, extension developers can enable pattern fills in their own layer extensions.</p> <p>The <a href="https://ggplot2.tidyverse.org/reference/fill_alpha.html" target="_blank" rel="noopener"><code>fill_alpha()</code></a> function checks if the active device supports patterns and spits out a friendlier warning or error on demand. 
For extension developers that want to use newer graphics features, you can reuse the <a href="https://ggplot2.tidyverse.org/reference/check_device.html" target="_blank" rel="noopener"><code>check_device()</code></a> function to check feature availability or throw messages in a similar fashion.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># The currently active device is the ragg::agg_png() device</span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/check_device.html'>check_device</a></span><span class='o'>(</span>feature <span class='o'>=</span> <span class='s'>"patterns"</span>, action <span class='o'>=</span> <span class='s'>"test"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] TRUE</span></span> <span></span><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/check_device.html'>check_device</a></span><span class='o'>(</span>feature <span class='o'>=</span> <span class='s'>"glyphs"</span>, action <span class='o'>=</span> <span class='s'>"abort"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'>:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> The <span style='color: #00BB00;'>agg_png</span> device does not support <span style='font-style: italic;'>typeset glyphs</span>.</span></span> <span></span></code></pre> </div> <h2 id="ignoring-scales">Ignoring scales <a href="#ignoring-scales"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 
5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In this release, ggplot2 has changed how the plots interact with variables created with <a href="https://rdrr.io/r/base/AsIs.html" target="_blank" rel="noopener"><code>I()</code></a> (&lsquo;AsIs&rsquo; variables). The change is somewhat subtle, so it takes a bit of explaining.</p> <p>It <em>used to be</em> the case that &lsquo;AsIs&rsquo; variables automatically added an identity scale to the plot. Identity scales in ggplot2 preserve the original input, without mapping or transforming it. For example, if you give literal colour names as the <code>colour</code> aesthetic, the plot will use these exact colours.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>42</span><span class='o'>)</span></span> <span><span class='nv'>my_colours</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/sample.html'>sample</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"red"</span>, <span class='s'>"green"</span>, <span class='s'>"blue"</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nv'>mpg</span><span class='o'>)</span>, replace <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='nv'>my_colours</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_identity.html'>scale_colour_identity</a></span><span class='o'>(</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/literal_colours-1.png" alt="Scatterplot of engine displacement versus highway miles per gallon with points in red, green and blue." width="700px" style="display: block; margin: auto;" /></p> </div> <p>However, because identity scales <em>are</em> true scales, you cannot combine literal colours in one layer with mapped colours in the next. Trying to do so will result in an &lsquo;unknown colour name&rsquo; error.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='nv'>drv</span><span class='o'>)</span>, shape <span class='o'>=</span> <span class='m'>1</span>, size <span class='o'>=</span> <span class='m'>5</span><span class='o'>)</span> <span 
class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='nv'>my_colours</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_identity.html'>scale_colour_identity</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `geom_point()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Problem while converting geom to grob.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Error occurred in the 1st layer.</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Unknown colour name: f</span></span> <span></span></code></pre> </div> <p>In order to prevent such clashes between identity scales that map nothing and regular scales, we have changed how &lsquo;AsIs&rsquo; variables interact with scales. Instead of adding an identity scale, &lsquo;AsIs&rsquo; variables are now altogether <em>ignored</em> by the scale systems. On the surface, the new behaviour is very similar to the old one, in that for example literal colours are used. However, with &lsquo;AsIs&rsquo; variables ignored, you can now freely combine layers with &lsquo;AsIs&rsquo; input with layers that map input. 
If you need a legend for the literal variable, we recommend using the identity scale mechanism instead.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='nv'>drv</span><span class='o'>)</span>, shape <span class='o'>=</span> <span class='m'>1</span>, size <span class='o'>=</span> <span class='m'>5</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='nv'>my_colours</span><span class='o'>)</span><span class='o'>)</span>, show.legend <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/asis_aesthetic-1.png" alt="Scatterplot of engine displacement versus highway miles per gallon. Every point has two circles: a smaller one in red, green or blue and a larger one mapped to the 'drv' variable." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>Perhaps more salient than avoiding scale clashes is that the same applies to the <code>x</code> and <code>y</code> position aesthetics. There has never been a <code>scale_x_identity()</code> or <code>scale_y_identity()</code> function, so what this means may be unexpected. Internally, scales transform every continuous variable to the 0-1 range before drawing the graphics. &lsquo;AsIs&rsquo; position aesthetics work in the same way: you can use numbers between 0 and 1 to set the position. These positions are relative to the plot&rsquo;s panel, and this mechanism opens up a great way to add plot annotations that are independent of the data.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>t</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>2</span> <span class='o'>*</span> <span class='nv'>pi</span>, length.out <span class='o'>=</span> <span class='m'>100</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>displ</span>, <span class='nv'>hwy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"grey50"</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/annotate.html'>annotate</a></span><span class='o'>(</span></span> <span> <span 
class='s'>"rect"</span>, </span> <span> xmin <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='m'>0.05</span><span class='o'>)</span>, xmax <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='m'>0.95</span><span class='o'>)</span>,</span> <span> ymin <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='m'>0.05</span><span class='o'>)</span>, ymax <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='m'>0.95</span><span class='o'>)</span>,</span> <span> fill <span class='o'>=</span> <span class='kc'>NA</span>, colour <span class='o'>=</span> <span class='s'>"red"</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/annotate.html'>annotate</a></span><span class='o'>(</span></span> <span> <span class='s'>"path"</span>,</span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/Trig.html'>cos</a></span><span class='o'>(</span><span class='nv'>t</span><span class='o'>)</span> <span class='o'>/</span> <span class='m'>2</span> <span class='o'>+</span> <span class='m'>0.5</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/Trig.html'>sin</a></span><span class='o'>(</span><span class='nv'>t</span><span class='o'>)</span> <span class='o'>/</span> <span class='m'>2</span> <span class='o'>+</span> <span class='m'>0.5</span><span class='o'>)</span>,</span> <span> colour <span 
class='o'>=</span> <span class='s'>"blue"</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/annotate.html'>annotate</a></span><span class='o'>(</span></span> <span> <span class='s'>"text"</span>, </span> <span> label <span class='o'>=</span> <span class='s'>"Text in the middle"</span>,</span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='m'>0.5</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/AsIs.html'>I</a></span><span class='o'>(</span><span class='m'>0.5</span><span class='o'>)</span>,</span> <span> size <span class='o'>=</span> <span class='m'>8</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/asis_annotation-1.png" alt="Scatterplot of engine displacement versus highway miles per gallon. The plot has a red rectangle slightly smaller than the panel, a blue circle touching the panel edges and text in the middle that reads: 'text in the middle'." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>Please note that discrete variables used as &lsquo;AsIs&rsquo; position aesthetics have no interpretation and will likely result in errors.</p> <h2 id="other-improvements">Other improvements <a href="#other-improvements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Coordinating text sizes between the theme and <a href="https://ggplot2.tidyverse.org/reference/geom_text.html" target="_blank" rel="noopener"><code>geom_text()</code></a>/ <a href="https://ggplot2.tidyverse.org/reference/geom_text.html" target="_blank" rel="noopener"><code>geom_label()</code></a> has been a hassle, since the theme uses text sizes in points (pt) and geoms use text size in millimetres. 
You can now control what the <code>size</code> aesthetic means for text by setting the <code>size.unit</code> argument.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>wt</span>, <span class='nv'>mpg</span>, label <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/colnames.html'>rownames</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>p</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_text.html'>geom_text</a></span><span class='o'>(</span>size <span class='o'>=</span> <span class='m'>10</span>, size.unit <span class='o'>=</span> <span class='s'>"pt"</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>axis.text <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_text</a></span><span class='o'>(</span>size <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/size_unit_arg-1.png" alt="A plot showing weight versus miles per gallon with individual cars labelled by text. The text in the plot has the same size as the text labelling the axes." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>Two improvements have been made to <a href="https://ggplot2.tidyverse.org/reference/geom_text.html" target="_blank" rel="noopener"><code>geom_label()</code></a>. The first is that it now obeys an <code>angle</code> aesthetic.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_text.html'>geom_label</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>angle <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span>, <span class='o'>-</span><span class='m'>45</span>, <span class='m'>45</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/label_angle-1.png" alt="A plot showing weight versus miles per gallon with individual cars labelled by textboxes. The textboxes are displayed in different angles." width="700px" style="display: block; margin: auto;" /></p> </div> <p>In addition, <a href="https://ggplot2.tidyverse.org/reference/geom_text.html" target="_blank" rel="noopener"><code>geom_label()</code></a>&lsquo;s <code>label.padding</code> argument can be controlled individually for every side of the text by using the <a href="https://ggplot2.tidyverse.org/reference/element.html" target="_blank" rel="noopener"><code>margin()</code></a> function. 
The legend keys for labels have also changed to reflect the geom more accurately.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_text.html'>geom_label</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span><span class='o'>)</span>, </span> <span> label.padding <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>margin</a></span><span class='o'>(</span>t <span class='o'>=</span> <span class='m'>2</span>, r <span class='o'>=</span> <span class='m'>20</span>, b <span class='o'>=</span> <span class='m'>1</span>, l <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> </code></pre> <p><img src="figs/label_padding-1.png" alt="A plot showing weight versus miles per gallon with individual cars labelled by textboxes. The textboxes have a large margin on the right." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>Like <a href="https://ggplot2.tidyverse.org/reference/geom_density.html" target="_blank" rel="noopener"><code>geom_density()</code></a> before it, <a href="https://ggplot2.tidyverse.org/reference/geom_violin.html" target="_blank" rel="noopener"><code>geom_violin()</code></a> now gains a <code>bounds</code> argument to restrict the range wherein density is estimated.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span></span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/stats/Beta.html'>rbeta</a></span><span class='o'>(</span><span class='m'>100</span>, <span class='m'>0.5</span>, <span class='m'>0.5</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/stats/Beta.html'>rbeta</a></span><span class='o'>(</span><span class='m'>100</span>, <span class='m'>1</span>, <span class='m'>1</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/stats/Beta.html'>rbeta</a></span><span class='o'>(</span><span class='m'>100</span>, <span class='m'>2</span>, <span class='m'>2</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> group <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/rep.html'>rep</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"C"</span><span class='o'>)</span>, each <span class='o'>=</span> <span class='m'>100</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>df</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>group</span>, <span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_violin.html'>geom_violin</a></span><span class='o'>(</span>bounds <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/violin_bounds-1.png" alt="Violin plot showing random numbers drawn from beta distributions with different parameters. The ends of the first two violins are flat at the top and bottom." width="700px" style="display: block; margin: auto;" /></p> </div> <p>The <a href="https://ggplot2.tidyverse.org/reference/geom_boxplot.html" target="_blank" rel="noopener"><code>geom_boxplot()</code></a> has acquired an option to remove (rather than hide) outliers. Setting <code>outliers = FALSE</code> removes outliers so that the plot limits do not take these into account. For hiding (and not removing) outliers, you can still set <code>outlier.shape = NA</code>. Also, it has gained a <code>staplewidth</code> argument that can be used to draw staples: horizontal lines at the end of the boxplot whiskers. 
The default, <code>staplewidth = 0</code>, will suppress the staples so your current box plots continue to look the same.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>diamonds</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>cut</span>, <span class='nv'>price</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_boxplot.html'>geom_boxplot</a></span><span class='o'>(</span>outliers <span class='o'>=</span> <span class='kc'>FALSE</span>, staplewidth <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/boxplot_outliers_staples-1.png" alt="Boxplot showing the price of diamonds per cut. The y-axis does not go much beyond the whiskers, and whiskers are decorated with a staple." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>The scales functions now do a better job at reporting <em>which</em> scale has encountered an error.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_brewer.html'>scale_colour_brewer</a></span><span class='o'>(</span>breaks <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>5</span>, labels <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>4</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `scale_colour_brewer()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `breaks` and `labels` must have the same length.</span></span> <span></span><span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>class</span>, <span class='nv'>displ</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_boxplot.html'>geom_boxplot</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_continuous</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `scale_x_continuous()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> 
Discrete values supplied to continuous scale.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Example values: <span style='color: #0000BB;'>"compact"</span>, <span style='color: #0000BB;'>"compact"</span>, <span style='color: #0000BB;'>"compact"</span>, <span style='color: #0000BB;'>"compact"</span>, and <span style='color: #0000BB;'>"compact"</span></span></span> <span></span><span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>msleep</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>bodywt</span> <span class='o'>-</span> <span class='m'>1</span>, <span class='nv'>brainwt</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span>na.rm <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_log10</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning in transformation$transform(x): NaNs produced</span></span> <span></span><span><span class='c'>#&gt; Warning in scale_x_log10(): <span style='color: #00BB00;'>log-10</span> transformation introduced infinite values.</span></span> <span></span></code></pre> <p><img src="figs/scale_messages-1.png" alt="Scatterplot showing body weight minus one versus brain weight of mammals. The x-axis is log-transformed." 
width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thank you to all people who have contributed issues, code and comments to this release:</p> <p> <a href="https://github.com/92amartins" target="_blank" rel="noopener">@92amartins</a>, <a href="https://github.com/a-torgovitsky" target="_blank" rel="noopener">@a-torgovitsky</a>, <a href="https://github.com/aarongraybill" target="_blank" rel="noopener">@aarongraybill</a>, <a href="https://github.com/aavogt" target="_blank" rel="noopener">@aavogt</a>, <a href="https://github.com/agila5" target="_blank" rel="noopener">@agila5</a>, <a href="https://github.com/ahcyip" target="_blank" rel="noopener">@ahcyip</a>, <a href="https://github.com/AlexanderCasper" target="_blank" rel="noopener">@AlexanderCasper</a>, <a href="https://github.com/alexkrohn" target="_blank" rel="noopener">@alexkrohn</a>, <a href="https://github.com/alofting" target="_blank" rel="noopener">@alofting</a>, <a href="https://github.com/andrewgustar" target="_blank" rel="noopener">@andrewgustar</a>, <a href="https://github.com/antagomir" target="_blank" rel="noopener">@antagomir</a>, <a href="https://github.com/aphalo" target="_blank" rel="noopener">@aphalo</a>, <a href="https://github.com/Ari04T" target="_blank" rel="noopener">@Ari04T</a>, <a href="https://github.com/AroneyS" target="_blank" rel="noopener">@AroneyS</a>, <a href="https://github.com/Asa12138" target="_blank" rel="noopener">@Asa12138</a>, <a href="https://github.com/ashgreat" 
target="_blank" rel="noopener">@ashgreat</a>, <a href="https://github.com/averissimo" target="_blank" rel="noopener">@averissimo</a>, <a href="https://github.com/bakerwm" target="_blank" rel="noopener">@bakerwm</a>, <a href="https://github.com/balling-dev" target="_blank" rel="noopener">@balling-dev</a>, <a href="https://github.com/banbh" target="_blank" rel="noopener">@banbh</a>, <a href="https://github.com/barracuda156" target="_blank" rel="noopener">@barracuda156</a>, <a href="https://github.com/BartJanvanRossum" target="_blank" rel="noopener">@BartJanvanRossum</a>, <a href="https://github.com/beansrowning" target="_blank" rel="noopener">@beansrowning</a>, <a href="https://github.com/benimwolfspelz" target="_blank" rel="noopener">@benimwolfspelz</a>, <a href="https://github.com/bfordAIMS" target="_blank" rel="noopener">@bfordAIMS</a>, <a href="https://github.com/bguiastr" target="_blank" rel="noopener">@bguiastr</a>, <a href="https://github.com/bnicenboim" target="_blank" rel="noopener">@bnicenboim</a>, <a href="https://github.com/BrianDiggs" target="_blank" rel="noopener">@BrianDiggs</a>, <a href="https://github.com/bsgerber" target="_blank" rel="noopener">@bsgerber</a>, <a href="https://github.com/burrapreeti" target="_blank" rel="noopener">@burrapreeti</a>, <a href="https://github.com/bwiernik" target="_blank" rel="noopener">@bwiernik</a>, <a href="https://github.com/ccsarapas" target="_blank" rel="noopener">@ccsarapas</a>, <a href="https://github.com/CGlemser" target="_blank" rel="noopener">@CGlemser</a>, <a href="https://github.com/chiajungTung" target="_blank" rel="noopener">@chiajungTung</a>, <a href="https://github.com/chipsin87" target="_blank" rel="noopener">@chipsin87</a>, <a href="https://github.com/cjvanlissa" target="_blank" rel="noopener">@cjvanlissa</a>, <a href="https://github.com/CorradoLanera" target="_blank" rel="noopener">@CorradoLanera</a>, <a href="https://github.com/danielneilson" target="_blank" rel="noopener">@danielneilson</a>, <a 
href="https://github.com/danli349" target="_blank" rel="noopener">@danli349</a>, <a href="https://github.com/DasHammett" target="_blank" rel="noopener">@DasHammett</a>, <a href="https://github.com/davidhodge931" target="_blank" rel="noopener">@davidhodge931</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dieghernan" target="_blank" rel="noopener">@dieghernan</a>, <a href="https://github.com/Ductmonkey" target="_blank" rel="noopener">@Ductmonkey</a>, <a href="https://github.com/edent" target="_blank" rel="noopener">@edent</a>, <a href="https://github.com/Elham-adabi" target="_blank" rel="noopener">@Elham-adabi</a>, <a href="https://github.com/ELICHOS" target="_blank" rel="noopener">@ELICHOS</a>, <a href="https://github.com/eliocamp" target="_blank" rel="noopener">@eliocamp</a>, <a href="https://github.com/ellisp" target="_blank" rel="noopener">@ellisp</a>, <a href="https://github.com/emuise" target="_blank" rel="noopener">@emuise</a>, <a href="https://github.com/erikdeluca" target="_blank" rel="noopener">@erikdeluca</a>, <a href="https://github.com/f2il-kieranmace" target="_blank" rel="noopener">@f2il-kieranmace</a>, <a href="https://github.com/FDylanT" target="_blank" rel="noopener">@FDylanT</a>, <a href="https://github.com/fkohrt" target="_blank" rel="noopener">@fkohrt</a>, <a href="https://github.com/francisbarton" target="_blank" rel="noopener">@francisbarton</a>, <a href="https://github.com/fredcallaway" target="_blank" rel="noopener">@fredcallaway</a>, <a href="https://github.com/frezza-metabolomics" target="_blank" rel="noopener">@frezza-metabolomics</a>, <a href="https://github.com/GabrielHoffman" target="_blank" rel="noopener">@GabrielHoffman</a>, <a href="https://github.com/gaospecial" target="_blank" rel="noopener">@gaospecial</a>, <a href="https://github.com/garyzhubc" target="_blank" rel="noopener">@garyzhubc</a>, <a href="https://github.com/gavinsimpson" target="_blank" 
rel="noopener">@gavinsimpson</a>, <a href="https://github.com/Generalized" target="_blank" rel="noopener">@Generalized</a>, <a href="https://github.com/ghost" target="_blank" rel="noopener">@ghost</a>, <a href="https://github.com/giadasp" target="_blank" rel="noopener">@giadasp</a>, <a href="https://github.com/GMSL1" target="_blank" rel="noopener">@GMSL1</a>, <a href="https://github.com/grantmcdermott" target="_blank" rel="noopener">@grantmcdermott</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hlynurhallgrims" target="_blank" rel="noopener">@hlynurhallgrims</a>, <a href="https://github.com/holgerbrandl" target="_blank" rel="noopener">@holgerbrandl</a>, <a href="https://github.com/hpages" target="_blank" rel="noopener">@hpages</a>, <a href="https://github.com/HRodenhizer" target="_blank" rel="noopener">@HRodenhizer</a>, <a href="https://github.com/hub-shale" target="_blank" rel="noopener">@hub-shale</a>, <a href="https://github.com/hughjonesd" target="_blank" rel="noopener">@hughjonesd</a>, <a href="https://github.com/ibuiltthis" target="_blank" rel="noopener">@ibuiltthis</a>, <a href="https://github.com/ingewortel" target="_blank" rel="noopener">@ingewortel</a>, <a href="https://github.com/isaacvock" target="_blank" rel="noopener">@isaacvock</a>, <a href="https://github.com/Istalan" target="_blank" rel="noopener">@Istalan</a>, <a href="https://github.com/istvankleijn" target="_blank" rel="noopener">@istvankleijn</a>, <a href="https://github.com/jacobkasper" target="_blank" rel="noopener">@jacobkasper</a>, <a href="https://github.com/jammainen" target="_blank" rel="noopener">@jammainen</a>, <a href="https://github.com/jan-glx" target="_blank" rel="noopener">@jan-glx</a>, <a href="https://github.com/JaredAllen2" target="_blank" rel="noopener">@JaredAllen2</a>, <a href="https://github.com/jashapiro" target="_blank" rel="noopener">@jashapiro</a>, <a href="https://github.com/jimjam-slam" target="_blank" 
rel="noopener">@jimjam-slam</a>, <a href="https://github.com/jmuhlenkamp" target="_blank" rel="noopener">@jmuhlenkamp</a>, <a href="https://github.com/jonspring" target="_blank" rel="noopener">@jonspring</a>, <a href="https://github.com/JorisChau" target="_blank" rel="noopener">@JorisChau</a>, <a href="https://github.com/joshhwuu" target="_blank" rel="noopener">@joshhwuu</a>, <a href="https://github.com/jpeasari" target="_blank" rel="noopener">@jpeasari</a>, <a href="https://github.com/jromanowska" target="_blank" rel="noopener">@jromanowska</a>, <a href="https://github.com/jsacerot" target="_blank" rel="noopener">@jsacerot</a>, <a href="https://github.com/jtlandis" target="_blank" rel="noopener">@jtlandis</a>, <a href="https://github.com/jtr13" target="_blank" rel="noopener">@jtr13</a>, <a href="https://github.com/jttoivon" target="_blank" rel="noopener">@jttoivon</a>, <a href="https://github.com/karchern" target="_blank" rel="noopener">@karchern</a>, <a href="https://github.com/klin333" target="_blank" rel="noopener">@klin333</a>, <a href="https://github.com/kmavrommatis" target="_blank" rel="noopener">@kmavrommatis</a>, <a href="https://github.com/kramerrs" target="_blank" rel="noopener">@kramerrs</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/kylebutts" target="_blank" rel="noopener">@kylebutts</a>, <a href="https://github.com/larmarange" target="_blank" rel="noopener">@larmarange</a>, <a href="https://github.com/latot" target="_blank" rel="noopener">@latot</a>, <a href="https://github.com/lhami" target="_blank" rel="noopener">@lhami</a>, <a href="https://github.com/liang09255" target="_blank" rel="noopener">@liang09255</a>, <a href="https://github.com/linzi-sg" target="_blank" rel="noopener">@linzi-sg</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/lnarwhale" target="_blank" rel="noopener">@lnarwhale</a>, <a 
href="https://github.com/manjumc1975" target="_blank" rel="noopener">@manjumc1975</a>, <a href="https://github.com/mariadelmarq" target="_blank" rel="noopener">@mariadelmarq</a>, <a href="https://github.com/matanhakim" target="_blank" rel="noopener">@matanhakim</a>, <a href="https://github.com/math-mcshane" target="_blank" rel="noopener">@math-mcshane</a>, <a href="https://github.com/mattgalbraith" target="_blank" rel="noopener">@mattgalbraith</a>, <a href="https://github.com/matthewjnield" target="_blank" rel="noopener">@matthewjnield</a>, <a href="https://github.com/mcwayrm" target="_blank" rel="noopener">@mcwayrm</a>, <a href="https://github.com/melissagwolf" target="_blank" rel="noopener">@melissagwolf</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/MikkoVihtakari" target="_blank" rel="noopener">@MikkoVihtakari</a>, <a href="https://github.com/MjelleLab" target="_blank" rel="noopener">@MjelleLab</a>, <a href="https://github.com/mjskay" target="_blank" rel="noopener">@mjskay</a>, <a href="https://github.com/mkoohafkan" target="_blank" rel="noopener">@mkoohafkan</a>, <a href="https://github.com/mmokrejs" target="_blank" rel="noopener">@mmokrejs</a>, <a href="https://github.com/modmost" target="_blank" rel="noopener">@modmost</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/morrisseyj" target="_blank" rel="noopener">@morrisseyj</a>, <a href="https://github.com/mps9506" target="_blank" rel="noopener">@mps9506</a>, <a href="https://github.com/Nh-code" target="_blank" rel="noopener">@Nh-code</a>, <a href="https://github.com/njtierney" target="_blank" rel="noopener">@njtierney</a>, <a href="https://github.com/oliviercailloux" target="_blank" rel="noopener">@oliviercailloux</a>, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>, <a href="https://github.com/otaviolovison" 
target="_blank" rel="noopener">@otaviolovison</a>, <a href="https://github.com/pablobernabeu" target="_blank" rel="noopener">@pablobernabeu</a>, <a href="https://github.com/paulatn240" target="_blank" rel="noopener">@paulatn240</a>, <a href="https://github.com/phauchamps" target="_blank" rel="noopener">@phauchamps</a>, <a href="https://github.com/quantixed" target="_blank" rel="noopener">@quantixed</a>, <a href="https://github.com/ralmond" target="_blank" rel="noopener">@ralmond</a>, <a href="https://github.com/ramiromagno" target="_blank" rel="noopener">@ramiromagno</a>, <a href="https://github.com/reallzg" target="_blank" rel="noopener">@reallzg</a>, <a href="https://github.com/retodomax" target="_blank" rel="noopener">@retodomax</a>, <a href="https://github.com/robbiebatley" target="_blank" rel="noopener">@robbiebatley</a>, <a href="https://github.com/Rong-Zh" target="_blank" rel="noopener">@Rong-Zh</a>, <a href="https://github.com/rossellhayes" target="_blank" rel="noopener">@rossellhayes</a>, <a href="https://github.com/RoyalTS" target="_blank" rel="noopener">@RoyalTS</a>, <a href="https://github.com/rvalieris" target="_blank" rel="noopener">@rvalieris</a>, <a href="https://github.com/s-andrews" target="_blank" rel="noopener">@s-andrews</a>, <a href="https://github.com/s-elsheikh" target="_blank" rel="noopener">@s-elsheikh</a>, <a href="https://github.com/schloerke" target="_blank" rel="noopener">@schloerke</a>, <a href="https://github.com/Sckende" target="_blank" rel="noopener">@Sckende</a>, <a href="https://github.com/sdmason" target="_blank" rel="noopener">@sdmason</a>, <a href="https://github.com/sirallen" target="_blank" rel="noopener">@sirallen</a>, <a href="https://github.com/slowkow" target="_blank" rel="noopener">@slowkow</a>, <a href="https://github.com/spaette" target="_blank" rel="noopener">@spaette</a>, <a href="https://github.com/steveharoz" target="_blank" rel="noopener">@steveharoz</a>, <a href="https://github.com/sunroofgod" target="_blank" 
rel="noopener">@sunroofgod</a>, <a href="https://github.com/szimmer" target="_blank" rel="noopener">@szimmer</a>, <a href="https://github.com/tbates" target="_blank" rel="noopener">@tbates</a>, <a href="https://github.com/teunbrand" target="_blank" rel="noopener">@teunbrand</a>, <a href="https://github.com/tfjaeger" target="_blank" rel="noopener">@tfjaeger</a>, <a href="https://github.com/thomasp85" target="_blank" rel="noopener">@thomasp85</a>, <a href="https://github.com/TimBMK" target="_blank" rel="noopener">@TimBMK</a>, <a href="https://github.com/TimTaylor" target="_blank" rel="noopener">@TimTaylor</a>, <a href="https://github.com/tjebo" target="_blank" rel="noopener">@tjebo</a>, <a href="https://github.com/trekonom" target="_blank" rel="noopener">@trekonom</a>, <a href="https://github.com/tungttnguyen" target="_blank" rel="noopener">@tungttnguyen</a>, <a href="https://github.com/twest820" target="_blank" rel="noopener">@twest820</a>, <a href="https://github.com/UliSchopp" target="_blank" rel="noopener">@UliSchopp</a>, <a href="https://github.com/vnijs" target="_blank" rel="noopener">@vnijs</a>, <a href="https://github.com/warnes" target="_blank" rel="noopener">@warnes</a>, <a href="https://github.com/wbvguo" target="_blank" rel="noopener">@wbvguo</a>, <a href="https://github.com/willgearty" target="_blank" rel="noopener">@willgearty</a>, <a href="https://github.com/Yann-C-INN" target="_blank" rel="noopener">@Yann-C-INN</a>, <a href="https://github.com/yannk-lm" target="_blank" rel="noopener">@yannk-lm</a>, <a href="https://github.com/Yunuuuu" target="_blank" rel="noopener">@Yunuuuu</a>, <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>, <a href="https://github.com/yuw444" target="_blank" rel="noopener">@yuw444</a>, <a href="https://github.com/zekiakyol" target="_blank" rel="noopener">@zekiakyol</a>, and <a href="https://github.com/zhenglukai" target="_blank" rel="noopener">@zhenglukai</a>.</p> bigrquery 1.5.0 
https://www.tidyverse.org/blog/2024/01/bigrquery-1-5-0/ Mon, 22 Jan 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/01/bigrquery-1-5-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re stoked to announce the release of <a href="http://bigrquery.r-dbi.org/" target="_blank" rel="noopener">bigrquery</a> 1.5.0. bigrquery makes it easy to work with data stored in <a href="https://developers.google.com/bigquery/" target="_blank" rel="noopener">Google BigQuery</a>, a hosted database for big data.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"bigrquery"</span><span class='o'>)</span></span></code></pre> </div> <p>This has been the first major update to bigrquery for a while, and is mostly about catching up with innovations elsewhere as well as squashing a bunch of smaller annoyances.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://bigrquery.r-dbi.org'>bigrquery</a></span><span 
class='o'>)</span></span></code></pre> </div> <p>Here&rsquo;s a summary of the biggest changes:</p> <ul> <li> <p>bigrquery is now <a href="https://www.tidyverse.org/blog/2021/12/relicensing-packages/" target="_blank" rel="noopener">MIT licensed</a>.</p> </li> <li> <p>Deprecated functions (i.e. those not starting with <code>bq_</code>) have been removed. These have been superseded for a long time and were formally deprecated in bigrquery 1.3.0 (2020).</p> </li> <li> <p> <a href="https://bigrquery.r-dbi.org/reference/bq_table_download.html" target="_blank" rel="noopener"><code>bq_table_download()</code></a> now returns unknown fields as character vectors. In particular, this means that <code>BIGNUMERIC</code> and <code>JSON</code> columns are downloaded into R for you to process as you wish. <a href="https://bigrquery.r-dbi.org/reference/bq_table_download.html" target="_blank" rel="noopener"><code>bq_table_download()</code></a> now uses the <a href="https://clock.r-lib.org" target="_blank" rel="noopener">clock package</a> to parse dates, leading to a considerable performance improvement and correct parsing for dates prior to 1970-01-01.</p> </li> <li> <p>BigQuery datasets and tables will now appear in the <a href="https://docs.posit.co/ide/user/ide/guide/data/data-connections.html" target="_blank" rel="noopener">RStudio connections pane</a> when connecting with <a href="https://dbi.r-dbi.org/reference/dbConnect.html" target="_blank" rel="noopener"><code>DBI::dbConnect()</code></a>.</p> </li> <li> <p><code>DBI::dbAppendTable()</code>, <a href="https://dbi.r-dbi.org/reference/dbCreateTable.html" target="_blank" rel="noopener"><code>DBI::dbCreateTable()</code></a>, and <a href="https://dbi.r-dbi.org/reference/dbExecute.html" target="_blank" rel="noopener"><code>DBI::dbExecute()</code></a> are now supported, and <a href="https://dbi.r-dbi.org/reference/dbGetQuery.html" target="_blank" rel="noopener"><code>DBI::dbGetQuery()</code></a>/ <a 
href="https://dbi.r-dbi.org/reference/dbSendQuery.html" target="_blank" rel="noopener"><code>DBI::dbSendQuery()</code></a> support parameterised queries via the <code>params</code> argument. <a href="https://dbi.r-dbi.org/reference/dbReadTable.html" target="_blank" rel="noopener"><code>DBI::dbReadTable()</code></a>, <a href="https://dbi.r-dbi.org/reference/dbWriteTable.html" target="_blank" rel="noopener"><code>DBI::dbWriteTable()</code></a>, <a href="https://dbi.r-dbi.org/reference/dbExistsTable.html" target="_blank" rel="noopener"><code>DBI::dbExistsTable()</code></a>, <a href="https://dbi.r-dbi.org/reference/dbRemoveTable.html" target="_blank" rel="noopener"><code>DBI::dbRemoveTable()</code></a>, and <a href="https://dbi.r-dbi.org/reference/dbListFields.html" target="_blank" rel="noopener"><code>DBI::dbListFields()</code></a> now all work with <a href="https://dbi.r-dbi.org/reference/Id.html" target="_blank" rel="noopener"><code>DBI::Id()</code></a>.</p> </li> <li> <p>bigrquery now uses the 2nd edition of the dbplyr interface and is compatible with dbplyr 2.4.0.</p> </li> </ul> <p>See the <a href="https://github.com/r-dbi/bigrquery/releases/tag/v1.5.0" target="_blank" rel="noopener">release notes</a> for a full list of changes.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all 14 folks who helped make this release happen with questions, comments, and code: <a href="https://github.com/abalter" target="_blank" rel="noopener">@abalter</a>, <a href="https://github.com/ablack3" target="_blank" 
rel="noopener">@ablack3</a>, <a href="https://github.com/evanrollinsdrumline" target="_blank" rel="noopener">@evanrollinsdrumline</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/husseyd" target="_blank" rel="noopener">@husseyd</a>, <a href="https://github.com/jacobmpeters" target="_blank" rel="noopener">@jacobmpeters</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/Kvit" target="_blank" rel="noopener">@Kvit</a>, <a href="https://github.com/meztez" target="_blank" rel="noopener">@meztez</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/mjbroerman" target="_blank" rel="noopener">@mjbroerman</a>, <a href="https://github.com/ncuriale" target="_blank" rel="noopener">@ncuriale</a>, and <a href="https://github.com/rdavis120" target="_blank" rel="noopener">@rdavis120</a>.</p> withr 3.0.0 https://www.tidyverse.org/blog/2024/01/withr-3-0-0/ Thu, 18 Jan 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/01/withr-3-0-0/ <p>It&rsquo;s not without jubilant bearing that we announce the release of the 3.0.0 version of <a href="https://withr.r-lib.org/" target="_blank" rel="noopener">withr</a>, the tidyverse solution for automatic cleanup of resources! 
In this release, the internals of withr were rewritten to improve the performance and increase the compatibility with base R&rsquo;s <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a> mechanism.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"withr"</span><span class='o'>)</span></span></code></pre> </div> <p>In this blog post we&rsquo;ll go over the changes that made this rewrite possible, but first we&rsquo;ll review the cleanup strategies made possible by withr.</p> <p>You can see a full list of changes in the <a href="https://withr.r-lib.org/news/index.html#withr-300" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> </div> <h2 id="cleaning-up-resources-with-base-r-and-with-withr">Cleaning up resources with base R and with withr <a href="#cleaning-up-resources-with-base-r-and-with-withr"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Traditionally, resource cleanup in R is done with <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>base::on.exit()</code></a>. 
Cleaning up in the on-exit hook ensures that the cleanup happens both in the normal case, when the code has finished running without error, and in the error case, when something went wrong and execution is interrupted.</p> <p> <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a> is meant to be used inside functions but it also works within <a href="https://rdrr.io/r/base/eval.html" target="_blank" rel="noopener"><code>local()</code></a>, which we&rsquo;ll use here for our examples:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/eval.html'>local</a></span><span class='o'>(</span><span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/on.exit.html'>on.exit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/message.html'>message</a></span><span class='o'>(</span><span class='s'>"Cleaning time!"</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='m'>1</span> <span class='o'>+</span> <span class='m'>2</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 3</span></span> <span></span><span><span class='c'>#&gt; Cleaning time!</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/eval.html'>local</a></span><span class='o'>(</span><span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/on.exit.html'>on.exit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/message.html'>message</a></span><span class='o'>(</span><span class='s'>"Cleaning 
time!"</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='kr'><a href='https://rdrr.io/r/base/stop.html'>stop</a></span><span class='o'>(</span><span class='s'>"uh oh"</span><span class='o'>)</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='m'>1</span> <span class='o'>+</span> <span class='m'>2</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'>:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> uh oh</span></span> <span></span><span><span class='c'>#&gt; Cleaning time!</span></span> <span></span></code></pre> </div> <p> <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a> is guaranteed to run no matter what and this property makes it invaluable for resource cleaning. No more accidental littering!</p> <p>However the process of cleaning up this way can be a bit verbose and feel too manual. 
Here is how you&rsquo;d create and clean up a temporary file for instance:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/eval.html'>local</a></span><span class='o'>(</span><span class='o'>&#123;</span></span> <span> <span class='nv'>my_file</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/tempfile.html'>tempfile</a></span><span class='o'>(</span><span class='o'>)</span></span> <span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/files.html'>file.create</a></span><span class='o'>(</span><span class='nv'>my_file</span><span class='o'>)</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/on.exit.html'>on.exit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/files.html'>file.remove</a></span><span class='o'>(</span><span class='nv'>my_file</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/writeLines.html'>writeLines</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span><span class='o'>)</span>, con <span class='o'>=</span> <span class='nv'>my_file</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span><span class='o'>)</span></span></code></pre> </div> <p>Wouldn&rsquo;t it be great if we could wrap this code up in a function? That&rsquo;s the goal of withr&rsquo;s <code>local_</code>-prefixed functions. 
They combine both the creation or modification of a resource and its (eventual) restoration to the original state into a single function:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/eval.html'>local</a></span><span class='o'>(</span><span class='o'>&#123;</span></span> <span> <span class='nv'>my_file</span> <span class='o'>&lt;-</span> <span class='nf'>withr</span><span class='nf'>::</span><span class='nf'><a href='https://withr.r-lib.org/reference/with_tempfile.html'>local_tempfile</a></span><span class='o'>(</span><span class='o'>)</span></span> <span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/writeLines.html'>writeLines</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span><span class='o'>)</span>, con <span class='o'>=</span> <span class='nv'>my_file</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span><span class='o'>)</span></span></code></pre> </div> <p>In this case we have created a resource (a file), but the same principle applies to modifying resources such as global options:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/eval.html'>local</a></span><span class='o'>(</span><span class='o'>&#123;</span></span> <span> <span class='c'># Let's temporarily print with a single decimal place</span></span> <span> <span class='nf'>withr</span><span class='nf'>::</span><span class='nf'><a href='https://withr.r-lib.org/reference/with_options.html'>local_options</a></span><span class='o'>(</span>digits <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span 
class='m'>1</span><span class='o'>/</span><span class='m'>3</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 0.3</span></span> <span></span><span></span> <span><span class='c'># The original option value has been restored</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/options.html'>getOption</a></span><span class='o'>(</span><span class='s'>"digits"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 7</span></span> <span></span><span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>/</span><span class='m'>3</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 0.3333333</span></span> <span></span></code></pre> </div> <p>You can equivalently use the <code>with_</code>-prefixed variants (from which the package takes its name!). This way, you don&rsquo;t need to wrap the code in <a href="https://rdrr.io/r/base/eval.html" target="_blank" rel="noopener"><code>local()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>withr</span><span class='nf'>::</span><span class='nf'><a href='https://withr.r-lib.org/reference/with_options.html'>with_options</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>digits <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>/</span><span class='m'>3</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 0.3</span></span> <span></span></code></pre> </div> <p>The <code>with_</code> functions are useful for creating very small scopes for given resources, 
inside or outside a function.</p> <h2 id="the-withr-300-rewrite">The withr 3.0.0 rewrite <a href="#the-withr-300-rewrite"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Traditionally, withr implemented its own exit event system on top of <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a>. We needed an extra layer because of a couple of missing features:</p> <ul> <li> <p>When multiple resources are managed by a piece of code, the order in which these resources are restored or cleaned up sometimes matters. The most consistent order for cleanup is last-in first-out (LIFO). In other words, the oldest resource, on which younger resources might depend, is cleaned up last. But historically R only supported first-in first-out (FIFO) order.</p> </li> <li> <p>The other missing piece was being able to inspect the contents of the exit hook. 
The <a href="https://rdrr.io/r/base/sys.parent.html" target="_blank" rel="noopener"><code>sys.on.exit()</code></a> R helper was created for this purpose but was affected by a bug that prevented it from working inside functions.</p> </li> </ul> <p>We contributed two changes to R 3.5.0 that filled these missing pieces, fixing the <a href="https://rdrr.io/r/base/sys.parent.html" target="_blank" rel="noopener"><code>sys.on.exit()</code></a> bug and adding an <code>after</code> argument to <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a> to allow last-in first-out ordering.</p> <p>Until now, we haven&rsquo;t been able to leverage these contributions because of our policy of <a href="https://www.tidyverse.org/blog/2019/04/r-version-support" target="_blank" rel="noopener">supporting the current and previous four versions of R</a>. Now that enough time has passed, it was time for a rewrite! Our version of <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>base::on.exit()</code></a> is <a href="https://withr.r-lib.org/reference/defer.html" target="_blank" rel="noopener"><code>withr::defer()</code></a>. Along with better default behaviour, <a href="https://withr.r-lib.org/reference/defer.html" target="_blank" rel="noopener"><code>withr::defer()</code></a> allows the cleanup of resources non-locally (ironically an essential feature for implementing <code>local_</code> functions). 
Given the changes in R 3.5.0, <a href="https://withr.r-lib.org/reference/defer.html" target="_blank" rel="noopener"><code>withr::defer()</code></a> can now be implemented as a simple wrapper around <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a>.</p> <p>One benefit of the rewrite is that mixing withr tools and <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a> in the same function now correctly interleaves cleanup:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/eval.html'>local</a></span><span class='o'>(</span><span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/on.exit.html'>on.exit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span> <span class='nf'>withr</span><span class='nf'>::</span><span class='nf'><a href='https://withr.r-lib.org/reference/defer.html'>defer</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='m'>2</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/on.exit.html'>on.exit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='m'>3</span><span class='o'>)</span>, add <span class='o'>=</span> <span class='kc'>TRUE</span>, after <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span> <span></span> <span> <span class='nf'>withr</span><span class='nf'>::</span><span class='nf'><a 
href='https://withr.r-lib.org/reference/defer.html'>defer</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='m'>4</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='m'>5</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 5</span></span> <span><span class='c'>#&gt; [1] 4</span></span> <span><span class='c'>#&gt; [1] 3</span></span> <span><span class='c'>#&gt; [1] 2</span></span> <span><span class='c'>#&gt; [1] 1</span></span> <span></span></code></pre> </div> <p>But the main benefit is increased performance. Here is how <code>defer()</code> compared to <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a> in the previous version:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>base</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/on.exit.html'>on.exit</a></span><span class='o'>(</span><span class='kc'>NULL</span><span class='o'>)</span></span> <span><span class='nv'>withr</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='o'>)</span> <span class='nf'>defer</span><span class='o'>(</span><span class='kc'>NULL</span><span class='o'>)</span></span> <span></span> <span><span class='c'># withr 2.5.2</span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span><span class='nf'>base</span><span class='o'>(</span><span class='o'>)</span>, <span 
class='nf'>withr</span><span class='o'>(</span><span class='o'>)</span>, check <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>8</span><span class='o'>]</span></span> <span><span class='c'>#&gt; # A tibble: 2 × 8</span></span> <span><span class='c'>#&gt; expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc</span></span> <span><span class='c'>#&gt; &lt;bch:expr&gt; &lt;bch:tm&gt; &lt;bch:&gt; &lt;dbl&gt; &lt;bch:byt&gt; &lt;dbl&gt; &lt;int&gt; &lt;dbl&gt;</span></span> <span><span class='c'>#&gt; 1 base() 0 82ns 6954952. 0B 696. 9999 1</span></span> <span><span class='c'>#&gt; 2 withr() 26.2µs 27.9µs 35172. 88.4KB 52.8 9985 15</span></span></code></pre> </div> <p>withr 3.0.0 has now caught up to <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a> quite a bit:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># withr 3.0.0</span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span><span class='nf'>base</span><span class='o'>(</span><span class='o'>)</span>, <span class='nf'>withr</span><span class='o'>(</span><span class='o'>)</span>, check <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>8</span><span class='o'>]</span></span> <span><span class='c'>#&gt; # A tibble: 2 × 8</span></span> <span><span class='c'>#&gt; expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc</span></span> <span><span class='c'>#&gt; &lt;bch:expr&gt; &lt;bch:tm&gt; &lt;bch:&gt; &lt;dbl&gt; &lt;bch:byt&gt; &lt;dbl&gt; &lt;int&gt; &lt;dbl&gt;</span></span> <span><span class='c'>#&gt; 1 base() 0 82ns 7329829. 
0B 0 10000 0</span></span> <span><span class='c'>#&gt; 2 withr() 2.95µs 3.4µs 280858. 0B 225. 9992 8</span></span></code></pre> </div> <p>Of course <a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a> is still much faster, in part because <code>defer()</code> supports more features (more on that below), but mostly because <code>on.exit</code> is a primitive function whereas <code>defer()</code> is implemented as a normal R function. That said, we hope that we now have made <code>defer()</code> (and the <code>local_</code> and <code>with_</code> functions that use it) sufficiently fast to be used even in performance-critical micro-tools.</p> <h2 id="improved-withr-features">Improved withr features <a href="#improved-withr-features"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Over the successive releases of withr we&rsquo;ve improved the behaviour of cleanup expressions interactively, in scripts executed with <a href="https://rdrr.io/r/base/source.html" target="_blank" rel="noopener"><code>source()</code></a>, and in knitr. 
<a href="https://rdrr.io/r/base/on.exit.html" target="_blank" rel="noopener"><code>on.exit()</code></a> is a bit inconsistent when it is used outside of a function:</p> <ul> <li>Interactively, it doesn&rsquo;t do anything.</li> <li>In <a href="https://rdrr.io/r/base/source.html" target="_blank" rel="noopener"><code>source()</code></a> and in knitr, it runs immediately instead of at the end of the script.</li> </ul> <p> <a href="https://withr.r-lib.org/reference/defer.html" target="_blank" rel="noopener"><code>withr::defer()</code></a> and the <a href="https://withr.r-lib.org/reference/with_.html" target="_blank" rel="noopener"><code>withr::local_</code></a> helpers try to be more helpful for these cases.</p> <p>Interactively, they save the cleanup action in a special global hook and you get information about how to actually perform the cleanup:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>file</span> <span class='o'>&lt;-</span> <span class='nf'>withr</span><span class='nf'>::</span><span class='nf'><a href='https://withr.r-lib.org/reference/with_tempfile.html'>local_tempfile</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Setting global deferred event(s).</span></span> <span><span class='c'>#&gt; i These will be run:</span></span> <span><span class='c'>#&gt; * Automatically, when the R session ends.</span></span> <span><span class='c'>#&gt; * On demand, if you call `withr::deferred_run()`.</span></span> <span><span class='c'>#&gt; i Use `withr::deferred_clear()` to clear them without executing.</span></span> <span></span> <span><span class='c'># Clean up now</span></span> <span><span class='nf'>withr</span><span class='nf'>::</span><span class='nf'><a href='https://withr.r-lib.org/reference/defer.html'>deferred_run</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Ran 1/1 deferred 
expressions</span></span></code></pre> </div> <p>In knitr or <a href="https://rdrr.io/r/base/source.html" target="_blank" rel="noopener"><code>source()</code></a><sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>, the cleanup is performed at the end of the document or of the script. If you need chunk-level cleanup, use <a href="https://rdrr.io/r/base/eval.html" target="_blank" rel="noopener"><code>local()</code></a> as we&rsquo;ve been doing in the examples of this blog post:</p> <div class="highlight"><pre class="chroma"><code class="language-md" data-lang="md">Cleaning up at the end of the document: <span class="s">```r </span><span class="s"></span><span class="n">document_wide_file</span> <span class="o">&lt;-</span> <span class="n">withr</span><span class="o">::</span><span class="nf">local_tempfile</span><span class="p">()</span> <span class="s">```</span> Cleaning up at the end of the chunk: <span class="s">```r </span><span class="s"></span><span class="nf">local</span><span class="p">({</span> <span class="n">local_file</span> <span class="o">&lt;-</span> <span class="n">withr</span><span class="o">::</span><span class="nf">local_tempfile</span><span class="p">()</span> <span class="p">})</span> <span class="s">```</span> </code></pre></div><p>Starting from withr 3.0.0, you can also run <code>deferred_run()</code> inside of a chunk:</p> <div class="highlight"><pre class="chroma"><code class="language-md" data-lang="md"><span class="s">```r </span><span class="s"></span><span class="n">withr</span><span class="o">::</span><span class="nf">deferred_run</span><span class="p">()</span> <span class="c1">#&gt; Ran 1/1 deferred expressions</span> <span class="s">```</span> </code></pre></div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" 
fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thanks to the GitHub contributors who helped us with this release!</p> <p> <a href="https://github.com/ashbythorpe" target="_blank" rel="noopener">@ashbythorpe</a>, <a href="https://github.com/bastistician" target="_blank" rel="noopener">@bastistician</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/fkohrt" target="_blank" rel="noopener">@fkohrt</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, <a href="https://github.com/gdurif" target="_blank" rel="noopener">@gdurif</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/HenrikBengtsson" target="_blank" rel="noopener">@HenrikBengtsson</a>, <a href="https://github.com/honghaoli42" target="_blank" rel="noopener">@honghaoli42</a>, <a href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/jameslairdsmith" target="_blank" rel="noopener">@jameslairdsmith</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jonkeane" target="_blank" rel="noopener">@jonkeane</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/MLopez-Ibanez" target="_blank" rel="noopener">@MLopez-Ibanez</a>, <a href="https://github.com/moodymudskipper" target="_blank" 
rel="noopener">@moodymudskipper</a>, <a href="https://github.com/multimeric" target="_blank" rel="noopener">@multimeric</a>, <a href="https://github.com/orichters" target="_blank" rel="noopener">@orichters</a>, <a href="https://github.com/pfuehrlich-pik" target="_blank" rel="noopener">@pfuehrlich-pik</a>, <a href="https://github.com/solmos" target="_blank" rel="noopener">@solmos</a>, <a href="https://github.com/tillea" target="_blank" rel="noopener">@tillea</a>, and <a href="https://github.com/vanhry" target="_blank" rel="noopener">@vanhry</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p> <a href="https://rdrr.io/r/base/source.html" target="_blank" rel="noopener"><code>source()</code></a> is only supported by default when running in the global environment, which is usually the case. For the special case of sourcing in a local environment, you need to set <code>options(withr.hook_source = TRUE)</code> first. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> roxygen2 7.3.0 https://www.tidyverse.org/blog/2024/01/roxygen2-7-3-0/ Thu, 11 Jan 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/01/roxygen2-7-3-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. 
the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re well pleased to announce the release of <a href="http://roxygen2.r-lib.org/" target="_blank" rel="noopener">roxygen2</a> 7.3.0. roxygen2 allows you to write specially formatted R comments that generate R documentation files (<code>man/*.Rd</code>) and the <code>NAMESPACE</code> file. roxygen2 is used by over 13,000 CRAN packages.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"roxygen2"</span><span class='o'>)</span></span></code></pre> </div> <p>There are four major improvements in this release:</p> <ul> <li> <p>The <code>NAMESPACE</code> roclet now reports if you have S3 methods that are missing an <code>@export</code> tag. All S3 methods need to be <code>@export</code>ed even if the generic is not. This avoids rare, but hard to debug, problems. If you think this is giving a false positive, <a href="https://github.com/r-lib/roxygen2/issues/new" target="_blank" rel="noopener">please file an issue</a> and suppress the warning with <code>@exportS3Method NULL</code>.</p> <p>I&rsquo;ve also considerably revamped the documentation for S3 methods in <a href="https://roxygen2.r-lib.org/dev/articles/namespace.html#s3" target="_blank" rel="noopener"><code>vignette(&quot;namespace&quot;)</code></a>. The docs now discuss what exporting an S3 method really means, and why it would be technically better to call it <em>registering</em> the method.</p> </li> <li> <p>Finally, the <code>NAMESPACE</code> roclet once again regenerates imports <em>before</em> loading package code and parsing roxygen blocks. 
This has been the goal for a <a href="https://github.com/r-lib/roxygen2/issues/372" target="_blank" rel="noopener">long time</a>, but we accidentally broke it when adding support for code execution in markdown blocks. This change resolves a family of problems where you somehow bork your <code>NAMESPACE</code> and can&rsquo;t easily fix it because you can&rsquo;t re-document the package because you can&rsquo;t load your package because your <code>NAMESPACE</code> is borked.</p> </li> <li> <p><code>@docType package</code> now works like <a href="https://roxygen2.r-lib.org/articles/rd-other.html#packages" target="_blank" rel="noopener"><code>&quot;_PACKAGE&quot;</code></a>, including creating a <code>{packagename}-package</code> alias automatically. This resolves a bug introduced in roxygen2 7.0.0 that meant that many packages lacked the correct alias for their package documentation topic.</p> </li> <li> <p><code>&quot;_PACKAGE&quot;</code> does a better job of automatically generating aliases. In particular, it will no longer generate a duplicate alias if you have a function with the same name as your package (like <a href="https://glue.tidyverse.org/reference/glue.html" target="_blank" rel="noopener"><code>glue::glue()</code></a> or <a href="https://reprex.tidyverse.org/reference/reprex.html" target="_blank" rel="noopener"><code>reprex::reprex()</code></a>). 
If you&rsquo;ve previously had to hack around this bug, you can now delete any custom <code>@aliases</code> tags associated with the <code>&quot;_PACKAGE&quot;</code> docs.</p> </li> </ul> <p>You can see a full list of other minor improvements and bug fixes in the <a href="https://github.com/r-lib/roxygen2/releases/tag/v7.3.0" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to the 46 folks who helped make this release possible through their thoughtful questions and carefully crafted code! 
<a href="https://github.com/andrewmarx" target="_blank" rel="noopener">@andrewmarx</a>, <a href="https://github.com/ashbythorpe" target="_blank" rel="noopener">@ashbythorpe</a>, <a href="https://github.com/ateucher" target="_blank" rel="noopener">@ateucher</a>, <a href="https://github.com/bahadzie" target="_blank" rel="noopener">@bahadzie</a>, <a href="https://github.com/bastistician" target="_blank" rel="noopener">@bastistician</a>, <a href="https://github.com/beginb" target="_blank" rel="noopener">@beginb</a>, <a href="https://github.com/brodieG" target="_blank" rel="noopener">@brodieG</a>, <a href="https://github.com/bryanhanson" target="_blank" rel="noopener">@bryanhanson</a>, <a href="https://github.com/cbielow" target="_blank" rel="noopener">@cbielow</a>, <a href="https://github.com/daattali" target="_blank" rel="noopener">@daattali</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/dpprdan" target="_blank" rel="noopener">@dpprdan</a>, <a href="https://github.com/dsweber2" target="_blank" rel="noopener">@dsweber2</a>, <a href="https://github.com/espinielli" target="_blank" rel="noopener">@espinielli</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hughjonesd" target="_blank" rel="noopener">@hughjonesd</a>, <a href="https://github.com/jeroen" target="_blank" rel="noopener">@jeroen</a>, <a href="https://github.com/jmbarbone" target="_blank" rel="noopener">@jmbarbone</a>, <a href="https://github.com/johnbaums" target="_blank" rel="noopener">@johnbaums</a>, <a href="https://github.com/jonocarroll" target="_blank" rel="noopener">@jonocarroll</a>, <a href="https://github.com/kathi-munk" target="_blank" rel="noopener">@kathi-munk</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/kylebutts" target="_blank" rel="noopener">@kylebutts</a>, <a 
href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/LouisLeNezet" target="_blank" rel="noopener">@LouisLeNezet</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/MaximilianPi" target="_blank" rel="noopener">@MaximilianPi</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/msberends" target="_blank" rel="noopener">@msberends</a>, <a href="https://github.com/multimeric" target="_blank" rel="noopener">@multimeric</a>, <a href="https://github.com/musvaage" target="_blank" rel="noopener">@musvaage</a>, <a href="https://github.com/neshvig10" target="_blank" rel="noopener">@neshvig10</a>, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>, <a href="https://github.com/ralmond" target="_blank" rel="noopener">@ralmond</a>, <a href="https://github.com/RMHogervorst" target="_blank" rel="noopener">@RMHogervorst</a>, <a href="https://github.com/Robinlovelace" target="_blank" rel="noopener">@Robinlovelace</a>, <a href="https://github.com/rossellhayes" target="_blank" rel="noopener">@rossellhayes</a>, <a href="https://github.com/rsbivand" target="_blank" rel="noopener">@rsbivand</a>, <a href="https://github.com/sbgraves237" target="_blank" rel="noopener">@sbgraves237</a>, <a href="https://github.com/schradj" target="_blank" rel="noopener">@schradj</a>, <a href="https://github.com/sebffischer" target="_blank" rel="noopener">@sebffischer</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/stemangiola" target="_blank" rel="noopener">@stemangiola</a>, <a href="https://github.com/tau31" target="_blank" rel="noopener">@tau31</a>, and <a href="https://github.com/trusch139" target="_blank" 
rel="noopener">@trusch139</a>.</p> Q4 2023 tidymodels digest https://www.tidyverse.org/blog/2024/01/tidymodels-2023-q4/ Tue, 09 Jan 2024 00:00:00 +0000 https://www.tidyverse.org/blog/2024/01/tidymodels-2023-q4/ <!-- TODO: * [ ] Look over / edit the post's title in the yaml * [ ] Edit (or delete) the description; note this appears in the Twitter card * [ ] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [ ] Find photo & update yaml metadata * [ ] Create `thumbnail-sq.jpg`; height and width should be equal * [ ] Create `thumbnail-wd.jpg`; width should be >5x height * [ ] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [ ] Add intro sentence, e.g. the standard tagline for the package * [ ] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <p>Since the beginning of 2021, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a> here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. 
You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a> to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like this post from the past couple of months:</p> <ul> <li> <a href="https://www.tidyverse.org/blog/2023/11/tidymodels-errors-q4/" target="_blank" rel="noopener">Three ways errors are about to get better in tidymodels</a></li> </ul> <p>Since <a href="https://www.tidyverse.org/blog/2022/12/tidymodels-2022-q4/" target="_blank" rel="noopener">our last roundup post</a>, there have been CRAN releases of 7 tidymodels packages. Here are links to their NEWS files:</p> <div class="highlight"> <ul> <li>embed <a href="https://embed.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.3)</a></li> <li>modeldb <a href="https://modeldb.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.0)</a></li> <li>recipes <a href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.9)</a></li> <li>spatialsample <a href="https://spatialsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.5.1)</a></li> <li>stacks <a href="https://stacks.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.3)</a></li> <li>textrecipes <a href="https://textrecipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.6)</a></li> <li>tidyposterior <a href="https://tidyposterior.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> </ul> </div> <p>We&rsquo;ll highlight a few especially notable changes below: updated warnings when normalizing, and better error messages in recipes.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a 
href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='s'>"ames"</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span></span></code></pre> </div> <h2 id="updated-warnings-when-normalizing">Updated warnings when normalizing <a href="#updated-warnings-when-normalizing"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The latest release of recipes features an overhaul of the warnings and error messages to use the <a href="https://cli.r-lib.org/" target="_blank" rel="noopener">cli</a> package. With this, we are starting the project of providing more information signaling when things don&rsquo;t go well.</p> <p>The first type of issue we now signal for is when you try to normalize data that contains elements such as <code>NA</code> or <code>Inf</code>. These can sneak in for several reasons, and before this release, it happened silently. 
Below we are creating a recipe using the <code>ames</code> data set, and before we normalize, we are taking the logarithms of all variables that pertain to square footage.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_log</span><span class='o'>(</span><span class='nf'>contains</span><span class='o'>(</span><span class='s'>"SF"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning: Columns `BsmtFin_SF_1`, `BsmtFin_SF_2`, `Bsmt_Unf_SF`, `Total_Bsmt_SF`,</span></span> <span><span class='c'>#&gt; `Second_Flr_SF`, `Wood_Deck_SF`, and `Open_Porch_SF` returned NaN, because</span></span> <span><span class='c'>#&gt; variance cannot be calculated and scaling cannot be used. Consider avoiding</span></span> <span><span class='c'>#&gt; `Inf` or `-Inf` values and/or setting `na_rm = TRUE` before normalizing.</span></span> <span></span></code></pre> </div> <p>We now get a warning that something happened, telling us that it encountered <code>Inf</code> or <code>-Inf</code>. Knowing that, we can go back and investigate what went wrong. 
If we exclude <code>step_normalize()</code> and <code>bake()</code> the recipe, we see that a number of <code>-Inf</code> values appear.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_log</span><span class='o'>(</span><span class='nf'>contains</span><span class='o'>(</span><span class='s'>"SF"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>bake</span><span class='o'>(</span>new_data <span class='o'>=</span> <span class='kc'>NULL</span>, <span class='nf'>contains</span><span class='o'>(</span><span class='s'>"SF"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>glimpse</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 2,930</span></span> <span><span class='c'>#&gt; Columns: 8</span></span> <span><span class='c'>#&gt; $ BsmtFin_SF_1 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.6931472, 1.7917595, 0.0000000, 0.0000000, 1.0986123, 1…</span></span> <span><span class='c'>#&gt; $ BsmtFin_SF_2 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> -Inf, 4.969813, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf…</span></span> <span><span class='c'>#&gt; $ Bsmt_Unf_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 6.089045, 5.598422, 6.006353, 6.951772, 4.919981, 5.7807…</span></span> <span><span class='c'>#&gt; $ Total_Bsmt_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 
6.984716, 6.782192, 7.192182, 7.654443, 6.833032, 6.8308…</span></span> <span><span class='c'>#&gt; $ First_Flr_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 7.412160, 6.797940, 7.192182, 7.654443, 6.833032, 6.8308…</span></span> <span><span class='c'>#&gt; $ Second_Flr_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> -Inf, -Inf, -Inf, -Inf, 6.552508, 6.519147, -Inf, -Inf, …</span></span> <span><span class='c'>#&gt; $ Wood_Deck_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 5.347108, 4.941642, 5.973810, -Inf, 5.356586, 5.886104, …</span></span> <span><span class='c'>#&gt; $ Open_Porch_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 4.127134, -Inf, 3.583519, -Inf, 3.526361, 3.583519, -Inf…</span></span> <span></span></code></pre> </div> <p>Looking at the bare data set, we notice that the <code>-Inf</code> values all appear where the original value is <code>0</code>, which makes sense since <code>log(0)</code> evaluates to <code>-Inf</code> in R.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>ames</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>select</span><span class='o'>(</span><span class='nf'>contains</span><span class='o'>(</span><span class='s'>"SF"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>glimpse</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 2,930</span></span> <span><span class='c'>#&gt; Columns: 8</span></span> <span><span class='c'>#&gt; $ BsmtFin_SF_1 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 2, 6, 1, 1, 3, 3, 3, 1, 3, 7, 7, 1, 7, 3, 3, 1, 3, 3, 4,…</span></span> <span><span class='c'>#&gt; $ BsmtFin_SF_2 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0, 144, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1120, 0, 0, …</span></span> <span><span class='c'>#&gt; 
$ Bsmt_Unf_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 441, 270, 406, 1045, 137, 324, 722, 1017, 415, 994, 763,…</span></span> <span><span class='c'>#&gt; $ Total_Bsmt_SF <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1080, 882, 1329, 2110, 928, 926, 1338, 1280, 1595, 994, …</span></span> <span><span class='c'>#&gt; $ First_Flr_SF <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> 1656, 896, 1329, 2110, 928, 926, 1338, 1280, 1616, 1028,…</span></span> <span><span class='c'>#&gt; $ Second_Flr_SF <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> 0, 0, 0, 0, 701, 678, 0, 0, 0, 776, 892, 0, 676, 0, 0, 1…</span></span> <span><span class='c'>#&gt; $ Wood_Deck_SF <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> 210, 140, 393, 0, 212, 360, 0, 0, 237, 140, 157, 483, 0,…</span></span> <span><span class='c'>#&gt; $ Open_Porch_SF <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> 62, 0, 36, 0, 34, 36, 0, 82, 152, 60, 84, 21, 75, 0, 54,…</span></span> <span></span></code></pre> </div> <p>Knowing that it was <code>0</code> that caused the problem, we can set an <code>offset</code> to avoid taking <code>log(0)</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_log</span><span class='o'>(</span><span class='nf'>contains</span><span class='o'>(</span><span class='s'>"SF"</span><span class='o'>)</span>, offset <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_normalize</span><span 
class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span></span></code></pre> </div> <p>These warnings appear in <code>step_scale()</code>, <code>step_normalize()</code>, <code>step_center()</code>, and <code>step_range()</code>.</p> <h2 id="better-error-messages-in-recipes">Better error messages in recipes <a href="#better-error-messages-in-recipes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Another common problem when using recipes is accidentally selecting variables of the wrong type. Previously this caused the following error:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>starts_with</span><span class='o'>(</span><span class='s'>"Lot_"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Error in `step_dummy()`:</span></span> <span><span class='c'>#&gt; Caused by error in `prep()`:</span></span> <span><span class='c'>#&gt; ! 
All columns selected for the step should be string, factor, or ordered.</span></span></code></pre> </div> <p>In the newest release, it will detail the offending variables and what was wrong with them.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>starts_with</span><span class='o'>(</span><span class='s'>"Lot_"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>bake</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `step_dummy()`:</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in `prep()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> All columns selected for the step should be factor or ordered.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> 1 double variable found: `Lot_Frontage`</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> 1 integer variable found: `Lot_Area`</span></span> <span></span></code></pre> </div> <h2 id="coming-attractions">Coming Attractions <a href="#coming-attractions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 
3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In the next month or so we are planning a cascade of CRAN releases. There is a lot of new functionality coming your way, especially in the tune package.</p> <p>A number of our packages will (finally) be able to cohesively fit, evaluate, tune, and predict models for event times (a.k.a., <a href="https://en.wikipedia.org/wiki/Survival_analysis" target="_blank" rel="noopener">survival analysis</a>). If you don&rsquo;t do this type of work, you might not notice the new capabilities. However, if you do, tidymodels will be able to do a lot more for you.</p> <p>We&rsquo;ve also implemented a number of features related to model fairness. These tools allow tidymodels users to identify when machine learning models behave unfairly towards certain groups of people, and will also be included in the upcoming releases of tidymodels packages in Q1.</p> <p>We&rsquo;ll highlight a lot of these new capabilities in blog posts here as well as tutorials on <a href="https://www.tidymodels.org/" target="_blank" rel="noopener"><code>tidymodels.org</code></a>.</p> <p>So, there&rsquo;s a lot more coming! 
We are very excited to have these features officially available and to see what people can do with them.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to thank those in the community that contributed to tidymodels in the last quarter:</p> <div class="highlight"> <ul> <li>embed: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>.</li> <li>modeldb: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>recipes: <a href="https://github.com/atusy" target="_blank" rel="noopener">@atusy</a>, <a href="https://github.com/bcadenato" target="_blank" rel="noopener">@bcadenato</a>, <a href="https://github.com/collinberke" target="_blank" rel="noopener">@collinberke</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/gfronk" target="_blank" rel="noopener">@gfronk</a>, <a href="https://github.com/jkennel" target="_blank" rel="noopener">@jkennel</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/jxu" target="_blank" rel="noopener">@jxu</a>, <a href="https://github.com/mastoffel" target="_blank" rel="noopener">@mastoffel</a>, <a href="https://github.com/matthewgson" target="_blank" rel="noopener">@matthewgson</a>, <a 
href="https://github.com/millermc38" target="_blank" rel="noopener">@millermc38</a>, <a href="https://github.com/ray-p144" target="_blank" rel="noopener">@ray-p144</a>, <a href="https://github.com/sebsfox" target="_blank" rel="noopener">@sebsfox</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>spatialsample: <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>.</li> <li>stacks: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>textrecipes: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/jd4ds" target="_blank" rel="noopener">@jd4ds</a>, and <a href="https://github.com/masurp" target="_blank" rel="noopener">@masurp</a>.</li> <li>tidyposterior: <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> </ul> </div> <p>We&rsquo;re grateful for all of the tidymodels community, from observers to users to contributors. Happy modeling!</p> scales 1.3.0 https://www.tidyverse.org/blog/2023/11/scales-1-3-0/ Mon, 27 Nov 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/11/scales-1-3-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. 
the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re delighted to announce the release of <a href="https://scales.r-lib.org" target="_blank" rel="noopener">scales</a> 1.3.0. scales is a package that extracts much of the scaling logic used in ggplot2 into a general framework, along with utility functions for, e.g., formatting labels or creating color palettes.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"scales"</span><span class='o'>)</span> </code></pre> </div> <p>This blog post will give a quick overview of the 1.3.0 release, which is mainly an upkeep release but does contain a few interesting tidbits.</p> <p>You can see a full list of changes in the <a href="https://scales.r-lib.org/news/index.html" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://scales.r-lib.org'>scales</a></span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span> </code></pre> </div> <h2 id="proper-support-for-difftime-objects">Proper support for difftime objects <a href="#proper-support-for-difftime-objects"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 
13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>While scales had rudimentary support for objects from the hms package, it did not support the more common base R difftime objects. This is now rectified with the introduction of <a href="https://scales.r-lib.org/reference/label_date.html" target="_blank" rel="noopener"><code>label_timespan()</code></a>, <a href="https://scales.r-lib.org/reference/breaks_timespan.html" target="_blank" rel="noopener"><code>breaks_timespan()</code></a>, and <a href="https://scales.r-lib.org/reference/transform_timespan.html" target="_blank" rel="noopener"><code>transform_timespan()</code></a>. While the labels and breaks functions can be used on their own, all the behavior is encapsulated in the timespan transform object, which is akin to <a href="https://scales.r-lib.org/reference/transform_timespan.html" target="_blank" rel="noopener"><code>transform_hms()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span> <span class='nv'>events</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span> time <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/difftime.html'>as.difftime</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>30</span>, max <span class='o'>=</span> <span class='m'>200</span><span class='o'>)</span>, units <span class='o'>=</span> <span class='s'>"secs"</span><span class='o'>)</span>, magnitude <span class='o'>=</span> <span class='nf'><a 
href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>30</span><span class='o'>)</span> <span class='o'>+</span> <span class='m'>2</span> <span class='o'>)</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>events</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>time</span>, y <span class='o'>=</span> <span class='m'>0</span>, size <span class='o'>=</span> <span class='nv'>magnitude</span><span class='o'>)</span>, position <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/position_jitter.html'>position_jitter</a></span><span class='o'>(</span>width <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span> <span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_continuous</a></span><span class='o'>(</span>trans <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/transform_timespan.html'>transform_timespan</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-2-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>As we can see the timespan transform automatically picks the unit of the difftime object. 
Further, it identifies that, for this range, adding breaks for minutes makes the most sense.</p> <p>If we had recorded time in hours rather than seconds, we can see how that affects the labelling:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>events</span><span class='o'>$</span><span class='nv'>time</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/difftime.html'>as.difftime</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>30</span>, max <span class='o'>=</span> <span class='m'>200</span><span class='o'>)</span>, units <span class='o'>=</span> <span class='s'>"hours"</span><span class='o'>)</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>events</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>time</span>, y <span class='o'>=</span> <span class='m'>0</span>, size <span class='o'>=</span> <span class='nv'>magnitude</span><span class='o'>)</span>, position <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/position_jitter.html'>position_jitter</a></span><span class='o'>(</span>width <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span> <span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_continuous</a></span><span class='o'>(</span>trans <span class='o'>=</span> <span class='nf'><a 
href='https://scales.r-lib.org/reference/transform_timespan.html'>transform_timespan</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-3-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="api-brush-up">API brush-up <a href="#api-brush-up"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>scales has gone through a number of touch-ups on its API, such as revamping the labels functions to all start with <code>label_</code>. In this release, we continue (and hopefully conclude) the touch-ups by using a common prefix for the transformation utilities (<code>transform_</code>) and palettes (<code>pal_</code>). We have also renamed <a href="https://scales.r-lib.org/reference/dollar_format.html" target="_blank" rel="noopener"><code>label_dollar()</code></a> to <a href="https://scales.r-lib.org/reference/label_currency.html" target="_blank" rel="noopener"><code>label_currency()</code></a> to make it clear that this can be used for any type of currency, not just dollars (US or otherwise). 
All the old functions have been kept around with no plan of deprecation but we advise you to update your code to use the new names.</p> <h2 id="more-transformation-power">More transformation power <a href="#more-transformation-power"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This release also includes some other updates to the transformations. They have received a fair amount of bug fixes and a new built-in transformation type has joined the group: <a href="https://scales.r-lib.org/reference/transform_asinh.html" target="_blank" rel="noopener"><code>transform_asinh()</code></a>, the inverse hyperbolic sine transformation, can be used much like log transformations, but it also supports negative values.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/graphics/plot.default.html'>plot</a></span><span class='o'>(</span><span class='nf'><a href='https://scales.r-lib.org/reference/transform_asinh.html'>transform_asinh</a></span><span class='o'>(</span><span class='o'>)</span>, xlim <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='o'>-</span><span class='m'>100</span>, <span class='m'>100</span><span class='o'>)</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/graphics/lines.html'>lines</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='o'>-</span><span class='m'>100</span>, <span class='m'>100</span><span 
class='o'>)</span>, <span class='nf'><a href='https://scales.r-lib.org/reference/transform_log.html'>transform_log</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>$</span><span class='nf'>transform</span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='o'>-</span><span class='m'>100</span>, <span class='m'>100</span><span class='o'>)</span><span class='o'>)</span>, col <span class='o'>=</span> <span class='s'>"red"</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/graphics/text.html'>text</a></span><span class='o'>(</span><span class='m'>50</span>, <span class='m'>3</span>, label <span class='o'>=</span> <span class='s'>"log-transform"</span>, col <span class='o'>=</span> <span class='s'>"red"</span>, adj <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-4-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Transformation objects can now also (optionally) record the derivative and the inverse derivative, which makes it possible to properly correct density estimations of transformed values.</p> <h2 id="fixes-to-range-training-in-discrete-scales">Fixes to range training in discrete scales <a href="#fixes-to-range-training-in-discrete-scales"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The training of discrete ranges has seen a few changes that hopefully make it more predictable what happens when you train a range based on factors or character vectors. 
When training based on factors, the ordering of the range will follow the order of the levels in the factor as they are encountered. New values will be appended to the end of the range. For character vectors, the range will always stay sorted alphanumerically. Mixing character vectors and factors during training will lead to undefined ordering. This has always been the advertised behavior, but it was not applied consistently up until now. As a result, you may see the occasional reordering of e.g. legends in ggplot2 after upgrading scales.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://github.com/AndreeWarby" target="_blank" rel="noopener">@AndreeWarby</a>, <a href="https://github.com/ari-nz" target="_blank" rel="noopener">@ari-nz</a>, <a href="https://github.com/BioinformaNicks" target="_blank" rel="noopener">@BioinformaNicks</a>, <a href="https://github.com/bwiernik" target="_blank" rel="noopener">@bwiernik</a>, <a href="https://github.com/ccsarapas" target="_blank" rel="noopener">@ccsarapas</a>, <a href="https://github.com/CMKnott" target="_blank" rel="noopener">@CMKnott</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/davidhodge931" target="_blank" rel="noopener">@davidhodge931</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dmurdoch" target="_blank" rel="noopener">@dmurdoch</a>, <a href="https://github.com/EricMarcon" target="_blank" 
rel="noopener">@EricMarcon</a>, <a href="https://github.com/Generalized" target="_blank" rel="noopener">@Generalized</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/JJHelly" target="_blank" rel="noopener">@JJHelly</a>, <a href="https://github.com/joshuaylevy" target="_blank" rel="noopener">@joshuaylevy</a>, <a href="https://github.com/jzadra" target="_blank" rel="noopener">@jzadra</a>, <a href="https://github.com/kuriwaki" target="_blank" rel="noopener">@kuriwaki</a>, <a href="https://github.com/larmarange" target="_blank" rel="noopener">@larmarange</a>, <a href="https://github.com/laurejo1" target="_blank" rel="noopener">@laurejo1</a>, <a href="https://github.com/lz1nwm" target="_blank" rel="noopener">@lz1nwm</a>, <a href="https://github.com/MikkoVihtakari" target="_blank" rel="noopener">@MikkoVihtakari</a>, <a href="https://github.com/mjskay" target="_blank" rel="noopener">@mjskay</a>, <a href="https://github.com/pearsonca" target="_blank" rel="noopener">@pearsonca</a>, <a href="https://github.com/Saadi4469" target="_blank" rel="noopener">@Saadi4469</a>, <a href="https://github.com/teunbrand" target="_blank" rel="noopener">@teunbrand</a>, <a href="https://github.com/thomasp85" target="_blank" rel="noopener">@thomasp85</a>, and <a href="https://github.com/zeehio" target="_blank" rel="noopener">@zeehio</a>.</p> httr2 1.0.0 https://www.tidyverse.org/blog/2023/11/httr2-1-0-0/ Tue, 14 Nov 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/11/httr2-1-0-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] 
[`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re delighted to announce the release of <a href="https://httr2.r-lib.org" target="_blank" rel="noopener">httr2</a><sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> 1.0.0. httr2 is the second generation of httr: it helps you generate HTTP requests and process the responses, designed with an eye towards modern web APIs and potentially putting your code in a package.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"httr2"</span><span class='o'>)</span></span></code></pre> </div> <p>httr2 has been under development for the last two years, but this is the first time we&rsquo;ve blogged about it because we&rsquo;ve been waiting until the user interface felt stable. It now does, and we&rsquo;re ready to encourage you to use httr2 whenever you need to talk to a web server. Most importantly httr2 is now a &ldquo;real&rdquo; package because it has a wonderful new logo, thanks to a collaborative effort involving Julie Jung, Greg Swineheart, and DALL•E 3.</p> <div class="highlight"> <p><img src="httr2.png" alt="The new httr2 logo is a dark blue hexagon with httr2 written in bright white at the top of logo. Underneath the text is a vibrant magenta baseball player hitting a ball emblazoned with the letters &quot;www&quot;." width="200px" style="display: block; margin: auto;" /></p> </div> <p>httr2 is the successor to httr. The biggest difference is that it has an explicit request object which you can build up over multiple function calls. 
This makes the interface fit more naturally with the pipe, and generally makes life easier because you can iteratively build up a complex request. httr2 also builds on the 10 years of package development experience we&rsquo;ve accrued since creating httr, so it should all around be more enjoyable to use. If you&rsquo;re a current httr user, there&rsquo;s no need to switch, as we&rsquo;ll continue to maintain the package for many years to come, but if you start on a new project, I&rsquo;d recommend that you give httr2 a shot.</p> <p>If you&rsquo;ve been following httr2 development for a while, you might want to jump to the <a href="https://github.com/r-lib/httr2/releases/tag/v1.0.0" target="_blank" rel="noopener">release notes</a> to see what&rsquo;s new (a lot!). The most important change in this release is that <a href="https://github.com/mgirlich" target="_blank" rel="noopener">Maximilian Girlich</a> is now a httr2 author, in recognition of his many contributions to the package. This release also features improved tools for performing multiple requests (more on that below) and a bunch of bug fixes and minor improvements for OAuth.</p> <p>For the rest of this blog post, I&rsquo;ll assume that you&rsquo;re familiar with the basics of HTTP. 
If you&rsquo;re not, you might want to start with <code>vignette(&quot;httr2&quot;)</code> which introduces you to HTTP using httr2.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://httr2.r-lib.org'>httr2</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="making-a-request">Making a request <a href="#making-a-request"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>httr2 is designed around the two big pieces of HTTP: requests and responses. 
First you&rsquo;ll create a request, with a URL:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>req</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/request.html'>request</a></span><span class='o'>(</span><span class='nf'><a href='https://httr2.r-lib.org/reference/example_url.html'>example_url</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>req</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>&lt;httr2_request&gt;</span></span></span> <span></span><span><span class='c'>#&gt; <span style='font-weight: bold;'>GET</span> http://127.0.0.1:51981/</span></span> <span></span><span><span class='c'>#&gt; <span style='font-weight: bold;'>Body</span>: empty</span></span> <span></span></code></pre> </div> <p>Instead of using an external website, here we&rsquo;re using a test server that&rsquo;s built in to httr2. 
This ensures that this blog post, and many httr2 examples, work independently from the rest of the internet.</p> <p>You can see the HTTP request that httr2 will send, without actually sending it<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>, by doing a dry run:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>req</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_dry_run.html'>req_dry_run</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; GET / HTTP/1.1</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Host</span>: 127.0.0.1:51981</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>User-Agent</span>: httr2/0.2.3.9000 r-curl/5.1.0 libcurl/8.1.2</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Accept</span>: */*</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Accept-Encoding</span>: deflate, gzip</span></span> <span></span></code></pre> </div> <p>As you can see, this request object will perform a simple <code>GET</code> request with automatic user agent and accept headers.</p> <p>To make more complex requests, you modify the request object with functions that start with <code>req_</code>. 
For example, you could make it a <code>HEAD</code> request, with some query parameters, and a custom user agent:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>req</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_url.html'>req_url_query</a></span><span class='o'>(</span>param <span class='o'>=</span> <span class='s'>"value"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_user_agent.html'>req_user_agent</a></span><span class='o'>(</span><span class='s'>"My user agent"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_method.html'>req_method</a></span><span class='o'>(</span><span class='s'>"HEAD"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_dry_run.html'>req_dry_run</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; HEAD /?param=value HTTP/1.1</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Host</span>: 127.0.0.1:51981</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>User-Agent</span>: My user agent</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Accept</span>: */*</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Accept-Encoding</span>: deflate, gzip</span></span> <span></span></code></pre> </div> <p>Or you could send some JSON in the body of the request:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>req</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_body.html'>req_body_json</a></span><span 
class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span>, y <span class='o'>=</span> <span class='s'>"a"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_dry_run.html'>req_dry_run</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; POST / HTTP/1.1</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Host</span>: 127.0.0.1:51981</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>User-Agent</span>: httr2/0.2.3.9000 r-curl/5.1.0 libcurl/8.1.2</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Accept</span>: */*</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Accept-Encoding</span>: deflate, gzip</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Content-Type</span>: application/json</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Content-Length</span>: 15</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; &#123;"x":1,"y":"a"&#125;</span></span> <span></span></code></pre> </div> <p>httr2 provides a <a href="https://httr2.r-lib.org/dev/reference/index.html#requests" target="_blank" rel="noopener">wide range of <code>req_</code> functions</a> to customise the request in common ways; if there&rsquo;s something you need that httr2 doesn&rsquo;t support, please <a href="https://github.com/r-lib/httr2/issues/new" target="_blank" rel="noopener">file an issue</a>!</p> <h2 id="performing-the-request-and-handling-the-response">Performing the request and handling the response <a href="#performing-the-request-and-handling-the-response"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" 
xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Once you have a request that you are happy with, you can send it to the server with <a href="https://httr2.r-lib.org/reference/req_perform.html" target="_blank" rel="noopener"><code>req_perform()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>req_json</span> <span class='o'>&lt;-</span> <span class='nv'>req</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_url.html'>req_url_path</a></span><span class='o'>(</span><span class='s'>"/json"</span><span class='o'>)</span></span> <span><span class='nv'>resp</span> <span class='o'>&lt;-</span> <span class='nv'>req_json</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_perform.html'>req_perform</a></span><span class='o'>(</span><span class='o'>)</span></span></code></pre> </div> <p>Performing a request will return a response object (or throw an error, which we&rsquo;ll talk about next). 
You can see the basic details of the response by printing it, or you can see the raw response with <a href="https://httr2.r-lib.org/reference/resp_raw.html" target="_blank" rel="noopener"><code>resp_raw()</code></a><sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>resp</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>&lt;httr2_response&gt;</span></span></span> <span></span><span><span class='c'>#&gt; <span style='font-weight: bold;'>GET</span> http://127.0.0.1:51981/json</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>Status</span>: 200 OK</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>Content-Type</span>: application/json</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>Body</span>: In memory (407 bytes)</span></span> <span></span><span></span> <span><span class='nv'>resp</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/resp_raw.html'>resp_raw</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; HTTP/1.1 200 OK</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Connection</span>: close</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Date</span>: Tue, 14 Nov 2023 14:41:32 GMT</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Content-Type</span>: application/json</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Content-Length</span>: 407</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>ETag</span>: "de760e6d"</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; &#123;</span></span> <span><span class='c'>#&gt; "firstName": "John",</span></span> 
<span><span class='c'>#&gt; "lastName": "Smith",</span></span> <span><span class='c'>#&gt; "isAlive": true,</span></span> <span><span class='c'>#&gt; "age": 27,</span></span> <span><span class='c'>#&gt; "address": &#123;</span></span> <span><span class='c'>#&gt; "streetAddress": "21 2nd Street",</span></span> <span><span class='c'>#&gt; "city": "New York",</span></span> <span><span class='c'>#&gt; "state": "NY",</span></span> <span><span class='c'>#&gt; "postalCode": "10021-3100"</span></span> <span><span class='c'>#&gt; &#125;,</span></span> <span><span class='c'>#&gt; "phoneNumbers": [</span></span> <span><span class='c'>#&gt; &#123;</span></span> <span><span class='c'>#&gt; "type": "home",</span></span> <span><span class='c'>#&gt; "number": "212 555-1234"</span></span> <span><span class='c'>#&gt; &#125;,</span></span> <span><span class='c'>#&gt; &#123;</span></span> <span><span class='c'>#&gt; "type": "office",</span></span> <span><span class='c'>#&gt; "number": "646 555-4567"</span></span> <span><span class='c'>#&gt; &#125;</span></span> <span><span class='c'>#&gt; ],</span></span> <span><span class='c'>#&gt; "children": [],</span></span> <span><span class='c'>#&gt; "spouse": null</span></span> <span><span class='c'>#&gt; &#125;</span></span> <span></span></code></pre> </div> <p>But generally, you&rsquo;ll want to use the <code>resp_</code> functions to extract parts of the response for further processing. 
For example, you could parse the JSON body into an R data structure:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>resp</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/resp_body_raw.html'>resp_body_json</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 8</span></span> <span><span class='c'>#&gt; $ firstName : chr "John"</span></span> <span><span class='c'>#&gt; $ lastName : chr "Smith"</span></span> <span><span class='c'>#&gt; $ isAlive : logi TRUE</span></span> <span><span class='c'>#&gt; $ age : int 27</span></span> <span><span class='c'>#&gt; $ address :List of 4</span></span> <span><span class='c'>#&gt; ..$ streetAddress: chr "21 2nd Street"</span></span> <span><span class='c'>#&gt; ..$ city : chr "New York"</span></span> <span><span class='c'>#&gt; ..$ state : chr "NY"</span></span> <span><span class='c'>#&gt; ..$ postalCode : chr "10021-3100"</span></span> <span><span class='c'>#&gt; $ phoneNumbers:List of 2</span></span> <span><span class='c'>#&gt; ..$ :List of 2</span></span> <span><span class='c'>#&gt; .. ..$ type : chr "home"</span></span> <span><span class='c'>#&gt; .. ..$ number: chr "212 555-1234"</span></span> <span><span class='c'>#&gt; ..$ :List of 2</span></span> <span><span class='c'>#&gt; .. ..$ type : chr "office"</span></span> <span><span class='c'>#&gt; .. 
..$ number: chr "646 555-4567"</span></span> <span><span class='c'>#&gt; $ children : list()</span></span> <span><span class='c'>#&gt; $ spouse : NULL</span></span> <span></span></code></pre> </div> <p>Or get the value of a header:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>resp</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/resp_headers.html'>resp_header</a></span><span class='o'>(</span><span class='s'>"Content-Length"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "407"</span></span> <span></span></code></pre> </div> <h2 id="error-handling">Error handling <a href="#error-handling"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>You can use <a href="https://httr2.r-lib.org/reference/resp_status.html" target="_blank" rel="noopener"><code>resp_status()</code></a> to see the returned status:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>resp</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/resp_status.html'>resp_status</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 200</span></span> <span></span></code></pre> </div> <p>But this will almost always be 200, because httr2 automatically follows redirects (statuses in the 300s) and turns HTTP failures (statuses in the 400s and 500s) into R errors. 
The following example shows what error handling looks like using an example endpoint that returns a response with the status defined in the URL:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>req</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_url.html'>req_url_path</a></span><span class='o'>(</span><span class='s'>"/status/404"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_perform.html'>req_perform</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `req_perform()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> HTTP 404 Not Found.</span></span> <span></span><span></span> <span><span class='nv'>req</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_url.html'>req_url_path</a></span><span class='o'>(</span><span class='s'>"/status/500"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_perform.html'>req_perform</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `req_perform()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> HTTP 500 Internal Server Error.</span></span> <span></span></code></pre> </div> <p>Turning HTTP failures into R errors can make debugging hard, so httr2 provides the <a href="https://httr2.r-lib.org/reference/last_response.html" target="_blank" rel="noopener"><code>last_request()</code></a> and <a 
href="https://httr2.r-lib.org/reference/last_response.html" target="_blank" rel="noopener"><code>last_response()</code></a> helpers which you can use to figure out what went wrong:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://httr2.r-lib.org/reference/last_response.html'>last_request</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>&lt;httr2_request&gt;</span></span></span> <span></span><span><span class='c'>#&gt; <span style='font-weight: bold;'>GET</span> http://127.0.0.1:51981/status/500</span></span> <span></span><span><span class='c'>#&gt; <span style='font-weight: bold;'>Body</span>: empty</span></span> <span></span><span></span> <span><span class='nf'><a href='https://httr2.r-lib.org/reference/last_response.html'>last_response</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>&lt;httr2_response&gt;</span></span></span> <span></span><span><span class='c'>#&gt; <span style='font-weight: bold;'>GET</span> http://127.0.0.1:51981/status/500</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>Status</span>: 500 Internal Server Error</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>Content-Type</span>: text/plain</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>Body</span>: None</span></span> <span></span></code></pre> </div> <p>httr2 provides two other tools to customise error handling:</p> <ul> <li> <a href="https://httr2.r-lib.org/reference/req_error.html" target="_blank" rel="noopener"><code>req_error()</code></a> gives you full control over what responses should be turned into R errors, and allows you to add additional information to the error message.</li> <li> <a href="https://httr2.r-lib.org/reference/req_retry.html" 
target="_blank" rel="noopener"><code>req_retry()</code></a> helps deal with transient errors, where you need to wait a bit and try again. For example, many APIs are rate limited and will return a 429 status if you have made too many requests.</li> </ul> <p>You can learn more about both of these functions in &ldquo; <a href="https://httr2.r-lib.org/articles/wrapping-apis.html" target="_blank" rel="noopener">Wrapping APIs</a>&rdquo; as they are particularly important when creating an R package (or script) that wraps a web API.</p> <h2 id="control-the-request-process">Control the request process <a href="#control-the-request-process"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>There are a number of other <code>req_</code> functions that don&rsquo;t directly affect the HTTP request but instead control the overall process of submitting a request and handling the response. 
These include:</p> <ul> <li> <p> <a href="https://httr2.r-lib.org/reference/req_cache.html" target="_blank" rel="noopener"><code>req_cache()</code></a>, which sets up a cache so that if repeated requests return the same results, you can avoid a trip to the server.</p> </li> <li> <p> <a href="https://httr2.r-lib.org/reference/req_throttle.html" target="_blank" rel="noopener"><code>req_throttle()</code></a>, which automatically adds a small delay before each request so you can avoid hammering a server with many requests.</p> </li> <li> <p> <a href="https://httr2.r-lib.org/reference/req_progress.html" target="_blank" rel="noopener"><code>req_progress()</code></a>, which adds a progress bar for long downloads or uploads.</p> </li> <li> <p> <a href="https://httr2.r-lib.org/reference/req_cookie_preserve.html" target="_blank" rel="noopener"><code>req_cookie_preserve()</code></a>, which lets you preserve cookies across requests.</p> </li> </ul> <p>Additionally, httr2 provides rich support for authenticating with OAuth, implementing many more OAuth flows than httr. You&rsquo;ve probably used OAuth a bunch without knowing what it&rsquo;s called: you use it when you log in to a non-Google website using your Google account, when you give your phone access to your Twitter account, or when you log in to a streaming app on your smart TV. 
OAuth is a big, complex topic, and is documented in &ldquo;<a href="https://httr2.r-lib.org/articles/oauth.html" target="_blank" rel="noopener">OAuth</a>&rdquo;.</p> <h2 id="multiple-requests">Multiple requests <a href="#multiple-requests"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>httr2 includes three functions to perform multiple requests:</p> <ul> <li> <p> <a href="https://httr2.r-lib.org/reference/req_perform_sequential.html" target="_blank" rel="noopener"><code>req_perform_sequential()</code></a> takes a list of requests and performs them one at a time.</p> </li> <li> <p> <a href="https://httr2.r-lib.org/reference/req_perform_parallel.html" target="_blank" rel="noopener"><code>req_perform_parallel()</code></a> takes a list of requests and performs them in parallel (up to 6 at a time by default). It&rsquo;s similar to <a href="https://httr2.r-lib.org/reference/req_perform_sequential.html" target="_blank" rel="noopener"><code>req_perform_sequential()</code></a>, but is obviously faster, at the expense of potentially hammering a server. 
It also has some limitations: most importantly it can&rsquo;t refresh an expired OAuth token and it doesn&rsquo;t respect <a href="https://httr2.r-lib.org/reference/req_retry.html" target="_blank" rel="noopener"><code>req_retry()</code></a> or <a href="https://httr2.r-lib.org/reference/req_throttle.html" target="_blank" rel="noopener"><code>req_throttle()</code></a>.</p> </li> <li> <p> <a href="https://httr2.r-lib.org/reference/req_perform_iterative.html" target="_blank" rel="noopener"><code>req_perform_iterative()</code></a> takes a single request and a callback function to generate the next request from the previous response. It&rsquo;ll keep going until the callback function returns <code>NULL</code> or <code>max_reqs</code> requests have been performed. This is very useful for paginated APIs that only tell you the URL for the <em>next</em> page.</p> </li> </ul> <p>For example, imagine we wanted to download each person from the <a href="https://swapi.dev" target="_blank" rel="noopener">Star Wars API</a>. 
The URLs have a very consistent structure so we can generate a bunch of them, then create the corresponding requests:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>urls</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste0</a></span><span class='o'>(</span><span class='s'>"https://swapi.dev/api/people/"</span>, <span class='m'>1</span><span class='o'>:</span><span class='m'>10</span><span class='o'>)</span></span> <span><span class='nv'>reqs</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/lapply.html'>lapply</a></span><span class='o'>(</span><span class='nv'>urls</span>, <span class='nv'>request</span><span class='o'>)</span></span></code></pre> </div> <p>Now I can perform those requests, collecting a list of responses:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>resps</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/req_perform_sequential.html'>req_perform_sequential</a></span><span class='o'>(</span><span class='nv'>reqs</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Iterating <span style='color: #00BB00;'>■■■■ </span> 10% | ETA: 40s</span></span> <span></span><span><span class='c'>#&gt; Iterating <span style='color: #00BB00;'>■■■■■■■ </span> 20% | ETA: 3m</span></span> <span></span><span><span class='c'>#&gt; Iterating <span style='color: #00BB00;'>■■■■■■■■■■ </span> 30% | ETA: 2m</span></span> <span></span><span><span class='c'>#&gt; Iterating <span style='color: #00BB00;'>■■■■■■■■■■■■■ </span> 40% | ETA: 1m</span></span> <span></span><span><span class='c'>#&gt; Iterating <span style='color: #00BB00;'>■■■■■■■■■■■■■■■■ </span> 50% | ETA: 46s</span></span> <span></span><span><span class='c'>#&gt; Iterating <span style='color: #00BB00;'>■■■■■■■■■■■■■■■■■■■ </span> 60% | ETA: 
33s</span></span> <span></span><span><span class='c'>#&gt; Iterating <span style='color: #00BB00;'>■■■■■■■■■■■■■■■■■■■■■■ </span> 70% | ETA: 22s</span></span> <span></span><span><span class='c'>#&gt; Iterating <span style='color: #00BB00;'>■■■■■■■■■■■■■■■■■■■■■■■■■ </span> 80% | ETA: 13s</span></span> <span></span><span><span class='c'>#&gt; Iterating <span style='color: #00BB00;'>■■■■■■■■■■■■■■■■■■■■■■■■■■■■ </span> 90% | ETA: 6s</span></span> <span></span><span><span class='c'>#&gt; Iterating <span style='color: #00BB00;'>■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ </span> 100% | ETA: 0s</span></span> <span></span></code></pre> </div> <p>These responses contain their data in a JSON body:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>resps</span> <span class='o'>|&gt;</span> </span> <span> <span class='nv'>_</span><span class='o'>[[</span><span class='m'>1</span><span class='o'>]</span><span class='o'>]</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/resp_body_raw.html'>resp_body_json</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 16</span></span> <span><span class='c'>#&gt; $ name : chr "Luke Skywalker"</span></span> <span><span class='c'>#&gt; $ height : chr "172"</span></span> <span><span class='c'>#&gt; $ mass : chr "77"</span></span> <span><span class='c'>#&gt; $ hair_color: chr "blond"</span></span> <span><span class='c'>#&gt; $ skin_color: chr "fair"</span></span> <span><span class='c'>#&gt; $ eye_color : chr "blue"</span></span> <span><span class='c'>#&gt; $ birth_year: chr "19BBY"</span></span> <span><span class='c'>#&gt; $ gender : chr "male"</span></span> <span><span class='c'>#&gt; $ homeworld : chr 
"https://swapi.dev/api/planets/1/"</span></span> <span><span class='c'>#&gt; $ films :List of 4</span></span> <span><span class='c'>#&gt; ..$ : chr "https://swapi.dev/api/films/1/"</span></span> <span><span class='c'>#&gt; ..$ : chr "https://swapi.dev/api/films/2/"</span></span> <span><span class='c'>#&gt; ..$ : chr "https://swapi.dev/api/films/3/"</span></span> <span><span class='c'>#&gt; ..$ : chr "https://swapi.dev/api/films/6/"</span></span> <span><span class='c'>#&gt; $ species : list()</span></span> <span><span class='c'>#&gt; $ vehicles :List of 2</span></span> <span><span class='c'>#&gt; ..$ : chr "https://swapi.dev/api/vehicles/14/"</span></span> <span><span class='c'>#&gt; ..$ : chr "https://swapi.dev/api/vehicles/30/"</span></span> <span><span class='c'>#&gt; $ starships :List of 2</span></span> <span><span class='c'>#&gt; ..$ : chr "https://swapi.dev/api/starships/12/"</span></span> <span><span class='c'>#&gt; ..$ : chr "https://swapi.dev/api/starships/22/"</span></span> <span><span class='c'>#&gt; $ created : chr "2014-12-09T13:50:51.644000Z"</span></span> <span><span class='c'>#&gt; $ edited : chr "2014-12-20T21:17:56.891000Z"</span></span> <span><span class='c'>#&gt; $ url : chr "https://swapi.dev/api/people/1/"</span></span> <span></span></code></pre> </div> <p>There&rsquo;s lots of ways to deal with this sort of data (e.g. for loops or functional programming) but to make life easier, httr2 comes with its own helper, <a href="https://httr2.r-lib.org/reference/resps_successes.html" target="_blank" rel="noopener"><code>resps_data()</code></a>. This function takes a callback that retrieves the data for each response, then concatenates all the data into a single object. 
In this case, we need to wrap <a href="https://httr2.r-lib.org/reference/resp_body_raw.html" target="_blank" rel="noopener"><code>resp_body_json()</code></a> in a list, so we get one list for each person, rather than one list in total:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>resps</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://httr2.r-lib.org/reference/resps_successes.html'>resps_data</a></span><span class='o'>(</span>\<span class='o'>(</span><span class='nv'>resp</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='nf'><a href='https://httr2.r-lib.org/reference/resp_body_raw.html'>resp_body_json</a></span><span class='o'>(</span><span class='nv'>resp</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nv'>_</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span><span class='o'>]</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span>list.len <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 3</span></span> <span><span class='c'>#&gt; $ :List of 16</span></span> <span><span class='c'>#&gt; ..$ name : chr "Luke Skywalker"</span></span> <span><span class='c'>#&gt; ..$ height : chr "172"</span></span> <span><span class='c'>#&gt; ..$ mass : chr "77"</span></span> <span><span class='c'>#&gt; ..$ hair_color: chr "blond"</span></span> <span><span class='c'>#&gt; ..$ skin_color: chr "fair"</span></span> <span><span class='c'>#&gt; ..$ eye_color : chr "blue"</span></span> <span><span class='c'>#&gt; ..$ birth_year: chr "19BBY"</span></span> <span><span class='c'>#&gt; ..$ gender : chr 
"male"</span></span> <span><span class='c'>#&gt; ..$ homeworld : chr "https://swapi.dev/api/planets/1/"</span></span> <span><span class='c'>#&gt; ..$ films :List of 4</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/1/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/2/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/3/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/6/"</span></span> <span><span class='c'>#&gt; .. [list output truncated]</span></span> <span><span class='c'>#&gt; $ :List of 16</span></span> <span><span class='c'>#&gt; ..$ name : chr "C-3PO"</span></span> <span><span class='c'>#&gt; ..$ height : chr "167"</span></span> <span><span class='c'>#&gt; ..$ mass : chr "75"</span></span> <span><span class='c'>#&gt; ..$ hair_color: chr "n/a"</span></span> <span><span class='c'>#&gt; ..$ skin_color: chr "gold"</span></span> <span><span class='c'>#&gt; ..$ eye_color : chr "yellow"</span></span> <span><span class='c'>#&gt; ..$ birth_year: chr "112BBY"</span></span> <span><span class='c'>#&gt; ..$ gender : chr "n/a"</span></span> <span><span class='c'>#&gt; ..$ homeworld : chr "https://swapi.dev/api/planets/1/"</span></span> <span><span class='c'>#&gt; ..$ films :List of 6</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/1/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/2/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/3/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/4/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/5/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/6/"</span></span> <span><span class='c'>#&gt; .. 
[list output truncated]</span></span> <span><span class='c'>#&gt; $ :List of 16</span></span> <span><span class='c'>#&gt; ..$ name : chr "R2-D2"</span></span> <span><span class='c'>#&gt; ..$ height : chr "96"</span></span> <span><span class='c'>#&gt; ..$ mass : chr "32"</span></span> <span><span class='c'>#&gt; ..$ hair_color: chr "n/a"</span></span> <span><span class='c'>#&gt; ..$ skin_color: chr "white, blue"</span></span> <span><span class='c'>#&gt; ..$ eye_color : chr "red"</span></span> <span><span class='c'>#&gt; ..$ birth_year: chr "33BBY"</span></span> <span><span class='c'>#&gt; ..$ gender : chr "n/a"</span></span> <span><span class='c'>#&gt; ..$ homeworld : chr "https://swapi.dev/api/planets/8/"</span></span> <span><span class='c'>#&gt; ..$ films :List of 6</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/1/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/2/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/3/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/4/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/5/"</span></span> <span><span class='c'>#&gt; .. ..$ : chr "https://swapi.dev/api/films/6/"</span></span> <span><span class='c'>#&gt; .. [list output truncated]</span></span> <span></span></code></pre> </div> <p>Another option would be to convert each response into a data frame or tibble. 
That&rsquo;s a little tricky here because of the nested lists that will need to become list-columns<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup>, so we&rsquo;ll avoid that challenge here by focussing on the first nine columns:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>sw_data</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>resp</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nf'>tibble</span><span class='nf'>::</span><span class='nf'><a href='https://tibble.tidyverse.org/reference/as_tibble.html'>as_tibble</a></span><span class='o'>(</span><span class='nf'><a href='https://httr2.r-lib.org/reference/resp_body_raw.html'>resp_body_json</a></span><span class='o'>(</span><span class='nv'>resp</span><span class='o'>)</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>9</span><span class='o'>]</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span> <span><span class='nv'>resps</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://httr2.r-lib.org/reference/resps_successes.html'>resps_data</a></span><span class='o'>(</span><span class='nv'>sw_data</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 10 × 9</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>name</span> <span style='font-weight: bold;'>height</span> <span style='font-weight: bold;'>mass</span> <span style='font-weight: bold;'>hair_color</span> <span style='font-weight: bold;'>skin_color</span> <span style='font-weight: bold;'>eye_color</span> <span style='font-weight: bold;'>birth_year</span> <span style='font-weight: bold;'>gender</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: 
italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Luke Skywalker 172 77 blond fair blue 19BBY male </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> C-3PO 167 75 n/a gold yellow 112BBY n/a </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> R2-D2 96 32 n/a white, bl… red 33BBY n/a </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Darth Vader 202 136 none white yellow 41.9BBY male </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Leia Organa 150 49 brown light brown 19BBY female</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Owen Lars 178 120 brown, gr… light blue 52BBY male </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Beru Whitesun… 165 75 brown light blue 47BBY female</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> R5-D4 97 32 n/a white, red red unknown n/a </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Biggs Darklig… 183 84 black light brown 24BBY male </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Obi-Wan Kenobi 182 77 auburn, w… fair blue-gray 57BBY male </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: </span><span style='color: #555555; font-weight: bold;'>homeworld</span><span 
style='color: #555555;'> &lt;chr&gt;</span></span></span> <span></span></code></pre> </div> <p>When you&rsquo;re performing large numbers of requests, it&rsquo;s almost inevitable that something will go wrong. By default, all three functions will bubble up errors, causing you to lose all of the work that&rsquo;s been done so far. You can, however, use the <code>on_error</code> argument to change what happens, either ignoring errors, or returning when you hit the first error. This changes the return value: instead of a list of responses, the list might now also contain error objects. httr2 provides other helpers to work with this object:</p> <ul> <li> <a href="https://httr2.r-lib.org/reference/resps_successes.html" target="_blank" rel="noopener"><code>resps_successes()</code></a> filters the list to find the successful responses. You can then pair this with <a href="https://httr2.r-lib.org/reference/resps_successes.html" target="_blank" rel="noopener"><code>resps_data()</code></a> to get the data from the successful request.</li> <li> <a href="https://httr2.r-lib.org/reference/resps_successes.html" target="_blank" rel="noopener"><code>resps_failures()</code></a> filters the list to find the failed responses. 
You can then pair this with <a href="https://httr2.r-lib.org/reference/resps_successes.html" target="_blank" rel="noopener"><code>resps_requests()</code></a> to find the requests that generated them and figure out what went wrong.</li> </ul> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all 87 folks who have helped make httr2 possible!</p> <p> <a href="https://github.com/allenbaron" target="_blank" rel="noopener">@allenbaron</a>, <a href="https://github.com/asadow" target="_blank" rel="noopener">@asadow</a>, <a href="https://github.com/atheriel" target="_blank" rel="noopener">@atheriel</a>, <a href="https://github.com/boshek" target="_blank" rel="noopener">@boshek</a>, <a href="https://github.com/casa-henrym" target="_blank" rel="noopener">@casa-henrym</a>, <a href="https://github.com/cderv" target="_blank" rel="noopener">@cderv</a>, <a href="https://github.com/colmanhumphrey" target="_blank" rel="noopener">@colmanhumphrey</a>, <a href="https://github.com/cstjohn810" target="_blank" rel="noopener">@cstjohn810</a>, <a href="https://github.com/cwang23" target="_blank" rel="noopener">@cwang23</a>, <a href="https://github.com/DavidRLovell" target="_blank" rel="noopener">@DavidRLovell</a>, <a href="https://github.com/DMerch" target="_blank" rel="noopener">@DMerch</a>, <a href="https://github.com/dpprdan" target="_blank" rel="noopener">@dpprdan</a>, <a href="https://github.com/ECOSchulz" target="_blank" rel="noopener">@ECOSchulz</a>, <a href="https://github.com/edavidaja" target="_blank" 
rel="noopener">@edavidaja</a>, <a href="https://github.com/elipousson" target="_blank" rel="noopener">@elipousson</a>, <a href="https://github.com/emmansh" target="_blank" rel="noopener">@emmansh</a>, <a href="https://github.com/Enchufa2" target="_blank" rel="noopener">@Enchufa2</a>, <a href="https://github.com/ErdaradunGaztea" target="_blank" rel="noopener">@ErdaradunGaztea</a>, <a href="https://github.com/fangzhou-xie" target="_blank" rel="noopener">@fangzhou-xie</a>, <a href="https://github.com/fh-mthomson" target="_blank" rel="noopener">@fh-mthomson</a>, <a href="https://github.com/fkohrt" target="_blank" rel="noopener">@fkohrt</a>, <a href="https://github.com/flahn" target="_blank" rel="noopener">@flahn</a>, <a href="https://github.com/gregleleu" target="_blank" rel="noopener">@gregleleu</a>, <a href="https://github.com/guga31bb" target="_blank" rel="noopener">@guga31bb</a>, <a href="https://github.com/gvelasq" target="_blank" rel="noopener">@gvelasq</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hongooi73" target="_blank" rel="noopener">@hongooi73</a>, <a href="https://github.com/howardbaek" target="_blank" rel="noopener">@howardbaek</a>, <a href="https://github.com/jameslairdsmith" target="_blank" rel="noopener">@jameslairdsmith</a>, <a href="https://github.com/JBGruber" target="_blank" rel="noopener">@JBGruber</a>, <a href="https://github.com/jchrom" target="_blank" rel="noopener">@jchrom</a>, <a href="https://github.com/jemus42" target="_blank" rel="noopener">@jemus42</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jimrothstein" target="_blank" rel="noopener">@jimrothstein</a>, <a href="https://github.com/jjesusfilho" target="_blank" rel="noopener">@jjesusfilho</a>, <a href="https://github.com/jjfantini" target="_blank" rel="noopener">@jjfantini</a>, <a href="https://github.com/jl5000" target="_blank" 
rel="noopener">@jl5000</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, <a href="https://github.com/JosiahParry" target="_blank" rel="noopener">@JosiahParry</a>, <a href="https://github.com/judith-bourque" target="_blank" rel="noopener">@judith-bourque</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/kasperwelbers" target="_blank" rel="noopener">@kasperwelbers</a>, <a href="https://github.com/kelvindso" target="_blank" rel="noopener">@kelvindso</a>, <a href="https://github.com/kieran-mace" target="_blank" rel="noopener">@kieran-mace</a>, <a href="https://github.com/KoderKow" target="_blank" rel="noopener">@KoderKow</a>, <a href="https://github.com/lassehjorthmadsen" target="_blank" rel="noopener">@lassehjorthmadsen</a>, <a href="https://github.com/llrs" target="_blank" rel="noopener">@llrs</a>, <a href="https://github.com/lyndon-bird" target="_blank" rel="noopener">@lyndon-bird</a>, <a href="https://github.com/m-mohr" target="_blank" rel="noopener">@m-mohr</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/maxheld83" target="_blank" rel="noopener">@maxheld83</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/michaelgfalk" target="_blank" rel="noopener">@michaelgfalk</a>, <a href="https://github.com/misea" target="_blank" rel="noopener">@misea</a>, <a href="https://github.com/MislavSag" target="_blank" rel="noopener">@MislavSag</a>, <a href="https://github.com/mkoohafkan" target="_blank" rel="noopener">@mkoohafkan</a>, <a href="https://github.com/mmuurr" target="_blank" rel="noopener">@mmuurr</a>, <a href="https://github.com/multimeric" target="_blank" rel="noopener">@multimeric</a>, <a href="https://github.com/nbenn" 
target="_blank" rel="noopener">@nbenn</a>, <a href="https://github.com/nclsbarreto" target="_blank" rel="noopener">@nclsbarreto</a>, <a href="https://github.com/nealrichardson" target="_blank" rel="noopener">@nealrichardson</a>, <a href="https://github.com/Nelson-Gon" target="_blank" rel="noopener">@Nelson-Gon</a>, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>, <a href="https://github.com/owenjonesuob" target="_blank" rel="noopener">@owenjonesuob</a>, <a href="https://github.com/paul-carteron" target="_blank" rel="noopener">@paul-carteron</a>, <a href="https://github.com/pbulsink" target="_blank" rel="noopener">@pbulsink</a>, <a href="https://github.com/ramiromagno" target="_blank" rel="noopener">@ramiromagno</a>, <a href="https://github.com/rplati" target="_blank" rel="noopener">@rplati</a>, <a href="https://github.com/rressler" target="_blank" rel="noopener">@rressler</a>, <a href="https://github.com/samterfa" target="_blank" rel="noopener">@samterfa</a>, <a href="https://github.com/schnee" target="_blank" rel="noopener">@schnee</a>, <a href="https://github.com/sckott" target="_blank" rel="noopener">@sckott</a>, <a href="https://github.com/sebastian-c" target="_blank" rel="noopener">@sebastian-c</a>, <a href="https://github.com/selesnow" target="_blank" rel="noopener">@selesnow</a>, <a href="https://github.com/Shaunson26" target="_blank" rel="noopener">@Shaunson26</a>, <a href="https://github.com/SokolovAnatoliy" target="_blank" rel="noopener">@SokolovAnatoliy</a>, <a href="https://github.com/spotrh" target="_blank" rel="noopener">@spotrh</a>, <a href="https://github.com/stefanedwards" target="_blank" rel="noopener">@stefanedwards</a>, <a href="https://github.com/taerwin" target="_blank" rel="noopener">@taerwin</a>, <a href="https://github.com/vanhry" target="_blank" rel="noopener">@vanhry</a>, <a href="https://github.com/wing328" target="_blank" rel="noopener">@wing328</a>, <a href="https://github.com/xinzhuohkust" 
target="_blank" rel="noopener">@xinzhuohkust</a>, <a href="https://github.com/yogat3ch" target="_blank" rel="noopener">@yogat3ch</a>, <a href="https://github.com/yogesh-bansal" target="_blank" rel="noopener">@yogesh-bansal</a>, <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>, and <a href="https://github.com/zacdav-db" target="_blank" rel="noopener">@zacdav-db</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>Pronounced &ldquo;hitter 2&rdquo;. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>Well, technically, it does send the request, just to another test server that returns the request that it received. <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>This is only an approximation. For example, it only shows the final response if there were redirects, and it automatically uncompresses the body if it was compressed. Nevertheless, it&rsquo;s still pretty useful. <a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:4" role="doc-endnote"> <p>To turn these into list-columns, you need to wrap each list in another list, something like <code>is_list &lt;- map_lgl(json, is.list); json[is_list] &lt;- map(json[is_list], list)</code>. This ensures that each element has length 1, the invariant for a row in a tibble. 
<a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> Three ways errors are about to get better in tidymodels https://www.tidyverse.org/blog/2023/11/tidymodels-errors-q4/ Fri, 10 Nov 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/11/tidymodels-errors-q4/ <p>Twice a year, the tidymodels team comes together for &ldquo;spring cleaning,&rdquo; a week-long project devoted to package maintenance. Ahead of the week, we come up with a list of maintenance tasks that we&rsquo;d like to see consistently implemented across our packages. Many of these tasks can be completed by running one usethis function, while others are much more involved, like issue triage.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> In tidymodels, triaging issues in our core packages helps us to better understand common ways that users struggle to wrap their heads around an API choice we&rsquo;ve made or find the information they need. So, among other things, refinements to the wording of our error messages are a common output of our spring cleanings. 
This blog post will call out three kinds of changes to our erroring that came out of this spring cleaning:</p> <ul> <li>Improving existing errors: <a href="#outcome">The outcome went missing</a></li> <li>Do something where we once did nothing: <a href="#predict">Predicting with things that can&rsquo;t predict</a></li> <li>Make a place and point to it: <a href="#model">Model formulas</a></li> </ul> <p>To demonstrate, we&rsquo;ll walk through some examples using the tidymodels packages:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Attaching packages</span> ──────────────────────────── tidymodels 1.1.1 ──</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>broom </span> 1.0.5 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>recipes </span> 1.0.8.<span style='color: #BB0000;'>9000</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dials </span> 1.2.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>rsample </span> 1.2.0 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dplyr </span> 1.1.3 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tibble </span> 3.2.1 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>ggplot2 </span> 3.4.4 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tidyr </span> 1.3.0 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: 
#0000BB;'>infer </span> 1.0.5 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tune </span> 1.1.2.<span style='color: #BB0000;'>9000</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>modeldata </span> 1.2.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>workflows </span> 1.1.3 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>parsnip </span> 1.1.1.<span style='color: #BB0000;'>9001</span> <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>workflowsets</span> 1.0.1 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>purrr </span> 1.0.2 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>yardstick </span> 1.2.0</span></span> <span></span><span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Conflicts</span> ─────────────────────────────── tidymodels_conflicts() ──</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>purrr</span>::<span style='color: #00BB00;'>discard()</span> masks <span style='color: #0000BB;'>scales</span>::discard()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: #0000BB;'>stats</span>::filter()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>recipes</span>::<span style='color: #00BB00;'>step()</span> masks <span style='color: #0000BB;'>stats</span>::step()</span></span> <span><span 
class='c'>#&gt; <span style='color: #0000BB;'>•</span> Use suppressPackageStartupMessages() to eliminate package startup messages</span></span> <span></span></code></pre> </div> <p>Note that my installed versions include the current dev version of a few tidymodels packages. You can install those versions with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='https://pak.r-lib.org/reference/pak.html'>pak</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste0</a></span><span class='o'>(</span><span class='s'>"tidymodels/"</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"tune"</span>, <span class='s'>"parsnip"</span>, <span class='s'>"recipes"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <h2 id="the-outcome-went-missing-">The outcome went missing 👻 <a href="#the-outcome-went-missing-"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The tidymodels packages focus on <em>supervised</em> machine learning problems, predicting the value of an outcome using predictors.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> For example, in the code:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>linear_spec</span> <span class='o'>&lt;-</span> <span class='nf'>linear_reg</span><span class='o'>(</span><span 
class='o'>)</span></span> <span></span> <span><span class='nv'>linear_fit</span> <span class='o'>&lt;-</span> <span class='nf'>fit</span><span class='o'>(</span><span class='nv'>linear_spec</span>, <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>hp</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span></code></pre> </div> <p>The <code>mpg</code> variable is the outcome. There are many ways that an analyst may mistakenly fail to pass an outcome. In the most straightforward case, they might omit the outcome on the LHS of the formula:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">fit</span><span class="p">(</span><span class="n">linear_spec</span><span class="p">,</span> <span class="o">~</span> <span class="n">hp</span><span class="p">,</span> <span class="n">mtcars</span><span class="p">)</span> <span class="c1">#&gt; Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : </span> <span class="c1">#&gt; incompatible dimensions</span> </code></pre></div><p>In this case, parsnip used to defer to the modeling engine to raise an error, which may or may not be informative.</p> <p>There are many less obvious ways an analyst may mistakenly supply no outcome variable. 
For example, try spotting the issue in the following code, defining a recipe to perform principal component analysis (PCA) on the numeric variables in the data before fitting the model:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">mtcars_rec</span> <span class="o">&lt;-</span> <span class="nf">recipe</span><span class="p">(</span><span class="n">mpg</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">mtcars</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">step_pca</span><span class="p">(</span><span class="nf">all_numeric</span><span class="p">())</span> <span class="nf">workflow</span><span class="p">(</span><span class="n">mtcars_rec</span><span class="p">,</span> <span class="n">linear_spec</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">fit</span><span class="p">(</span><span class="n">mtcars</span><span class="p">)</span> <span class="c1">#&gt; Error: object &#39;.&#39; not found</span> </code></pre></div><p>A head-scratcher! 
To help diagnose what&rsquo;s happening here, we could first try seeing what data is actually being passed to the model.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars_rec_trained</span> <span class='o'>&lt;-</span></span> <span> <span class='nv'>mtcars_rec</span> <span class='o'>%&gt;%</span> </span> <span> <span class='nf'>prep</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> </span> <span></span> <span><span class='nv'>mtcars_rec_trained</span> <span class='o'>%&gt;%</span> <span class='nf'>bake</span><span class='o'>(</span><span class='kc'>NULL</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 32 × 5</span></span></span> <span><span class='c'>#&gt; PC1 PC2 PC3 PC4 PC5</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> -<span style='color: #BB0000;'>195.</span> 12.8 -<span style='color: #BB0000;'>11.4</span> 0.016<span style='text-decoration: underline;'>4</span> 2.17 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> -<span style='color: #BB0000;'>195.</span> 12.9 -<span style='color: #BB0000;'>11.7</span> -<span style='color: #BB0000;'>0.479</span> 2.11 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> -<span style='color: #BB0000;'>142.</span> 25.9 -<span style='color: #BB0000;'>16.0</span> -<span style='color: #BB0000;'>1.34</span> -<span style='color: #BB0000;'>1.18</span> </span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'> 4</span> -<span style='color: #BB0000;'>279.</span> -<span style='color: #BB0000;'>38.3</span> -<span style='color: #BB0000;'>14.0</span> 0.157 -<span style='color: #BB0000;'>0.817</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> -<span style='color: #BB0000;'>399.</span> -<span style='color: #BB0000;'>37.3</span> -<span style='color: #BB0000;'>1.38</span> 2.56 -<span style='color: #BB0000;'>0.444</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> -<span style='color: #BB0000;'>248.</span> -<span style='color: #BB0000;'>25.6</span> -<span style='color: #BB0000;'>12.2</span> -<span style='color: #BB0000;'>3.01</span> -<span style='color: #BB0000;'>1.08</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> -<span style='color: #BB0000;'>435.</span> 20.9 13.9 0.801 -<span style='color: #BB0000;'>0.916</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> -<span style='color: #BB0000;'>160.</span> -<span style='color: #BB0000;'>20.0</span> -<span style='color: #BB0000;'>23.3</span> -<span style='color: #BB0000;'>1.06</span> 0.787</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> -<span style='color: #BB0000;'>172.</span> 10.8 -<span style='color: #BB0000;'>18.3</span> -<span style='color: #BB0000;'>4.40</span> -<span style='color: #BB0000;'>0.836</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> -<span style='color: #BB0000;'>209.</span> 19.7 -<span style='color: #BB0000;'>8.94</span> -<span style='color: #BB0000;'>2.58</span> 1.33 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more rows</span></span></span> <span></span></code></pre> </div> <p>Mmm. What happened to <code>mpg</code>? 
We mistakenly told <code>step_pca()</code> to perform PCA on <em>all</em> of the numeric variables, not just the numeric <em>predictors</em>! As a result, it incorporated <code>mpg</code> into the principal components, removing each of the original numeric variables after the fact. Rewriting using the correct tidyselect specification <code>all_numeric_predictors()</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars_rec_new</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>%&gt;%</span></span> <span> <span class='nf'>step_pca</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>mtcars_rec_new</span>, <span class='nv'>linear_spec</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>fit</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='c'>#&gt; ══ Workflow [trained] ════════════════════════════════════════════════</span></span> <span><span class='c'>#&gt; <span style='font-style: italic;'>Preprocessor:</span> Recipe</span></span> <span><span class='c'>#&gt; <span style='font-style: italic;'>Model:</span> linear_reg()</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; ── Preprocessor ──────────────────────────────────────────────────────</span></span> <span><span class='c'>#&gt; 1 Recipe Step</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; • step_pca()</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; ── Model 
─────────────────────────────────────────────────────────────</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Call:</span></span> <span><span class='c'>#&gt; stats::lm(formula = ..y ~ ., data = data)</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Coefficients:</span></span> <span><span class='c'>#&gt; (Intercept) PC1 PC2 PC3 PC4 </span></span> <span><span class='c'>#&gt; 43.39293 0.07609 -0.05266 0.57892 0.94890 </span></span> <span><span class='c'>#&gt; PC5 </span></span> <span><span class='c'>#&gt; -1.72569</span></span> <span></span></code></pre> </div> <p>Works like a charm. That error we saw previously could be much more helpful, though. With the current developmental version of parsnip, this looks like:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>fit</span><span class='o'>(</span><span class='nv'>linear_spec</span>, <span class='o'>~</span> <span class='nv'>hp</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'>:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `linear_reg()` was unable to find an outcome.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Ensure that you have specified an outcome column and that it hasn't</span></span> <span><span class='c'>#&gt; been removed in pre-processing.</span></span> <span></span></code></pre> </div> <p>Or, with workflows:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>mtcars_rec</span>, <span class='nv'>linear_spec</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>fit</span><span class='o'>(</span><span 
class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'>:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `linear_reg()` was unable to find an outcome.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Ensure that you have specified an outcome column and that it hasn't</span></span> <span><span class='c'>#&gt; been removed in pre-processing.</span></span> <span></span></code></pre> </div> <p>Much better.</p> <h2 id="predicting-with-things-that-cant-predict">Predicting with things that can&rsquo;t predict <a href="#predicting-with-things-that-cant-predict"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Earlier this year, Dr. Louise E. Sinks put out a <a href="https://lsinks.github.io/posts/2023-04-10-tidymodels/tidymodels_tutorial.html" target="_blank" rel="noopener">wonderful blog post</a> documenting what it felt like to approach the various object types defined in the tidymodels as a newcomer to the collection of packages. They wrote:</p> <blockquote> <p>I found it confusing that <code>fit</code>, <code>last_fit</code>, <code>fit_resamples</code>, etc., did not all produce objects that contained the same information and could be acted on by the same functions.</p> </blockquote> <p>This makes sense. 
While we try to forefront the intended mental model for fitting and predicting with tidymodels in our APIs and documentation, we also need to be proactive in anticipating common challenges in constructing that mental model.</p> <p>For example, we&rsquo;ve found that it&rsquo;s sometimes not clear to users which outputs they can call <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> on. One such situation, as Louise points out, is with <code>fit_resamples()</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># fit a linear regression model to bootstrap resamples of mtcars</span></span> <span><span class='nv'>mtcars_res</span> <span class='o'>&lt;-</span> <span class='nf'>fit_resamples</span><span class='o'>(</span><span class='nf'>linear_reg</span><span class='o'>(</span><span class='o'>)</span>, <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>mtcars_res</span></span> <span><span class='c'>#&gt; # Resampling results</span></span> <span><span class='c'>#&gt; # Bootstrap sampling </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 25 × 4</span></span></span> <span><span class='c'>#&gt; splits id .metrics .notes </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='color: #555555;'>&lt;split [32/11]&gt;</span> Bootstrap01 <span style='color: 
#555555;'>&lt;tibble [2 × 4]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='color: #555555;'>&lt;split [32/10]&gt;</span> Bootstrap02 <span style='color: #555555;'>&lt;tibble [2 × 4]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='color: #555555;'>&lt;split [32/16]&gt;</span> Bootstrap03 <span style='color: #555555;'>&lt;tibble [2 × 4]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='color: #555555;'>&lt;split [32/11]&gt;</span> Bootstrap04 <span style='color: #555555;'>&lt;tibble [2 × 4]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='color: #555555;'>&lt;split [32/10]&gt;</span> Bootstrap05 <span style='color: #555555;'>&lt;tibble [2 × 4]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='color: #555555;'>&lt;split [32/13]&gt;</span> Bootstrap06 <span style='color: #555555;'>&lt;tibble [2 × 4]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='color: #555555;'>&lt;split [32/16]&gt;</span> Bootstrap07 <span style='color: #555555;'>&lt;tibble [2 × 4]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='color: #555555;'>&lt;split [32/11]&gt;</span> Bootstrap08 <span style='color: #555555;'>&lt;tibble [2 × 4]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 
3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='color: #555555;'>&lt;split [32/11]&gt;</span> Bootstrap09 <span style='color: #555555;'>&lt;tibble [2 × 4]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='color: #555555;'>&lt;split [32/10]&gt;</span> Bootstrap10 <span style='color: #555555;'>&lt;tibble [2 × 4]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 15 more rows</span></span></span> <span></span></code></pre> </div> <p>With previous tidymodels versions, mistakenly trying to predict with this object resulted in the following output:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">predict</span><span class="p">(</span><span class="n">mtcars_res</span><span class="p">)</span> <span class="c1">#&gt; Error in UseMethod(&#34;predict&#34;) : </span> <span class="c1">#&gt; no applicable method for &#39;predict&#39; applied to an object of class</span> <span class="c1">#&gt; &#34;c(&#39;resample_results&#39;, &#39;tune_results&#39;, &#39;tbl_df&#39;, &#39;tbl&#39;, &#39;data.frame&#39;)&#34;</span> </code></pre></div><p>Some R developers may recognize this error as what results when we didn&rsquo;t define any <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> method for <code>tune_results</code> objects. We didn&rsquo;t do so because prediction isn&rsquo;t well-defined for tuning results. <em>But</em>, this error message does little to help a user understand why that&rsquo;s the case.</p> <p>We&rsquo;ve recently made some changes to error more informatively in this case. 
We do so by defining a &ldquo;dummy&rdquo; <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> method for tuning results, implemented only for the sake of erroring more informatively. The same code will now give the following output:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">predict</span><span class="p">(</span><span class="n">mtcars_res</span><span class="p">)</span> <span class="c1">#&gt; Error in `predict()`:</span> <span class="c1">#&gt; ! `predict()` is not well-defined for tuning results.</span> <span class="c1">#&gt; ℹ To predict with the optimal model configuration from tuning</span> <span class="c1">#&gt; results, ensure that the tuning result was generated with the</span> <span class="c1">#&gt; control option `save_workflow = TRUE`, run `fit_best()`, and</span> <span class="c1">#&gt; then predict using `predict()` on its output.</span> <span class="c1">#&gt; ℹ To collect predictions from tuning results, ensure that the</span> <span class="c1">#&gt; tuning result was generated with the control option `save_pred</span> <span class="c1">#&gt; = TRUE` and run `collect_predictions()`.</span> </code></pre></div><p>References to important concepts or functions, like <a href="https://tune.tidymodels.org/reference/control_grid.html" target="_blank" rel="noopener">control options</a>, <a href="https://tune.tidymodels.org/reference/fit_best.html?q=fit_best" target="_blank" rel="noopener"><code>fit_best()</code></a>, and <a href="https://tune.tidymodels.org/reference/collect_predictions.html?q=collect" target="_blank" rel="noopener"><code>collect_predictions()</code></a>, link to the help-files for those functions using <a href="https://cli.r-lib.org/reference/cli_abort.html" target="_blank" rel="noopener">cli&rsquo;s erroring tools</a>.</p> <p>We hope new error messages like this will help to get folks back on track.</p> <h2 id="model-formulas">Model 
formulas <a href="#model-formulas"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In R, formulas provide a compact, symbolic notation to specify model terms. Many modeling functions in R make use of &ldquo;specials,&rdquo; or nonstandard notations used in formulas. Specials are defined and handled as a special case by a given modeling package. parsnip defers to engine packages to handle specials, so you can work with them as usual. For example, the mgcv package provides support for generalized additive models in R, and defines a special called <code>s()</code> to indicate smoothing terms. You can interface with it via tidymodels like so:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># define a generalized additive model specification</span></span> <span><span class='nv'>gam_spec</span> <span class='o'>&lt;-</span> <span class='nf'>gen_additive_mod</span><span class='o'>(</span><span class='s'>"regression"</span><span class='o'>)</span></span> <span></span> <span><span class='c'># fit the specification using a formula with specials</span></span> <span><span class='nf'>fit</span><span class='o'>(</span><span class='nv'>gam_spec</span>, <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>cyl</span> <span class='o'>+</span> <span class='nf'>s</span><span class='o'>(</span><span class='nv'>disp</span>, k <span class='o'>=</span> <span class='m'>5</span><span class='o'>)</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='c'>#&gt; parsnip model object</span></span> <span><span 
class='c'>#&gt; </span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Family: gaussian </span></span> <span><span class='c'>#&gt; Link function: identity </span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Formula:</span></span> <span><span class='c'>#&gt; mpg ~ cyl + s(disp, k = 5)</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Estimated degrees of freedom:</span></span> <span><span class='c'>#&gt; 3.39 total = 5.39 </span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; GCV score: 6.380152</span></span> <span></span></code></pre> </div> <p>While parsnip can handle specials just fine, the package is often used in conjunction with the greater tidymodels package ecosystem, which defines its own pre-processing infrastructure and functionality via packages like hardhat and recipes. The specials defined in many modeling packages introduce conflicts with that infrastructure. To support specials while also maintaining consistent syntax elsewhere in the ecosystem, <strong>tidymodels delineates between two types of formulas: preprocessing formulas and model formulas</strong>. Preprocessing formulas determine the input variables, while model formulas determine the model structure.</p> <p>This is a tricky abstraction, and one that users have tripped up on in the past. Users could generate all sorts of different errors by 1) mistakenly passing model formulas where preprocessing formulas were expected, or 2) forgetting to pass a model formula where it&rsquo;s needed. 
For an example of 1), we could pass recipes the same formula we passed to parsnip:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">recipe</span><span class="p">(</span><span class="n">mpg</span> <span class="o">~</span> <span class="n">cyl</span> <span class="o">+</span> <span class="nf">s</span><span class="p">(</span><span class="n">disp</span><span class="p">,</span> <span class="n">k</span> <span class="o">=</span> <span class="m">5</span><span class="p">),</span> <span class="n">mtcars</span><span class="p">)</span> <span class="c1">#&gt; Error in `inline_check()`:</span> <span class="c1">#&gt; ! No in-line functions should be used here; use steps to </span> <span class="c1">#&gt; define baking actions.</span> </code></pre></div><p>But we <em>just</em> used a special with another tidymodels function! Rude!</p> <p>Or, to demonstrate 2), we pass the preprocessing formula as we ought to but forget to provide the model formula:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">gam_wflow</span> <span class="o">&lt;-</span> <span class="nf">workflow</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">add_formula</span><span class="p">(</span><span class="n">mpg</span> <span class="o">~</span> <span class="n">.)</span> <span class="o">%&gt;%</span> <span class="nf">add_model</span><span class="p">(</span><span class="n">gam_spec</span><span class="p">)</span> <span class="n">gam_wflow</span> <span class="o">%&gt;%</span> <span class="nf">fit</span><span class="p">(</span><span class="n">mtcars</span><span class="p">)</span> <span class="c1">#&gt; Error in `fit_xy()`:</span> <span class="c1">#&gt; ! 
`fit()` must be used with GAM models (due to its use of formulas).</span> </code></pre></div><p>Uh, but I <em>did</em> just use <code>fit()</code>!</p> <p>Since the distinction between model formulas and preprocessor formulas comes up in functions across tidymodels, we decided to create a <a href="https://parsnip.tidymodels.org/dev/reference/model_formula.html" target="_blank" rel="noopener">central page</a> that documents the concept itself, hopefully making the syntax associated with it come more easily to users. Then, we link to it <em>all over the place</em>. For example, those errors now look like:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>cyl</span> <span class='o'>+</span> <span class='nf'>s</span><span class='o'>(</span><span class='nv'>disp</span>, k <span class='o'>=</span> <span class='m'>5</span><span class='o'>)</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `inline_check()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> No in-line functions should be used here.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> The following function was found: `s`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use steps to do transformations instead.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> If your modeling engine uses special terms in formulas, pass that</span></span> <span><span class='c'>#&gt; formula to workflows as a model formula</span></span> <span><span class='c'>#&gt; (`?parsnip::model_formula()`).</span></span> <span></span></code></pre> </div> <p>Or:</p> <div class="highlight"> <pre 
class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>gam_wflow</span> <span class='o'>%&gt;%</span> <span class='nf'>fit</span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'>:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> When working with generalized additive models, please supply</span></span> <span><span class='c'>#&gt; the model specification to `workflows::add_model()` along with a</span></span> <span><span class='c'>#&gt; `formula` argument.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> See `?parsnip::model_formula()` to learn more.</span></span> <span></span></code></pre> </div> <p>While I&rsquo;ve only outlined three, there are all sorts of improvements to error messages on their way to the tidymodels packages in upcoming releases. If you happen to stumble across them, we hope they quickly set you back on the right path. 🗺</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>Issue triage consists of categorizing, prioritizing, and consolidating issues in a repository&rsquo;s issue tracker. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>See the <a href="https://tidyclust.tidymodels.org" target="_blank" rel="noopener">tidyclust</a> package for unsupervised learning with tidymodels! <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> dbplyr 2.4.0 https://www.tidyverse.org/blog/2023/10/dbplyr-2-4-0/ Thu, 26 Oct 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/10/dbplyr-2-4-0/ <!-- * also include something about dbplyr 2.3.1? 
--> <p>We&rsquo;re chuffed to announce the release of <a href="http://dbplyr.tidyverse.org/" target="_blank" rel="noopener">dbplyr</a> 2.4.0. dbplyr is a database backend for dplyr that allows you to use a remote database as if it were a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dbplyr"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will highlight some of the most important new features: eliminating subqueries when using multiple unions in a row, getting more control over the generated SQL, and a handful of new translations. 
As usual, this release comes with a large number of improvements to translations for individual backends; see the full list in the <a href="https://github.com/tidyverse/dbplyr/releases/tag/v2.4.0" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dbplyr.tidyverse.org/'>dbplyr</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span>, warn.conflicts <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span></code></pre> </div> <h2 id="sql-optimisation">SQL optimisation <a href="#sql-optimisation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>dbplyr now produces fewer subqueries when combining tables with <a href="https://generics.r-lib.org/reference/setops.html" target="_blank" rel="noopener"><code>union()</code></a> and <a href="https://dplyr.tidyverse.org/reference/setops.html" target="_blank" rel="noopener"><code>union_all()</code></a>, resulting in shorter, more readable, and, in some cases, faster SQL.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>lf1</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/tbl_lazy.html'>lazy_frame</a></span><span class='o'>(</span>x 
<span class='o'>=</span> <span class='m'>1</span>, y <span class='o'>=</span> <span class='s'>"a"</span>, .name <span class='o'>=</span> <span class='s'>"lf1"</span><span class='o'>)</span></span> <span><span class='nv'>lf2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/tbl_lazy.html'>lazy_frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span>, y <span class='o'>=</span> <span class='s'>"b"</span>, .name <span class='o'>=</span> <span class='s'>"lf2"</span><span class='o'>)</span></span> <span><span class='nv'>lf3</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/tbl_lazy.html'>lazy_frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span>, z <span class='o'>=</span> <span class='s'>"c"</span>, .name <span class='o'>=</span> <span class='s'>"lf3"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>lf1</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/setops.html'>union</a></span><span class='o'>(</span><span class='nv'>lf2</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/setops.html'>union</a></span><span class='o'>(</span><span class='nv'>lf3</span><span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `lf1`.*, NULL<span style='color: #0000BB;'> AS </span>`z`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf1`</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>UNION</span></span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `lf2`.*, NULL<span 
style='color: #0000BB;'> AS </span>`z`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf2`</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>UNION</span></span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `x`, NULL<span style='color: #0000BB;'> AS </span>`y`, `z`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf3`</span></span> <span></span></code></pre> </div> <p>(As usual in these blog posts, I&rsquo;m using <a href="https://dbplyr.tidyverse.org/reference/tbl_lazy.html" target="_blank" rel="noopener"><code>lazy_frame()</code></a> to focus on the SQL generation, without having to set up a dummy database.)</p> <p>Similarly, a <code>semi/anti_join()</code> on a filtered table now avoids a subquery:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>lf1</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter-joins.html'>semi_join</a></span><span class='o'>(</span><span class='nv'>lf3</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>z</span> <span class='o'>==</span> <span class='s'>"c"</span><span class='o'>)</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `lf1`.*</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf1`</span></span> <span><span class='c'>#&gt; WHERE EXISTS (</span></span> 
<span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT 1 FROM</span> `lf3`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>WHERE</span> (`lf1`.`x` = `lf3`.`x`)<span style='color: #0000BB;'> AND</span> (`lf3`.`z` = 'c')</span></span> <span><span class='c'>#&gt; )</span></span> <span></span></code></pre> </div> <h2 id="sql-generation">SQL generation <a href="#sql-generation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The new argument <code>sql_options</code> for <a href="https://dplyr.tidyverse.org/reference/explain.html" target="_blank" rel="noopener"><code>show_query()</code></a> and <a href="https://dbplyr.tidyverse.org/reference/remote_name.html" target="_blank" rel="noopener"><code>remote_query()</code></a> gives you more control over the generated SQL.</p> <ul> <li> <p>By default, dbplyr uses <code>*</code> to select all columns of a table, but with <code>use_star = FALSE</code> all columns are selected explicitly:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>lf3</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/tbl_lazy.html'>lazy_frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span>, y <span class='o'>=</span> <span class='m'>2</span>, z <span class='o'>=</span> <span class='m'>3</span>, .name <span class='o'>=</span> <span class='s'>"lf3"</span><span class='o'>)</span></span> <span><span class='nv'>lf3</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a 
href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='m'>4</span><span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `lf3`.*, 4.0<span style='color: #0000BB;'> AS </span>`a`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf3`</span></span> <span></span><span></span> <span><span class='nv'>lf3</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='m'>4</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span>sql_options <span class='o'>=</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/sql_options.html'>sql_options</a></span><span class='o'>(</span>use_star <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `x`, `y`, `z`, 4.0<span style='color: #0000BB;'> AS </span>`a`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf3`</span></span> <span></span></code></pre> </div> </li> <li> <p>If you prefer common table expressions (CTEs) over subqueries, use <code>cte = TRUE</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>nested_query</span> <span class='o'>&lt;-</span> <span class='nv'>lf3</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span 
class='o'>(</span>z <span class='o'>=</span> <span class='nv'>z</span> <span class='o'>+</span> <span class='m'>1</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>lf2</span>, by <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>nested_query</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `LHS`.*</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> (</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `x`, `y`, `z` + 1.0<span style='color: #0000BB;'> AS </span>`z`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf3`</span></span> <span><span class='c'>#&gt; )<span style='color: #0000BB;'> AS </span>`LHS`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>LEFT JOIN</span> `lf2`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>ON</span> (`LHS`.`x` = `lf2`.`x`<span style='color: #0000BB;'> AND</span> `LHS`.`y` = `lf2`.`y`)</span></span> <span></span><span></span> <span><span class='nv'>nested_query</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span>sql_options <span class='o'>=</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/sql_options.html'>sql_options</a></span><span class='o'>(</span>cte <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span><span class='o'>)</span></span> 
<span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>WITH</span> `q01` <span style='color: #0000BB;'>AS</span> (</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `x`, `y`, `z` + 1.0<span style='color: #0000BB;'> AS </span>`z`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf3`</span></span> <span><span class='c'>#&gt; )</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `LHS`.*</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `q01`<span style='color: #0000BB;'> AS </span>`LHS`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>LEFT JOIN</span> `lf2`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>ON</span> (`LHS`.`x` = `lf2`.`x`<span style='color: #0000BB;'> AND</span> `LHS`.`y` = `lf2`.`y`)</span></span> <span></span></code></pre> </div> </li> <li> <p>And if you want all columns in a join to be qualified with the table name, not only the ambiguous ones, use <code>qualify_all_columns = TRUE</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>qualify_columns</span> <span class='o'>&lt;-</span> <span class='nv'>lf2</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>lf3</span>, by <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>qualify_columns</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: 
#0000BB;'>SELECT</span> `lf2`.*, `z`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf2`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>LEFT JOIN</span> `lf3`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>ON</span> (`lf2`.`x` = `lf3`.`x`<span style='color: #0000BB;'> AND</span> `lf2`.`y` = `lf3`.`y`)</span></span> <span></span><span></span> <span><span class='nv'>qualify_columns</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span>sql_options <span class='o'>=</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/sql_options.html'>sql_options</a></span><span class='o'>(</span>qualify_all_columns <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `lf2`.*, `lf3`.`z`<span style='color: #0000BB;'> AS </span>`z`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf2`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>LEFT JOIN</span> `lf3`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>ON</span> (`lf2`.`x` = `lf3`.`x`<span style='color: #0000BB;'> AND</span> `lf2`.`y` = `lf3`.`y`)</span></span> <span></span></code></pre> </div> </li> </ul> <h2 id="new-translations">New translations <a href="#new-translations"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 
5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p><code>str_detect()</code>, <code>str_starts()</code> and <code>str_ends()</code> with fixed patterns are translated to <code>INSTR()</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>lf1</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span></span> <span> <span class='nf'>stringr</span><span class='nf'>::</span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_detect.html'>str_detect</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nf'>stringr</span><span class='nf'>::</span><span class='nf'><a href='https://stringr.tidyverse.org/reference/modifiers.html'>fixed</a></span><span class='o'>(</span><span class='s'>"abc"</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> <span class='nf'>stringr</span><span class='nf'>::</span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_starts.html'>str_starts</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nf'>stringr</span><span class='nf'>::</span><span class='nf'><a href='https://stringr.tidyverse.org/reference/modifiers.html'>fixed</a></span><span class='o'>(</span><span class='s'>"a"</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `lf1`.*</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf1`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>WHERE</span> (INSTR(`x`, 'abc') &gt; 0)<span style='color: #0000BB;'> AND</span> (INSTR(`x`, 'a') = 1)</span></span> <span></span></code></pre> </div> <p>And <a href="https://rdrr.io/r/base/nchar.html" target="_blank" 
rel="noopener"><code>nzchar()</code></a> and <a href="https://rdrr.io/r/stats/Uniform.html" target="_blank" rel="noopener"><code>runif()</code></a> are now translated to their SQL equivalents:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>lf1</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/nchar.html'>nzchar</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>z <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `lf1`.*, RANDOM()<span style='color: #0000BB;'> AS </span>`z`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf1`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>WHERE</span> (((`x` IS NULL) OR `x` != ''))</span></span> <span></span></code></pre> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The vast majority of this release (particularly the SQL 
optimisations) are from <a href="https://github.com/mgirlich" target="_blank" rel="noopener">Maximilian Girlich</a>; thanks so much for your continued work on this package! And a big thanks goes to the 84 other folks who helped out by filing issues and contributing code: <a href="https://github.com/abalter" target="_blank" rel="noopener">@abalter</a>, <a href="https://github.com/ablack3" target="_blank" rel="noopener">@ablack3</a>, <a href="https://github.com/andreassoteriadesmoj" target="_blank" rel="noopener">@andreassoteriadesmoj</a>, <a href="https://github.com/apalacio9502" target="_blank" rel="noopener">@apalacio9502</a>, <a href="https://github.com/avsdev-cw" target="_blank" rel="noopener">@avsdev-cw</a>, <a href="https://github.com/bairdj" target="_blank" rel="noopener">@bairdj</a>, <a href="https://github.com/bastistician" target="_blank" rel="noopener">@bastistician</a>, <a href="https://github.com/brownj31" target="_blank" rel="noopener">@brownj31</a>, <a href="https://github.com/But2ene" target="_blank" rel="noopener">@But2ene</a>, <a href="https://github.com/carlganz" target="_blank" rel="noopener">@carlganz</a>, <a href="https://github.com/catalamarti" target="_blank" rel="noopener">@catalamarti</a>, <a href="https://github.com/CEH-SLU" target="_blank" rel="noopener">@CEH-SLU</a>, <a href="https://github.com/chriscardillo" target="_blank" rel="noopener">@chriscardillo</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/DaZaM82" target="_blank" rel="noopener">@DaZaM82</a>, <a href="https://github.com/donour" target="_blank" rel="noopener">@donour</a>, <a href="https://github.com/edgararuiz" target="_blank" rel="noopener">@edgararuiz</a>, <a href="https://github.com/eduardszoecs" target="_blank" rel="noopener">@eduardszoecs</a>, <a href="https://github.com/eipi10" target="_blank" rel="noopener">@eipi10</a>, <a href="https://github.com/ejneer" target="_blank" rel="noopener">@ejneer</a>, <a 
href="https://github.com/erikvona" target="_blank" rel="noopener">@erikvona</a>, <a href="https://github.com/fh-afrachioni" target="_blank" rel="noopener">@fh-afrachioni</a>, <a href="https://github.com/fh-mthomson" target="_blank" rel="noopener">@fh-mthomson</a>, <a href="https://github.com/gui-salome" target="_blank" rel="noopener">@gui-salome</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/halpo" target="_blank" rel="noopener">@halpo</a>, <a href="https://github.com/homer3018" target="_blank" rel="noopener">@homer3018</a>, <a href="https://github.com/iangow" target="_blank" rel="noopener">@iangow</a>, <a href="https://github.com/jdlom" target="_blank" rel="noopener">@jdlom</a>, <a href="https://github.com/jennal-datacenter" target="_blank" rel="noopener">@jennal-datacenter</a>, <a href="https://github.com/JeremyPasco" target="_blank" rel="noopener">@JeremyPasco</a>, <a href="https://github.com/jiemakel" target="_blank" rel="noopener">@jiemakel</a>, <a href="https://github.com/jingydz" target="_blank" rel="noopener">@jingydz</a>, <a href="https://github.com/johnbaums" target="_blank" rel="noopener">@johnbaums</a>, <a href="https://github.com/joshseiv" target="_blank" rel="noopener">@joshseiv</a>, <a href="https://github.com/jrandall" target="_blank" rel="noopener">@jrandall</a>, <a href="https://github.com/khkk378" target="_blank" rel="noopener">@khkk378</a>, <a href="https://github.com/kmishra9" target="_blank" rel="noopener">@kmishra9</a>, <a href="https://github.com/kongdd" target="_blank" rel="noopener">@kongdd</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/krprasangdas" target="_blank" rel="noopener">@krprasangdas</a>, <a href="https://github.com/KRRLP-PL" target="_blank" rel="noopener">@KRRLP-PL</a>, <a href="https://github.com/lentinj" target="_blank" rel="noopener">@lentinj</a>, <a href="https://github.com/lgaborini" 
target="_blank" rel="noopener">@lgaborini</a>, <a href="https://github.com/lhabegger" target="_blank" rel="noopener">@lhabegger</a>, <a href="https://github.com/lorenzolightsgdwarf" target="_blank" rel="noopener">@lorenzolightsgdwarf</a>, <a href="https://github.com/lschneiderbauer" target="_blank" rel="noopener">@lschneiderbauer</a>, <a href="https://github.com/marianschmidt" target="_blank" rel="noopener">@marianschmidt</a>, <a href="https://github.com/matthewjnield" target="_blank" rel="noopener">@matthewjnield</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/misea" target="_blank" rel="noopener">@misea</a>, <a href="https://github.com/mjbroerman" target="_blank" rel="noopener">@mjbroerman</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/multimeric" target="_blank" rel="noopener">@multimeric</a>, <a href="https://github.com/nannerhammix" target="_blank" rel="noopener">@nannerhammix</a>, <a href="https://github.com/nikolasharing" target="_blank" rel="noopener">@nikolasharing</a>, <a href="https://github.com/nviets" target="_blank" rel="noopener">@nviets</a>, <a href="https://github.com/nviraj" target="_blank" rel="noopener">@nviraj</a>, <a href="https://github.com/oobd" target="_blank" rel="noopener">@oobd</a>, <a href="https://github.com/pboesu" target="_blank" rel="noopener">@pboesu</a>, <a href="https://github.com/pepijn-devries" target="_blank" rel="noopener">@pepijn-devries</a>, <a href="https://github.com/rbcavanaugh" target="_blank" rel="noopener">@rbcavanaugh</a>, <a href="https://github.com/rcepka" target="_blank" rel="noopener">@rcepka</a>, <a href="https://github.com/robertkck" target="_blank" rel="noopener">@robertkck</a>, <a href="https://github.com/samssann" target="_blank" rel="noopener">@samssann</a>, <a 
href="https://github.com/SayfSaid" target="_blank" rel="noopener">@SayfSaid</a>, <a href="https://github.com/scottporter" target="_blank" rel="noopener">@scottporter</a>, <a href="https://github.com/shearerpmm" target="_blank" rel="noopener">@shearerpmm</a>, <a href="https://github.com/srikanthtist" target="_blank" rel="noopener">@srikanthtist</a>, <a href="https://github.com/stemangiola" target="_blank" rel="noopener">@stemangiola</a>, <a href="https://github.com/stephenashton-dhsc" target="_blank" rel="noopener">@stephenashton-dhsc</a>, <a href="https://github.com/stevepowell99" target="_blank" rel="noopener">@stevepowell99</a>, <a href="https://github.com/TBlackmore" target="_blank" rel="noopener">@TBlackmore</a>, <a href="https://github.com/thomashulst" target="_blank" rel="noopener">@thomashulst</a>, <a href="https://github.com/thothal" target="_blank" rel="noopener">@thothal</a>, <a href="https://github.com/tilo-aok" target="_blank" rel="noopener">@tilo-aok</a>, <a href="https://github.com/tisseuil" target="_blank" rel="noopener">@tisseuil</a>, <a href="https://github.com/tonyk7440" target="_blank" rel="noopener">@tonyk7440</a>, <a href="https://github.com/TSchiefer" target="_blank" rel="noopener">@TSchiefer</a>, <a href="https://github.com/Tsemharb" target="_blank" rel="noopener">@Tsemharb</a>, <a href="https://github.com/tuge98" target="_blank" rel="noopener">@tuge98</a>, <a href="https://github.com/vadim-cherepanov" target="_blank" rel="noopener">@vadim-cherepanov</a>, and <a href="https://github.com/wdenton" target="_blank" rel="noopener">@wdenton</a>.</p> testthat 3.2.0 https://www.tidyverse.org/blog/2023/10/testthat-3-2-0/ Sun, 08 Oct 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/10/testthat-3-2-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with 
[`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re chuffed to announce the release of <a href="http://testthat.r-lib.org/" target="_blank" rel="noopener">testthat</a> 3.2.0. testthat makes it easy to turn your existing informal tests into formal, automated tests that you can rerun quickly and easily. testthat is the most popular unit-testing package for R, and is used by almost 9,000 CRAN and Bioconductor packages. You can learn more about unit testing at <a href="https://r-pkgs.org/tests.html">https://r-pkgs.org/tests.html</a>.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"testthat"</span><span class='o'>)</span></span></code></pre> </div> <p>testthat 3.2.0 includes relatively few new features, but there have been ten patch releases since testthat 3.1.0. These patch releases contained a bunch of experiments that we now believe are ready for the world. 
So this blog post summarises the changes in <a href="https://github.com/r-lib/testthat/releases/tag/v3.1.1" target="_blank" rel="noopener">3.1.1</a>, <a href="https://github.com/r-lib/testthat/releases/tag/v3.1.2" target="_blank" rel="noopener">3.1.2</a>, <a href="https://github.com/r-lib/testthat/releases/tag/v3.1.3" target="_blank" rel="noopener">3.1.3</a>, <a href="https://github.com/r-lib/testthat/releases/tag/v3.1.4" target="_blank" rel="noopener">3.1.4</a>, <a href="https://github.com/r-lib/testthat/releases/tag/v3.1.5" target="_blank" rel="noopener">3.1.5</a>, <a href="https://github.com/r-lib/testthat/releases/tag/v3.1.6" target="_blank" rel="noopener">3.1.6</a>, <a href="https://github.com/r-lib/testthat/releases/tag/v3.1.7" target="_blank" rel="noopener">3.1.7</a>, <a href="https://github.com/r-lib/testthat/releases/tag/v3.1.8" target="_blank" rel="noopener">3.1.8</a>, <a href="https://github.com/r-lib/testthat/releases/tag/v3.1.9" target="_blank" rel="noopener">3.1.9</a>, and <a href="https://github.com/r-lib/testthat/releases/tag/v3.1.10" target="_blank" rel="noopener">3.1.10</a> over the last two years.</p> <p>Here we&rsquo;ll focus on the biggest news: new expectations, tweaks to the way that error snapshots are reported, support for mocking, a new way to detect if a test has changed global state, and a bunch of smaller UI improvements.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://testthat.r-lib.org'>testthat</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="documentation">Documentation <a href="#documentation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 
5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The first and most important thing to point out is that the second edition of <a href="https://r-pkgs.org" target="_blank" rel="noopener">R Packages</a> contains updated and much expanded coverage of testing. Coverage of testing is now split up over three chapters:</p> <ul> <li> <a href="https://r-pkgs.org/testing-basics.html" target="_blank" rel="noopener">Testing basics</a></li> <li> <a href="https://r-pkgs.org/testing-design.html" target="_blank" rel="noopener">Designing your test suite</a></li> <li> <a href="https://r-pkgs.org/testing-advanced.html" target="_blank" rel="noopener">Advanced testing techniques</a></li> </ul> <p>There&rsquo;s also a new vignette about special files ( <a href="https://testthat.r-lib.org/articles/special-files.html" target="_blank" rel="noopener"><code>vignette(&quot;special-files&quot;)</code></a>) which describes the various special files that you find in <code>tests/testthat</code> and when you might need to use them.</p> <h2 id="new-expectations">New expectations <a href="#new-expectations"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>There are a handful of notable new expectations. 
<a href="https://testthat.r-lib.org/reference/expect_setequal.html" target="_blank" rel="noopener"><code>expect_contains()</code></a> and <a href="https://testthat.r-lib.org/reference/expect_setequal.html" target="_blank" rel="noopener"><code>expect_in()</code></a> work similarly to <code>expect_true(all(expected %in% object))</code> or <code>expect_true(all(object %in% expected))</code> but give more informative failure messages:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>fruits</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"apple"</span>, <span class='s'>"banana"</span>, <span class='s'>"pear"</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://testthat.r-lib.org/reference/expect_setequal.html'>expect_contains</a></span><span class='o'>(</span><span class='nv'>fruits</span>, <span class='s'>"apple"</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://testthat.r-lib.org/reference/expect_setequal.html'>expect_contains</a></span><span class='o'>(</span><span class='nv'>fruits</span>, <span class='s'>"pineapple"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Error: `fruits` (`actual`) doesn't fully contain all the values in "pineapple" (`expected`).</span></span> <span><span class='c'>#&gt; * Missing from `actual`: "pineapple"</span></span> <span><span class='c'>#&gt; * Present in `actual`: "apple", "banana", "pear"</span></span> <span></span><span></span> <span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='kc'>TRUE</span>, <span class='kc'>FALSE</span>, <span class='kc'>TRUE</span>, <span class='kc'>FALSE</span><span class='o'>)</span></span> <span><span class='nf'><a 
href='https://testthat.r-lib.org/reference/expect_setequal.html'>expect_in</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='kc'>TRUE</span>, <span class='kc'>FALSE</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='kc'>TRUE</span>, <span class='kc'>FALSE</span>, <span class='kc'>TRUE</span>, <span class='kc'>NA</span>, <span class='kc'>FALSE</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://testthat.r-lib.org/reference/expect_setequal.html'>expect_in</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='kc'>TRUE</span>, <span class='kc'>FALSE</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Error: `x` (`actual`) isn't fully contained within c(TRUE, FALSE) (`expected`).</span></span> <span><span class='c'>#&gt; * Missing from `expected`: NA</span></span> <span><span class='c'>#&gt; * Present in `expected`: TRUE, FALSE</span></span> <span></span></code></pre> </div> <p> <a href="https://testthat.r-lib.org/reference/expect_no_error.html" target="_blank" rel="noopener"><code>expect_no_error()</code></a>, <a href="https://testthat.r-lib.org/reference/expect_no_error.html" target="_blank" rel="noopener"><code>expect_no_warning()</code></a>, and <a href="https://testthat.r-lib.org/reference/expect_no_error.html" target="_blank" rel="noopener"><code>expect_no_message()</code></a> make it easier (and clearer) to confirm that code runs without errors, warnings, or messages. 
The default fails if there is any error/warning/message, but you can optionally supply either the <code>message</code> or <code>class</code> arguments to confirm the absence of a specific error/warning/message.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>foo</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='kr'>if</span> <span class='o'>(</span><span class='nv'>x</span> <span class='o'>&lt;</span> <span class='m'>0</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>x</span> <span class='o'>+</span> <span class='s'>"10"</span></span> <span> <span class='o'>&#125;</span> <span class='kr'>else</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>x</span> <span class='o'>=</span> <span class='m'>20</span></span> <span> <span class='o'>&#125;</span></span> <span><span class='o'>&#125;</span></span> <span></span> <span><span class='nf'><a href='https://testthat.r-lib.org/reference/expect_no_error.html'>expect_no_error</a></span><span class='o'>(</span><span class='nf'>foo</span><span class='o'>(</span><span class='o'>-</span><span class='m'>10</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Error: Expected `foo(-10)` to run without any errors.</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>ℹ</span> Actually got a &lt;simpleError&gt; with text:</span></span> <span><span class='c'>#&gt; non-numeric argument to binary operator</span></span> <span></span><span></span> <span><span class='c'># No difference here but will lead to a better failure later</span></span> <span><span class='c'># once you've fixed this problem and later introduce a new one</span></span> <span><span class='nf'><a 
href='https://testthat.r-lib.org/reference/expect_no_error.html'>expect_no_error</a></span><span class='o'>(</span><span class='nf'>foo</span><span class='o'>(</span><span class='o'>-</span><span class='m'>10</span><span class='o'>)</span>, message <span class='o'>=</span> <span class='s'>"non-numeric argument"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Error: Expected `foo(-10)` to run without any errors matching pattern 'non-numeric argument'.</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>ℹ</span> Actually got a &lt;simpleError&gt; with text:</span></span> <span><span class='c'>#&gt; non-numeric argument to binary operator</span></span> <span></span></code></pre> </div> <h2 id="snapshotting-changes">Snapshotting changes <a href="#snapshotting-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p><code>expect_snapshot(error = TRUE)</code> has a new display of error messages that strives to be closer to what you see interactively. In particular, you&rsquo;ll no longer see the error class and you will now see the error call.</p> <ul> <li> <p>Old display:</p> <pre><code>Code f() Error &lt;simpleError&gt; baz </code></pre> </li> <li> <p>New display:</p> <pre><code>Code f() Condition Error in `f()`: ! baz </code></pre> </li> </ul> <p>If you have used <code>expect_snapshot(error = TRUE)</code> in your package, this means that you will need to re-run and approve your snapshots. We hope this is not too annoying and we believe it is worth it given the more accurate reflection of generated error messages. 
This will not affect checks on CRAN because, by default, snapshot tests are not run on CRAN.</p> <h2 id="mocking">Mocking <a href="#mocking"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Mocking<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> is a tool for temporarily replacing the implementation of a function in order to make testing easier. Sometimes when testing a function, one part of it is challenging to run in your test environment (maybe it requires human interaction, a live database connection, or maybe it just takes a long time to run). For example, take the following imaginary function. 
It has a bunch of straightforward computation that would be easy to test but right in the middle of the function it calls <code>complicated()</code> which is hard to test:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>my_function</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span>, <span class='nv'>z</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>a</span> <span class='o'>&lt;-</span> <span class='nf'>f</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span><span class='o'>)</span></span> <span> <span class='nv'>b</span> <span class='o'>&lt;-</span> <span class='nf'>g</span><span class='o'>(</span><span class='nv'>y</span>, <span class='nv'>z</span><span class='o'>)</span></span> <span> <span class='nv'>c</span> <span class='o'>&lt;-</span> <span class='nf'>h</span><span class='o'>(</span><span class='nv'>a</span>, <span class='nv'>b</span><span class='o'>)</span></span> <span> </span> <span> <span class='nv'>d</span> <span class='o'>&lt;-</span> <span class='nf'>complicated</span><span class='o'>(</span><span class='nv'>c</span><span class='o'>)</span></span> <span> </span> <span> <span class='nf'>i</span><span class='o'>(</span><span class='nv'>d</span>, <span class='m'>1</span>, <span class='kc'>TRUE</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span></code></pre> </div> <p>Mocking allows you to temporarily replace <code>complicated()</code> with something simpler, allowing you to test the rest of the function. testthat now supports mocking with <a href="https://testthat.r-lib.org/reference/local_mocked_bindings.html" target="_blank" rel="noopener"><code>local_mocked_bindings()</code></a>, which temporarily replaces the implementation of a function. 
For example, to test <code>my_function()</code> you might write something like this:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://testthat.r-lib.org/reference/test_that.html'>test_that</a></span><span class='o'>(</span><span class='s'>"my_function() returns expected result"</span>, <span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://testthat.r-lib.org/reference/local_mocked_bindings.html'>local_mocked_bindings</a></span><span class='o'>(</span></span> <span> complicated <span class='o'>=</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='kc'>TRUE</span></span> <span> <span class='o'>)</span></span> <span> <span class='nv'>...</span></span> <span><span class='o'>&#125;</span><span class='o'>)</span></span></code></pre> </div> <p>testthat has a complicated past with mocking. testthat introduced <a href="https://testthat.r-lib.org/reference/with_mock.html" target="_blank" rel="noopener"><code>with_mock()</code></a> in v0.9 (way back in 2014), but we started discovering problems with the implementation in v2.0.0 (2017) leading to its deprecation in v3.0.0 (2020). A few packages arose to fill the gap (like <a href="https://github.com/r-lib/mockery" target="_blank" rel="noopener">mockery</a>, <a href="https://krlmlr.github.io/mockr/" target="_blank" rel="noopener">mockr</a>, and <a href="https://nbenn.github.io/mockthat/" target="_blank" rel="noopener">mockthat</a>) but none of their implementations were completely satisfactory. Earlier this year a new approach occurred to me that avoids many of the problems of the previous approaches. 
This is now implemented in <a href="https://testthat.r-lib.org/reference/local_mocked_bindings.html" target="_blank" rel="noopener"><code>with_mocked_bindings()</code></a> and <a href="https://testthat.r-lib.org/reference/local_mocked_bindings.html" target="_blank" rel="noopener"><code>local_mocked_bindings()</code></a>; we&rsquo;ve been using these new functions for a few months now without problems, and it feels like time to announce to the world.</p> <h2 id="state-inspector">State inspector <a href="#state-inspector"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In times gone by it was very easy to accidentally change the state of the world in a test:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://testthat.r-lib.org/reference/test_that.html'>test_that</a></span><span class='o'>(</span><span class='s'>"side-by-side diffs work"</span>, <span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/options.html'>options</a></span><span class='o'>(</span>width <span class='o'>=</span> <span class='m'>20</span><span class='o'>)</span></span> <span> <span class='nf'><a href='https://testthat.r-lib.org/reference/expect_snapshot.html'>expect_snapshot</a></span><span class='o'>(</span></span> <span> <span class='nf'>waldo</span><span class='nf'>::</span><span class='nf'><a href='https://waldo.r-lib.org/reference/compare.html'>compare</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"X"</span>, 
<span class='nv'>letters</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>letters</span>, <span class='s'>"X"</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='o'>&#125;</span><span class='o'>)</span></span></code></pre> </div> <p>When you look at a single test it&rsquo;s easy to spot the problem, and switch to a more appropriate way of temporarily changing the options, like <a href="https://withr.r-lib.org/reference/with_options.html" target="_blank" rel="noopener"><code>withr::local_options()</code></a>. But sometimes this mistake crept in a long time ago and is now hiding amongst hundreds or thousands of tests.</p> <p>In earlier versions of testthat, finding tests that accidentally changed the world was painful: the only way was to painstakingly review each test. Now you can use <a href="https://testthat.r-lib.org/reference/set_state_inspector.html" target="_blank" rel="noopener"><code>set_state_inspector()</code></a> to register a function that&rsquo;s called before and after every test. If the function returns different values, testthat will let you know. 
You&rsquo;ll typically do this either in <code>tests/testthat/setup.R</code> or in an existing helper file.</p> <p>So, for example, to detect if any of your tests have modified options you could use this state inspector:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://testthat.r-lib.org/reference/set_state_inspector.html'>set_state_inspector</a></span><span class='o'>(</span><span class='kr'>function</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>options <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/options.html'>options</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span><span class='o'>)</span></span></code></pre> </div> <p>Or maybe you&rsquo;ve seen an <code>R CMD check</code> warning that you&rsquo;ve forgotten to close a connection:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://testthat.r-lib.org/reference/set_state_inspector.html'>set_state_inspector</a></span><span class='o'>(</span><span class='kr'>function</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>connections <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/showConnections.html'>showConnections</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span><span class='o'>)</span></span></code></pre> </div> <p>And you can of course combine 
multiple checks just by returning a more complicated list.</p> <h2 id="ui-improvements">UI improvements <a href="#ui-improvements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>testthat 3.2.0 includes a bunch of minor user interface improvements that should make day-to-day use of testthat more enjoyable. Some of our favourite highlights are:</p> <ul> <li>Parallel testing now works much better with snapshot tests. (And updates to the processx package mean that testthat no longer leaves processes around if you terminate a test process early.)</li> <li>We use an improved algorithm to find the source reference associated with an expectation/error/warning/skip. We now look for the most recent call (inside <a href="https://testthat.r-lib.org/reference/test_that.html" target="_blank" rel="noopener"><code>test_that()</code></a>) that has a known source. This generally gives more specific locations than the previous approach and gives much better locations if an error occurs in an exit handler.</li> <li>Tracebacks are no longer truncated and we use rlang&rsquo;s default tree display; this should make it easier to track down problems when testing in non-interactive contexts.</li> <li>Assuming you have a recent RStudio, test failures are now clickable, taking you to the line where the problem occurred. 
Similarly, when a snapshot test changes, you can now click the suggested code to run the appropriate <a href="https://testthat.r-lib.org/reference/snapshot_accept.html" target="_blank" rel="noopener"><code>snapshot_accept()</code></a> call.</li> <li>Skips are now only shown at the end of reporter summaries, not as tests are run. This makes them less intrusive in interactive tests while still allowing you to verify that the correct tests are skipped.</li> </ul> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all 127 contributors who helped make these last 10 releases of testthat happen, whether it be through contributed code or filing issues: <a href="https://github.com/ALanguillaume" target="_blank" rel="noopener">@ALanguillaume</a>, <a href="https://github.com/alessandroaccettulli" target="_blank" rel="noopener">@alessandroaccettulli</a>, <a href="https://github.com/ambica-aas" target="_blank" rel="noopener">@ambica-aas</a>, <a href="https://github.com/annweideman" target="_blank" rel="noopener">@annweideman</a>, <a href="https://github.com/aronatkins" target="_blank" rel="noopener">@aronatkins</a>, <a href="https://github.com/ashander" target="_blank" rel="noopener">@ashander</a>, <a href="https://github.com/AshesITR" target="_blank" rel="noopener">@AshesITR</a>, <a href="https://github.com/astayleraz" target="_blank" rel="noopener">@astayleraz</a>, <a href="https://github.com/ateucher" target="_blank" rel="noopener">@ateucher</a>, <a href="https://github.com/avraam-inside" target="_blank" 
rel="noopener">@avraam-inside</a>, <a href="https://github.com/b-steve" target="_blank" rel="noopener">@b-steve</a>, <a href="https://github.com/bersbersbers" target="_blank" rel="noopener">@bersbersbers</a>, <a href="https://github.com/billdenney" target="_blank" rel="noopener">@billdenney</a>, <a href="https://github.com/Bisaloo" target="_blank" rel="noopener">@Bisaloo</a>, <a href="https://github.com/cboettig" target="_blank" rel="noopener">@cboettig</a>, <a href="https://github.com/cderv" target="_blank" rel="noopener">@cderv</a>, <a href="https://github.com/chendaniely" target="_blank" rel="noopener">@chendaniely</a>, <a href="https://github.com/ChrisBeeley" target="_blank" rel="noopener">@ChrisBeeley</a>, <a href="https://github.com/ColinFay" target="_blank" rel="noopener">@ColinFay</a>, <a href="https://github.com/CorradoLanera" target="_blank" rel="noopener">@CorradoLanera</a>, <a href="https://github.com/daattali" target="_blank" rel="noopener">@daattali</a>, <a href="https://github.com/damianooldoni" target="_blank" rel="noopener">@damianooldoni</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/danielinteractive" target="_blank" rel="noopener">@danielinteractive</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/daynefiler" target="_blank" rel="noopener">@daynefiler</a>, <a href="https://github.com/dbdimitrov" target="_blank" rel="noopener">@dbdimitrov</a>, <a href="https://github.com/dcaseykc" target="_blank" rel="noopener">@dcaseykc</a>, <a href="https://github.com/dgkf" target="_blank" rel="noopener">@dgkf</a>, <a href="https://github.com/dhicks" target="_blank" rel="noopener">@dhicks</a>, <a href="https://github.com/dimfalk" target="_blank" rel="noopener">@dimfalk</a>, <a href="https://github.com/dougwyu" target="_blank" rel="noopener">@dougwyu</a>, <a href="https://github.com/dpprdan" target="_blank" 
rel="noopener">@dpprdan</a>, <a href="https://github.com/dvg-p4" target="_blank" rel="noopener">@dvg-p4</a>, <a href="https://github.com/elong0527" target="_blank" rel="noopener">@elong0527</a>, <a href="https://github.com/Enchufa2" target="_blank" rel="noopener">@Enchufa2</a>, <a href="https://github.com/etiennebacher" target="_blank" rel="noopener">@etiennebacher</a>, <a href="https://github.com/FlippieCoetser" target="_blank" rel="noopener">@FlippieCoetser</a>, <a href="https://github.com/florisvdh" target="_blank" rel="noopener">@florisvdh</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, <a href="https://github.com/gareth-j" target="_blank" rel="noopener">@gareth-j</a>, <a href="https://github.com/gavinsimpson" target="_blank" rel="noopener">@gavinsimpson</a>, <a href="https://github.com/ghill-fusion" target="_blank" rel="noopener">@ghill-fusion</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/heavywatal" target="_blank" rel="noopener">@heavywatal</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/hhau" target="_blank" rel="noopener">@hhau</a>, <a href="https://github.com/hpages" target="_blank" rel="noopener">@hpages</a>, <a href="https://github.com/hsloot" target="_blank" rel="noopener">@hsloot</a>, <a href="https://github.com/hughjonesd" target="_blank" rel="noopener">@hughjonesd</a>, <a href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/jameslairdsmith" target="_blank" rel="noopener">@jameslairdsmith</a>, <a href="https://github.com/jamieRowen" target="_blank" rel="noopener">@jamieRowen</a>, <a href="https://github.com/jayruffell" target="_blank" rel="noopener">@jayruffell</a>, <a href="https://github.com/JBGruber" target="_blank" rel="noopener">@JBGruber</a>, <a href="https://github.com/jennybc" target="_blank" 
rel="noopener">@jennybc</a>, <a href="https://github.com/JohnCoene" target="_blank" rel="noopener">@JohnCoene</a>, <a href="https://github.com/jonathanvoelkle" target="_blank" rel="noopener">@jonathanvoelkle</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, <a href="https://github.com/josherrickson" target="_blank" rel="noopener">@josherrickson</a>, <a href="https://github.com/kalaschnik" target="_blank" rel="noopener">@kalaschnik</a>, <a href="https://github.com/kapsner" target="_blank" rel="noopener">@kapsner</a>, <a href="https://github.com/kevinushey" target="_blank" rel="noopener">@kevinushey</a>, <a href="https://github.com/kjytay" target="_blank" rel="noopener">@kjytay</a>, <a href="https://github.com/krivit" target="_blank" rel="noopener">@krivit</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/larmarange" target="_blank" rel="noopener">@larmarange</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/llrs" target="_blank" rel="noopener">@llrs</a>, <a href="https://github.com/luma-sb" target="_blank" rel="noopener">@luma-sb</a>, <a href="https://github.com/machow" target="_blank" rel="noopener">@machow</a>, <a href="https://github.com/maciekbanas" target="_blank" rel="noopener">@maciekbanas</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/majr-red" target="_blank" rel="noopener">@majr-red</a>, <a href="https://github.com/maksymiuks" target="_blank" rel="noopener">@maksymiuks</a>, <a href="https://github.com/mardam" target="_blank" rel="noopener">@mardam</a>, <a href="https://github.com/MarkMc1089" target="_blank" rel="noopener">@MarkMc1089</a>, <a href="https://github.com/markschat" target="_blank" rel="noopener">@markschat</a>, <a href="https://github.com/MatthieuStigler" target="_blank" rel="noopener">@MatthieuStigler</a>, <a 
href="https://github.com/maurolepore" target="_blank" rel="noopener">@maurolepore</a>, <a href="https://github.com/maxheld83" target="_blank" rel="noopener">@maxheld83</a>, <a href="https://github.com/mbojan" target="_blank" rel="noopener">@mbojan</a>, <a href="https://github.com/mcol" target="_blank" rel="noopener">@mcol</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/mkb13" target="_blank" rel="noopener">@mkb13</a>, <a href="https://github.com/mkoohafkan" target="_blank" rel="noopener">@mkoohafkan</a>, <a href="https://github.com/MKyhos" target="_blank" rel="noopener">@MKyhos</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/Mosk915" target="_blank" rel="noopener">@Mosk915</a>, <a href="https://github.com/mpjashby" target="_blank" rel="noopener">@mpjashby</a>, <a href="https://github.com/ms609" target="_blank" rel="noopener">@ms609</a>, <a href="https://github.com/mtmorgan" target="_blank" rel="noopener">@mtmorgan</a>, <a href="https://github.com/musvaage" target="_blank" rel="noopener">@musvaage</a>, <a href="https://github.com/nealrichardson" target="_blank" rel="noopener">@nealrichardson</a>, <a href="https://github.com/netique" target="_blank" rel="noopener">@netique</a>, <a href="https://github.com/njtierney" target="_blank" rel="noopener">@njtierney</a>, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>, <a href="https://github.com/osorensen" target="_blank" rel="noopener">@osorensen</a>, <a href="https://github.com/pbulsink" target="_blank" rel="noopener">@pbulsink</a>, <a href="https://github.com/peterdesmet" target="_blank" rel="noopener">@peterdesmet</a>, <a href="https://github.com/r2evans" target="_blank" rel="noopener">@r2evans</a>, <a href="https://github.com/radbasa" 
target="_blank" rel="noopener">@radbasa</a>, <a href="https://github.com/remlapmot" target="_blank" rel="noopener">@remlapmot</a>, <a href="https://github.com/rfineman" target="_blank" rel="noopener">@rfineman</a>, <a href="https://github.com/rgayler" target="_blank" rel="noopener">@rgayler</a>, <a href="https://github.com/romainfrancois" target="_blank" rel="noopener">@romainfrancois</a>, <a href="https://github.com/s-fleck" target="_blank" rel="noopener">@s-fleck</a>, <a href="https://github.com/salim-b" target="_blank" rel="noopener">@salim-b</a>, <a href="https://github.com/schloerke" target="_blank" rel="noopener">@schloerke</a>, <a href="https://github.com/sorhawell" target="_blank" rel="noopener">@sorhawell</a>, <a href="https://github.com/StatisMike" target="_blank" rel="noopener">@StatisMike</a>, <a href="https://github.com/StatsMan53" target="_blank" rel="noopener">@StatsMan53</a>, <a href="https://github.com/stela2502" target="_blank" rel="noopener">@stela2502</a>, <a href="https://github.com/stla" target="_blank" rel="noopener">@stla</a>, <a href="https://github.com/t-kalinowski" target="_blank" rel="noopener">@t-kalinowski</a>, <a href="https://github.com/tansaku" target="_blank" rel="noopener">@tansaku</a>, <a href="https://github.com/tomliptrot" target="_blank" rel="noopener">@tomliptrot</a>, <a href="https://github.com/torres-pedro" target="_blank" rel="noopener">@torres-pedro</a>, <a href="https://github.com/wes-brooks" target="_blank" rel="noopener">@wes-brooks</a>, <a href="https://github.com/wfmueller29" target="_blank" rel="noopener">@wfmueller29</a>, <a href="https://github.com/wleoncio" target="_blank" rel="noopener">@wleoncio</a>, <a href="https://github.com/wurli" target="_blank" rel="noopener">@wurli</a>, <a href="https://github.com/yogat3ch" target="_blank" rel="noopener">@yogat3ch</a>, <a href="https://github.com/yuliaUU" target="_blank" rel="noopener">@yuliaUU</a>, <a href="https://github.com/yutannihilation" target="_blank" 
rel="noopener">@yutannihilation</a>, and <a href="https://github.com/zsigmas" target="_blank" rel="noopener">@zsigmas</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>Think mimicking, like a mockingbird, not making fun of. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> Q3 2023 tidymodels digest https://www.tidyverse.org/blog/2023/10/tidymodels-2023-q3/ Thu, 05 Oct 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/10/tidymodels-2023-q3/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <p>Since the beginning of 2021, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a> here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. 
You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a> to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like this post from the past couple of months:</p> <ul> <li> <a href="https://www.tidyverse.org/blog/2023/08/validation-split-as-3-way-split/" target="_blank" rel="noopener">New interface to validation splits</a></li> </ul> <p>Since <a href="https://www.tidyverse.org/blog/2022/12/tidymodels-2022-q4/" target="_blank" rel="noopener">our last roundup post</a>, there have been CRAN releases of 11 tidymodels packages. Here are links to their NEWS files:</p> <div class="highlight"> <ul> <li>butcher <a href="https://butcher.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.3)</a></li> <li>embed <a href="https://embed.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.2)</a></li> <li>modeldata <a href="https://modeldata.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.0)</a></li> <li>parsnip <a href="https://parsnip.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.1)</a></li> <li>recipes <a href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.8)</a></li> <li>rsample <a href="https://rsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.0)</a></li> <li>textrecipes <a href="https://textrecipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.4)</a></li> <li>themis <a href="https://themis.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.2)</a></li> <li>tidyclust <a href="https://tidyclust.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.2.0)</a></li> <li>tidymodels <a href="https://tidymodels.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.1)</a></li> <li>tune <a href="https://tune.tidymodels.org/news/index.html" target="_blank" 
rel="noopener">(1.1.2)</a></li> </ul> </div> <p>We&rsquo;ll highlight a few especially notable changes below: updated workshop material, new K-means engines, and quality-of-life improvements in rsample. First, loading the collection of packages:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/tidyclust'>tidyclust</a></span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='s'>"ames"</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span></span></code></pre> </div> <h2 id="workshops">Workshops <a href="#workshops"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>One of the biggest areas of work for our team this quarter was getting ready for this year&rsquo;s <a href="https://posit.co/conference/" target="_blank" rel="noopener">posit::conf</a>. This year, two 1-day workshops were available: &ldquo;Introduction to tidymodels&rdquo; and &ldquo;Advanced tidymodels&rdquo;. 
All the material can be found on our workshop website <a href="https://workshops.tidymodels.org/" target="_blank" rel="noopener">workshops.tidymodels.org</a>, with these workshops being archived as <a href="https://workshops.tidymodels.org/archive/2023-09-posit-conf/" target="_blank" rel="noopener">posit::conf 2023 workshops</a>.</p> <p>Unless otherwise noted (i.e. not an original creation and reused from another source), these educational materials are licensed under Creative Commons Attribution-ShareAlike <a href="https://creativecommons.org/licenses/by-sa/4.0/" target="_blank" rel="noopener">CC BY-SA 4.0</a>.</p> <h2 id="tidyclust-update">Tidyclust update <a href="#tidyclust-update"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The latest release of tidyclust featured a round of bug fixes, documentation improvements and quality-of-life improvements.</p> <p>This release adds 2 new engines to the <a href="https://tidyclust.tidymodels.org/reference/k_means.html" target="_blank" rel="noopener"><code>k_means()</code></a> model: <a href="https://tidyclust.tidymodels.org/reference/details_k_means_klaR.html" target="_blank" rel="noopener">klaR</a> to run K-Modes models and <a href="https://tidyclust.tidymodels.org/reference/details_k_means_clustMixType.html" target="_blank" rel="noopener">clustMixType</a> to run K-prototypes. 
K-Modes is the categorical analog to K-means, meaning that it is intended to be used on only categorical data, and K-prototypes is the more general method that works with categorical and numeric data at the same time.</p> <p>If we were to fit a K-means model to a mixed-type data set such as <code>ames</code>, it would work, but under the hood, the model would apply a dummy transformation on the categorical predictors.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>kmeans_spec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tidyclust.tidymodels.org/reference/k_means.html'>k_means</a></span><span class='o'>(</span>num_clusters <span class='o'>=</span> <span class='m'>3</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"stats"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>kmeans_fit</span> <span class='o'>&lt;-</span> <span class='nv'>kmeans_spec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span></span></code></pre> </div> <p>When extracting the cluster means, we see that the dummy variables were used when calculating the means, which can make it harder to interpret the output.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>kmeans_fit</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span 
class='nf'><a href='https://tidyclust.tidymodels.org/reference/extract_centroids.html'>extract_centroids</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>select</span><span class='o'>(</span><span class='m'>101</span><span class='o'>:</span><span class='m'>112</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>glimpse</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 3</span></span> <span><span class='c'>#&gt; Columns: 12</span></span> <span><span class='c'>#&gt; $ Overall_CondGood <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.09009009, 0.17594787, 0.01234568</span></span> <span><span class='c'>#&gt; $ Overall_CondVery_Good <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.02702703, 0.06694313, 0.01646091</span></span> <span><span class='c'>#&gt; $ Overall_CondExcellent <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.01201201, 0.01303318, 0.02880658</span></span> <span><span class='c'>#&gt; $ Overall_CondVery_Excellent <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0, 0, 0</span></span> <span><span class='c'>#&gt; $ Year_Built <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1989.645, 1956.471, 1999.572</span></span> <span><span class='c'>#&gt; $ Year_Remod_Add <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1996.090, 1974.518, 2003.379</span></span> <span><span class='c'>#&gt; $ Roof_StyleGable <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.8238238, 0.8234597, 0.4444444</span></span> <span><span class='c'>#&gt; $ Roof_StyleGambrel <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.005005005, 0.010071090, 
0.000000000</span></span> <span><span class='c'>#&gt; $ Roof_StyleHip <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.1531532, 0.1558057, 0.5555556</span></span> <span><span class='c'>#&gt; $ Roof_StyleMansard <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.005005005, 0.003554502, 0.000000000</span></span> <span><span class='c'>#&gt; $ Roof_StyleShed <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.003003003, 0.001184834, 0.000000000</span></span> <span><span class='c'>#&gt; $ Roof_MatlCompShg <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.9759760, 0.9905213, 0.9876543</span></span> <span></span></code></pre> </div> <p>Fitting a K-prototype model is done by setting the engine in <a href="https://tidyclust.tidymodels.org/reference/k_means.html" target="_blank" rel="noopener"><code>k_means()</code></a> to <code>&quot;clustMixType&quot;</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>kproto_spec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tidyclust.tidymodels.org/reference/k_means.html'>k_means</a></span><span class='o'>(</span>num_clusters <span class='o'>=</span> <span class='m'>3</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"clustMixType"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>kproto_fit</span> <span class='o'>&lt;-</span> <span class='nv'>kproto_spec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='o'>~</span> <span 
class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span></span></code></pre> </div> <p>The cluster centroids can now be extracted in the original data format, since categorical predictors are supported.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>kproto_fit</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://tidyclust.tidymodels.org/reference/extract_centroids.html'>extract_centroids</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>select</span><span class='o'>(</span><span class='m'>11</span><span class='o'>:</span><span class='m'>20</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>glimpse</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 3</span></span> <span><span class='c'>#&gt; Columns: 10</span></span> <span><span class='c'>#&gt; $ Lot_Config <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> Inside, Inside, Inside</span></span> <span><span class='c'>#&gt; $ Land_Slope <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> Gtl, Gtl, Gtl</span></span> <span><span class='c'>#&gt; $ Neighborhood <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> College_Creek, North_Ames, Northridge_Heights</span></span> <span><span class='c'>#&gt; $ Condition_1 <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> Norm, Norm, Norm</span></span> <span><span class='c'>#&gt; $ Condition_2 <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> Norm, Norm, Norm</span></span> <span><span class='c'>#&gt; $ Bldg_Type <span 
style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> OneFam, OneFam, OneFam</span></span> <span><span class='c'>#&gt; $ House_Style <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> Two_Story, One_Story, One_Story</span></span> <span><span class='c'>#&gt; $ Overall_Cond <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> Average, Average, Average</span></span> <span><span class='c'>#&gt; $ Year_Built <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1989.977, 1953.793, 1998.765</span></span> <span><span class='c'>#&gt; $ Year_Remod_Add <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1995.934, 1972.973, 2003.035</span></span> <span></span></code></pre> </div> <h2 id="stricter-rsample-functions">Stricter rsample functions <a href="#stricter-rsample-functions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Before version 1.2.0 of rsample, misspelled and wrongly used arguments would be swallowed silently by the functions. This could be a big source of confusion, as such mistakes easily slip through the cracks. 
We have made changes to all rsample functions such that whenever possible they alert the user when something is wrong.</p> <p>Before 1.2.0 when you, for example, misspelled <code>strata</code> as <code>stata</code>, everything would go on like normal, with no indication that <code>stata</code> was ignored.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">initial_split</span><span class="p">(</span><span class="n">ames</span><span class="p">,</span> <span class="n">prop</span> <span class="o">=</span> <span class="m">0.75</span><span class="p">,</span> <span class="n">stata</span> <span class="o">=</span> <span class="n">Neighborhood</span><span class="p">)</span> <span class="c1">#&gt; &lt;Training/Testing/Total&gt;</span> <span class="c1">#&gt; &lt;2197/733/2930&gt;</span> </code></pre></div><p>The same code will now error and point to the problematic arguments.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>initial_split</span><span class='o'>(</span><span class='nv'>ames</span>, prop <span class='o'>=</span> <span class='m'>0.75</span>, stata <span class='o'>=</span> <span class='nv'>Neighborhood</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `initial_split()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `...` must be empty.</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> Problematic argument:</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> stata = Neighborhood</span></span> <span></span></code></pre> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" 
fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to thank those in the community that contributed to tidymodels in the last quarter:</p> <div class="highlight"> <ul> <li>butcher: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>.</li> <li>embed: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, and <a href="https://github.com/wbuchanan" target="_blank" rel="noopener">@wbuchanan</a>.</li> <li>modeldata: <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>parsnip: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/gmcmacran" target="_blank" rel="noopener">@gmcmacran</a>, <a href="https://github.com/SHo-JANG" target="_blank" rel="noopener">@SHo-JANG</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/vidarsumo" target="_blank" rel="noopener">@vidarsumo</a>.</li> <li>recipes: <a href="https://github.com/abichat" target="_blank" rel="noopener">@abichat</a>, <a href="https://github.com/andreranza" target="_blank" rel="noopener">@andreranza</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/jkennel" target="_blank" rel="noopener">@jkennel</a>, <a href="https://github.com/millermc38" target="_blank" rel="noopener">@millermc38</a>, <a href="https://github.com/nikosGeography" target="_blank" rel="noopener">@nikosGeography</a>, <a 
href="https://github.com/pgg1309" target="_blank" rel="noopener">@pgg1309</a>, <a href="https://github.com/rdavis120" target="_blank" rel="noopener">@rdavis120</a>, <a href="https://github.com/Sade154" target="_blank" rel="noopener">@Sade154</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/walrossker" target="_blank" rel="noopener">@walrossker</a>.</li> <li>rsample: <a href="https://github.com/godscloset" target="_blank" rel="noopener">@godscloset</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/MasterLuke84" target="_blank" rel="noopener">@MasterLuke84</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/PathosEthosLogos" target="_blank" rel="noopener">@PathosEthosLogos</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>textrecipes: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, and <a href="https://github.com/gaohuachuan" target="_blank" rel="noopener">@gaohuachuan</a>.</li> <li>themis: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>.</li> <li>tidyclust: <a href="https://github.com/coforfe" target="_blank" rel="noopener">@coforfe</a>, <a href="https://github.com/cphaarmeyer" target="_blank" rel="noopener">@cphaarmeyer</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/michaelgrund" target="_blank" rel="noopener">@michaelgrund</a>, <a href="https://github.com/PathosEthosLogos" target="_blank" rel="noopener">@PathosEthosLogos</a>, and <a 
href="https://github.com/trevorcampbell" target="_blank" rel="noopener">@trevorcampbell</a>.</li> <li>tidymodels: <a href="https://github.com/nikosGeography" target="_blank" rel="noopener">@nikosGeography</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>tune: <a href="https://github.com/dramanica" target="_blank" rel="noopener">@dramanica</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/forecastingEDs" target="_blank" rel="noopener">@forecastingEDs</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/kbodwin" target="_blank" rel="noopener">@kbodwin</a>, <a href="https://github.com/KJT-Habitat" target="_blank" rel="noopener">@KJT-Habitat</a>, <a href="https://github.com/MasterLuke84" target="_blank" rel="noopener">@MasterLuke84</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> </ul> </div> <p>We&rsquo;re grateful for all of the tidymodels community, from observers to users to contributors. Happy modeling!</p> pak 0.6.0 https://www.tidyverse.org/blog/2023/09/pak-0-6-0/ Tue, 05 Sep 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/09/pak-0-6-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. 
the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re delighted to announce the release of <a href="https://pak.r-lib.org" target="_blank" rel="noopener">pak</a> 0.6.0. pak helps with the installation of R packages and many related tasks.</p> <p>You can install pak from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"pak"</span><span class='o'>)</span></span></code></pre> </div> <p>If you use an older R version, or a platform that CRAN does not have binary packages for, it is faster and simpler to install pak from our repository. <a href="https://pak.r-lib.org/reference/install.html" target="_blank" rel="noopener">See the details in the manual.</a></p> <p>This blog post focuses on the exciting new improvements in the matching and installation of system requirements on Linux systems.</p> <p>You can see a full list of changes in the <a href="https://github.com/r-lib/pak/releases/tag/v0.6.0" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="system-requirements">System requirements <a href="#system-requirements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Many R packages require external software to be installed; without it they do not work, or do not even load. For example, the RPostgres R package requires the PostgreSQL client library, and by default dynamically links to it on Linux systems. 
This means that you (or the administrators of your system) need to install this library, typically in the form of a system package: <code>libpq-dev</code> on Ubuntu and Debian systems, or <code>postgresql-server-devel</code> or <code>postgresql-devel</code> on Red Hat, Fedora, and similar systems.</p> <p>The good news is that pak now helps you with this:</p> <ul> <li>it looks up the required system packages when installing R packages,</li> <li>it lets you know if any required system packages are missing from your system, before the installation, and</li> <li>it installs them automatically, if you are a superuser, or if you can use password-less <code>sudo</code> to start a superuser shell.</li> </ul> <p>In addition, pak now also has some functions to query system requirements and system packages.</p> <h2 id="supported-platforms">Supported platforms <a href="#supported-platforms"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>pak 0.6.0 currently supports the following Linux systems:</p> <ul> <li>Ubuntu Linux,</li> <li>Debian Linux,</li> <li>Red Hat Enterprise Linux,</li> <li>SUSE Linux Enterprise,</li> <li>openSUSE,</li> <li>CentOS,</li> <li>Rocky Linux,</li> <li>Fedora Linux.</li> </ul> <p>Call <a href="https://pak.r-lib.org/reference/sysreqs_platforms.html" target="_blank" rel="noopener"><code>pak::sysreqs_platforms()</code></a> to query the current list of supported platforms:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a 
href='http://pak.r-lib.org/reference/sysreqs_platforms.html'>sysreqs_platforms</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>[</span>,<span class='m'>1</span><span class='o'>:</span><span class='m'>3</span><span class='o'>]</span></span> <span><span class='c'>#&gt; name os distribution</span></span> <span><span class='c'>#&gt; 1 Ubuntu Linux linux ubuntu</span></span> <span><span class='c'>#&gt; 2 Debian Linux linux debian</span></span> <span><span class='c'>#&gt; 3 CentOS Linux linux centos</span></span> <span><span class='c'>#&gt; 4 Rocky Linux linux rockylinux</span></span> <span><span class='c'>#&gt; 5 Red Hat Enterprise Linux linux redhat</span></span> <span><span class='c'>#&gt; 6 Red Hat Enterprise Linux linux redhat</span></span> <span><span class='c'>#&gt; 7 Red Hat Enterprise Linux linux redhat</span></span> <span><span class='c'>#&gt; 8 Fedora Linux linux fedora</span></span> <span><span class='c'>#&gt; 9 openSUSE Linux linux opensuse</span></span> <span><span class='c'>#&gt; 10 SUSE Linux Enterprise linux sle</span></span> <span></span></code></pre> </div> <p>Call <a href="https://pak.r-lib.org/reference/system_r_platform.html" target="_blank" rel="noopener"><code>pak::system_r_platform()</code></a> to check if pak has detected your platform correctly, and <a href="https://pak.r-lib.org/reference/sysreqs_is_supported.html" target="_blank" rel="noopener"><code>pak::sysreqs_is_supported()</code></a> to see if it is supported:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='http://pak.r-lib.org/reference/system_r_platform.html'>system_r_platform</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "x86_64-pc-linux-gnu-ubuntu-22.04"</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' 
data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='http://pak.r-lib.org/reference/sysreqs_is_supported.html'>sysreqs_is_supported</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] TRUE</span></span> <span></span></code></pre> </div> <h2 id="r-package-installation">R package installation <a href="#r-package-installation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>If you are using pak as the <code>root</code> user, on a supported platform, then during package installation pak will look up the required system packages, and will install the missing ones. Here is an example:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='http://pak.r-lib.org/reference/pkg_install.html'>pkg_install</a></span><span class='o'>(</span><span class='s'>"RPostgres"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Loading metadata database<span style='color: #00BB00;'>v</span> Loading metadata database ... 
done</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; &gt; Will <span style='font-style: italic;'>install</span> 12 packages.</span></span> <span><span class='c'>#&gt; &gt; Will <span style='font-style: italic;'>download</span> 12 packages with unknown size.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>DBI</span> 1.1.3 [dl]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>RPostgres</span> 1.4.5 [dl]<span style='color: #555555;'> + </span><span style='color: #BB0000;'>x</span><span style='color: #00BBBB;'> libpq-dev</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>Rcpp</span> 1.0.11 [dl]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>bit</span> 4.0.5 [dl]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>bit64</span> 4.0.5 [dl]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>blob</span> 1.2.4 [dl]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>generics</span> 0.1.3 [dl]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>hms</span> 1.1.3 [dl]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>lubridate</span> 1.9.2 [dl]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>pkgconfig</span> 2.0.3 [dl]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>timechange</span> 0.2.0 [dl]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ 
</span><span style='color: #0000BB;'>withr</span> 2.5.0 [dl]</span></span> <span><span class='c'>#&gt; &gt; Will <span style='font-style: italic;'>install</span> 1 system package:</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #00BBBB;'>libpq-dev</span> <span style='color: #555555;'>- </span><span style='color: #0000BB;'>RPostgres</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Getting 12 pkgs with unknown sizes</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>blob</span> 1.2.4 (x86_64-pc-linux-gnu-ubuntu-22.04) (45.94 kB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>generics</span> 0.1.3 (x86_64-pc-linux-gnu-ubuntu-22.04) (76.24 kB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>hms</span> 1.1.3 (x86_64-pc-linux-gnu-ubuntu-22.04) (98.35 kB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>RPostgres</span> 1.4.5 (x86_64-pc-linux-gnu-ubuntu-22.04) (455.11 kB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>bit64</span> 4.0.5 (x86_64-pc-linux-gnu-ubuntu-22.04) (475.41 kB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>pkgconfig</span> 2.0.3 (x86_64-pc-linux-gnu-ubuntu-22.04) (17.58 kB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>timechange</span> 0.2.0 (x86_64-pc-linux-gnu-ubuntu-22.04) (169.26 kB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>DBI</span> 1.1.3 (x86_64-pc-linux-gnu-ubuntu-22.04) (759.31 kB)</span></span> <span><span 
class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>withr</span> 2.5.0 (x86_64-pc-linux-gnu-ubuntu-22.04) (228.73 kB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>bit</span> 4.0.5 (x86_64-pc-linux-gnu-ubuntu-22.04) (1.13 MB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>lubridate</span> 1.9.2 (x86_64-pc-linux-gnu-ubuntu-22.04) (980.37 kB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Got <span style='color: #0000BB;'>Rcpp</span> 1.0.11 (x86_64-pc-linux-gnu-ubuntu-22.04) (2.15 MB)</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Installing system requirements</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Executing `sh -c apt-get -y update`</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Executing `sh -c apt-get -y install libpq-dev`</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>DBI</span> 1.1.3 <span style='color: #9E9E9E;'>(1.1s)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>RPostgres</span> 1.4.5 <span style='color: #9E9E9E;'>(1.1s)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>Rcpp</span> 1.0.11 <span style='color: #9E9E9E;'>(1.2s)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>bit</span> 4.0.5 <span style='color: #9E9E9E;'>(1.2s)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>bit64</span> 4.0.5 <span style='color: #9E9E9E;'>(126ms)</span></span></span> <span><span 
class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>blob</span> 1.2.4 <span style='color: #9E9E9E;'>(86ms)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>generics</span> 0.1.3 <span style='color: #9E9E9E;'>(83ms)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>hms</span> 1.1.3 <span style='color: #9E9E9E;'>(59ms)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>lubridate</span> 1.9.2 <span style='color: #9E9E9E;'>(1.1s)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>pkgconfig</span> 2.0.3 <span style='color: #9E9E9E;'>(1.1s)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>timechange</span> 0.2.0 <span style='color: #9E9E9E;'>(63ms)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>withr</span> 2.5.0 <span style='color: #9E9E9E;'>(1.1s)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> 1 pkg + 16 deps: kept 5, added 12, dld 12 (6.58 MB) <span style='color: #B2B2B2;'>[17.1s]</span></span></span> <span></span></code></pre> </div> <h3 id="running-r-as-a-regular-user">Running R as a regular user <a href="#running-r-as-a-regular-user"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 
5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>If you don&rsquo;t want to use R as the superuser, but you can set up <code>sudo</code> without a password, that works as well. pak will detect the password-less <code>sudo</code> capability, and use it to install system packages, as needed.</p> <p>If you run R as a regular (not root) user, and password-less <code>sudo</code> is not available, then pak will print the system requirements, but it will not try to install or update them.</p> <p>If you are compiling R packages from source, and they need to link to system libraries, then their installation will probably fail until you install these system packages.</p> <p>If you are installing binary R packages (e.g. from <a href="https://packagemanager.posit.co/client/#/" target="_blank" rel="noopener">P3M</a>), then the installation typically succeeds, but you won&rsquo;t be able to load these packages into R until you install the required system packages.</p> <p>To demonstrate this, let&rsquo;s remove the system package for the PostgreSQL client library:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/system.html'>system</a></span><span class='o'>(</span><span class='s'>"apt-get remove -y libpq5"</span><span class='o'>)</span></span></code></pre> </div> <p>If we now (re)install the binary RPostgres R package, the installation succeeds, but <a href="https://rdrr.io/r/base/library.html" target="_blank" rel="noopener"><code>library()</code></a> then fails because of the missing system package. (We will fix the broken R package below.)</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Loading metadata database<span style='color: #00BB00;'>v</span> Loading metadata database ... 
done</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; &gt; Will <span style='font-style: italic;'>install</span> 1 package.</span></span> <span><span class='c'>#&gt; &gt; Will <span style='font-style: italic;'>download</span> 1 package with unknown size.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #0000BB;'>RPostgres</span> 1.4.5 [dl]<span style='color: #555555;'> + </span><span style='color: #BB0000;'>x</span><span style='color: #00BBBB;'> libpq-dev</span></span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>x</span> Missing 1 system package. You'll probably need to install it manually:</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>+ </span><span style='color: #00BBBB;'>libpq-dev</span> <span style='color: #555555;'>- </span><span style='color: #0000BB;'>RPostgres</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Getting 1 pkg with unknown size</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Cached copy of <span style='color: #0000BB;'>RPostgres</span> 1.4.5 (x86_64-pc-linux-gnu-ubuntu-22.04) is the latest build</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> Installed <span style='color: #0000BB;'>RPostgres</span> 1.4.5 <span style='color: #9E9E9E;'>(1.1s)</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>v</span> 1 pkg + 16 deps: kept 16, added 1 <span style='color: #B2B2B2;'>[5.7s]</span></span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rpostgres.r-dbi.org'>RPostgres</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; Error: package or namespace load 
failed for 'RPostgres' in dyn.load(file, DLLpath = DLLpath, ...):</span></span> <span><span class='c'>#&gt; unable to load shared object '/root/R/x86_64-pc-linux-gnu-library/4.3/RPostgres/libs/RPostgres.so':</span></span> <span><span class='c'>#&gt; libpq.so.5: cannot open shared object file: No such file or directory</span></span> <span><span class='c'>#&gt; Execution halted</span></span> <span></span></code></pre> </div> <h2 id="opting-out">Opting out <a href="#opting-out"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>If you don&rsquo;t want pak to install system packages for you, set the <code>PKG_SYSREQS</code> environment variable to <code>false</code>, or the <code>pkg.sysreqs</code> option to <code>FALSE</code>. See the complete list of configuration options in the <a href="https://pak.r-lib.org/reference/pak-config.html" target="_blank" rel="noopener"><code>config?pak</code></a> manual page.</p> <h2 id="system-requirements-queries">System requirements queries <a href="#system-requirements-queries"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>pak 0.6.0 also has a number of functions to query system requirements and system packages. 
The <a href="https://pak.r-lib.org/reference/pkg_sysreqs.html" target="_blank" rel="noopener"><code>pak::pkg_sysreqs()</code></a> function is similar to <a href="https://pak.r-lib.org/reference/pkg_deps.html" target="_blank" rel="noopener"><code>pak::pkg_deps()</code></a> but in addition to looking up package dependencies, it also looks up system dependencies, and only reports the latter:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='http://pak.r-lib.org/reference/pkg_sysreqs.html'>pkg_sysreqs</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"curl"</span>, <span class='s'>"r-lib/xml2"</span>, <span class='s'>"devtools"</span>, <span class='s'>"CHRONOS"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Loading metadata database<span style='color: #00BB00;'>v</span> Loading metadata database ... 
done</span></span> <span><span class='c'>#&gt; -- Install scripts --------------------------------------------- Ubuntu 22.04 --</span></span> <span><span class='c'>#&gt; apt-get -y update</span></span> <span><span class='c'>#&gt; apt-get -y install libcurl4-openssl-dev libssl-dev git make libgit2-dev \</span></span> <span><span class='c'>#&gt; zlib1g-dev pandoc libfreetype6-dev libjpeg-dev libpng-dev libtiff-dev \</span></span> <span><span class='c'>#&gt; libicu-dev libfontconfig1-dev libfribidi-dev libharfbuzz-dev libxml2-dev \</span></span> <span><span class='c'>#&gt; libglpk-dev libgmp3-dev default-jdk</span></span> <span><span class='c'>#&gt; R CMD javareconf</span></span> <span><span class='c'>#&gt; R CMD javareconf</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; -- Packages and their system dependencies --------------------------------------</span></span> <span><span class='c'>#&gt; CHRONOS -- default-jdk, pandoc</span></span> <span><span class='c'>#&gt; credentials -- git</span></span> <span><span class='c'>#&gt; curl -- libcurl4-openssl-dev, libssl-dev</span></span> <span><span class='c'>#&gt; fs -- make</span></span> <span><span class='c'>#&gt; gert -- libgit2-dev</span></span> <span><span class='c'>#&gt; gitcreds -- git</span></span> <span><span class='c'>#&gt; httpuv -- make, zlib1g-dev</span></span> <span><span class='c'>#&gt; igraph -- libglpk-dev, libgmp3-dev, libxml2-dev</span></span> <span><span class='c'>#&gt; knitr -- pandoc</span></span> <span><span class='c'>#&gt; openssl -- libssl-dev</span></span> <span><span class='c'>#&gt; pkgdown -- pandoc</span></span> <span><span class='c'>#&gt; png -- libpng-dev</span></span> <span><span class='c'>#&gt; ragg -- libfreetype6-dev, libjpeg-dev, libpng-dev, libtiff-dev</span></span> <span><span class='c'>#&gt; RCurl -- libcurl4-openssl-dev, make</span></span> <span><span class='c'>#&gt; remotes -- git</span></span> <span><span class='c'>#&gt; rJava -- default-jdk, 
make</span></span> <span><span class='c'>#&gt; rmarkdown -- pandoc</span></span> <span><span class='c'>#&gt; sass -- make</span></span> <span><span class='c'>#&gt; stringi -- libicu-dev</span></span> <span><span class='c'>#&gt; systemfonts -- libfontconfig1-dev, libfreetype6-dev</span></span> <span><span class='c'>#&gt; textshaping -- libfreetype6-dev, libfribidi-dev, libharfbuzz-dev</span></span> <span><span class='c'>#&gt; XML -- libxml2-dev</span></span> <span><span class='c'>#&gt; xml2 -- libxml2-dev</span></span> <span></span></code></pre> </div> <p>See the manual of <a href="https://pak.r-lib.org/reference/pkg_sysreqs.html" target="_blank" rel="noopener"><code>pak::pkg_sysreqs()</code></a> to learn how to programmatically extract information from its return value.</p> <p> <a href="https://pak.r-lib.org/reference/sysreqs_check_installed.html" target="_blank" rel="noopener"><code>pak::sysreqs_check_installed()</code></a> is a handy function that checks if all system requirements are installed for some or all R packages in your library. 
This should report our broken RPostgres package:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='http://pak.r-lib.org/reference/sysreqs_check_installed.html'>sysreqs_check_installed</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; system package installed required by</span></span> <span><span class='c'>#&gt; -------------- -- -----------</span></span> <span><span class='c'>#&gt; libpq-dev <span style='color: #BB0000;'>x</span> RPostgres</span></span> <span></span></code></pre> </div> <p> <a href="https://pak.r-lib.org/reference/sysreqs_check_installed.html" target="_blank" rel="noopener"><code>pak::sysreqs_fix_installed()</code></a> goes one step further and also tries to install the missing system requirements:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='http://pak.r-lib.org/reference/sysreqs_check_installed.html'>sysreqs_fix_installed</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Need to install 1 system package.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Installing system requirements</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Executing `sh -c apt-get -y update`</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>i</span> Executing `sh -c apt-get -y install libpq-dev`</span></span> <span></span></code></pre> </div> <p>Now we can load RPostgres again:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a 
href='https://rpostgres.r-dbi.org'>RPostgres</a></span><span class='o'>)</span></span> <span></span></code></pre> </div> <h2 id="configuration">Configuration <a href="#configuration"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>There are several pak configuration options you can use to adjust how system requirements are handled. See the complete list in the <a href="https://pak.r-lib.org/reference/pak-config.html" target="_blank" rel="noopener"><code>config?pak</code></a> manual page.</p> <h2 id="other-related-pak-functions">Other related pak functions <a href="#other-related-pak-functions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><ul> <li> <a href="https://pak.r-lib.org/reference/sysreqs_db_list.html" target="_blank" rel="noopener"><code>pak::sysreqs_db_list()</code></a>, <code>pak::sysreqs_dbmatch()</code> and <a href="https://pak.r-lib.org/reference/sysreqs_db_update.html" target="_blank" rel="noopener"><code>pak::sysreqs_db_update()</code></a> list, query and update the built-in system requirements database.</li> <li> <a href="https://pak.r-lib.org/reference/sysreqs_list_system_packages.html" target="_blank" rel="noopener"><code>pak::sysreqs_list_system_packages()</code></a> lists system packages, including 
virtual packages and the features they provide.</li> </ul> <h2 id="more-information">More information <a href="#more-information"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><ul> <li> <a href="https://pak.r-lib.org/" target="_blank" rel="noopener">pak documentation</a></li> <li> <a href="https://pak.r-lib.org/reference/sysreqs.html" target="_blank" rel="noopener">System requirements manual page</a></li> <li> <a href="https://github.com/rstudio/r-system-requirements" target="_blank" rel="noopener">System requirements database</a></li> </ul> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thank you to all those who have contributed to pak, or one of its workhorse packages since the v0.5.1 release:</p> <p> <a href="https://github.com/alexpate30" target="_blank" rel="noopener">@alexpate30</a>, <a href="https://github.com/averissimo" target="_blank" rel="noopener">@averissimo</a>, <a href="https://github.com/ArnaudKunzi" target="_blank" rel="noopener">@ArnaudKunzi</a>, <a href="https://github.com/billdenney" target="_blank" rel="noopener">@billdenney</a>, <a href="https://github.com/Darxor" target="_blank" rel="noopener">@Darxor</a>, <a 
href="https://github.com/drmowinckels" target="_blank" rel="noopener">@drmowinckels</a>, <a href="https://github.com/Fan-iX" target="_blank" rel="noopener">@Fan-iX</a>, <a href="https://github.com/gongyh" target="_blank" rel="noopener">@gongyh</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/idavydov" target="_blank" rel="noopener">@idavydov</a>, <a href="https://github.com/jefferis" target="_blank" rel="noopener">@jefferis</a>, <a href="https://github.com/joan-yanqiong" target="_blank" rel="noopener">@joan-yanqiong</a>, <a href="https://github.com/kevinushey" target="_blank" rel="noopener">@kevinushey</a>, <a href="https://github.com/kkmann" target="_blank" rel="noopener">@kkmann</a>, <a href="https://github.com/klmr" target="_blank" rel="noopener">@klmr</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/lgaborini" target="_blank" rel="noopener">@lgaborini</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/maxheld83" target="_blank" rel="noopener">@maxheld83</a>, <a href="https://github.com/maximsmol" target="_blank" rel="noopener">@maximsmol</a>, <a href="https://github.com/michaelmayer2" target="_blank" rel="noopener">@michaelmayer2</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>, <a href="https://github.com/pascalgulikers" target="_blank" rel="noopener">@pascalgulikers</a>, <a href="https://github.com/pawelru" target="_blank" rel="noopener">@pawelru</a>, <a href="https://github.com/royfrancis" target="_blank" rel="noopener">@royfrancis</a>, <a href="https://github.com/tanho63" target="_blank" rel="noopener">@tanho63</a>, <a href="https://github.com/thomasyu888" target="_blank" rel="noopener">@thomasyu888</a>, <a 
href="https://github.com/vincent-hanlon" target="_blank" rel="noopener">@vincent-hanlon</a>, and <a href="https://github.com/VincentGuyader" target="_blank" rel="noopener">@VincentGuyader</a>.</p> New interface to validation splits https://www.tidyverse.org/blog/2023/08/validation-split-as-3-way-split/ Fri, 25 Aug 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/08/validation-split-as-3-way-split/ <p>We&rsquo;re chuffed to announce the release of a new interface to validation splits in <a href="https://rsample.tidymodels.org/" target="_blank" rel="noopener">rsample</a> 1.2.0 and <a href="https://tune.tidymodels.org/" target="_blank" rel="noopener">tune</a> 1.1.2. The rsample package makes it easy to create resamples for assessing model performance. 
The tune package facilitates hyperparameter tuning for the tidymodels packages.</p> <p>You can install the new versions from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"rsample"</span>, <span class='s'>"tune"</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will walk you through how to make a validation split and use it for tuning.</p> <p>You can see a full list of changes in the release notes for <a href="https://github.com/tidymodels/rsample/releases/tag/v1.2.0" target="_blank" rel="noopener">rsample</a> and <a href="https://github.com/tidymodels/tune/releases/tag/v1.1.2" target="_blank" rel="noopener">tune</a>.</p> <p>Let&rsquo;s start with loading the tidymodels package which will load, among others, both rsample and tune.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Attaching packages</span> ────────────────────────────────────── tidymodels 1.1.1 ──</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>broom </span> 1.0.5 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>recipes </span> 1.0.7</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dials </span> 1.2.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>rsample </span> 
1.2.0</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dplyr </span> 1.1.2 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tibble </span> 3.2.1</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>ggplot2 </span> 3.4.3 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tidyr </span> 1.3.0</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>infer </span> 1.0.4 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tune </span> 1.1.2</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>modeldata </span> 1.2.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>workflows </span> 1.1.3</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>parsnip </span> 1.1.1 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>workflowsets</span> 1.0.1</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>purrr </span> 1.0.2 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>yardstick </span> 1.2.0</span></span> <span></span><span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Conflicts</span> ───────────────────────────────────────── tidymodels_conflicts() ──</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>purrr</span>::<span style='color: #00BB00;'>discard()</span> masks <span style='color: #0000BB;'>scales</span>::discard()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: 
#0000BB;'>stats</span>::filter()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>recipes</span>::<span style='color: #00BB00;'>step()</span> masks <span style='color: #0000BB;'>stats</span>::step()</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>•</span> Use suppressPackageStartupMessages() to eliminate package startup messages</span></span> <span></span></code></pre> </div> <h2 id="the-new-functions">The new functions <a href="#the-new-functions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>You can now make a three-way split of your data instead of doing a sequence of two binary splits.</p> <ul> <li><code>initial_validation_split()</code> with variants <code>initial_validation_time_split()</code> and <code>group_initial_validation_split()</code> for the initial three-way split</li> <li><code>validation_set()</code> to create the <code>rset</code> for tuning containing the analysis (= training) and assessment (= validation) set</li> <li><code>training()</code>, <code>validation()</code>, and <code>testing()</code> for access to the separate subsets</li> <li><code>last_fit()</code> (and <code>fit_best()</code>) now also work on the initial three-way split</li> </ul> <h2 id="the-new-functions-in-action">The new functions in action <a href="#the-new-functions-in-action"> <svg 
class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>To illustrate how to use the new functions, we&rsquo;ll replicate an analysis of the <a href="https://github.com/rfordatascience/tidytuesday/blob/master/data/2023/2023-05-09/readme.md" target="_blank" rel="noopener">childcare cost</a> data from <a href="https://github.com/rfordatascience/tidytuesday" target="_blank" rel="noopener">Tidy Tuesday</a> that Julia Silge did in one of her <a href="https://juliasilge.com/blog/childcare-costs/" target="_blank" rel="noopener">screencasts</a>.</p> <p>We are modeling the median weekly price for school-aged kids in childcare centers (<code>mcsa</code>) and are thus removing the other variables containing different variants of median prices (e.g., for different age groups). 
We are also removing the FIPS code identifying the county as we are including various characteristics of the counties instead of their ID.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://readr.tidyverse.org'>readr</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Attaching package: 'readr'</span></span> <span></span><span><span class='c'>#&gt; The following object is masked from 'package:yardstick':</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; spec</span></span> <span></span><span><span class='c'>#&gt; The following object is masked from 'package:scales':</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; col_factor</span></span> <span></span><span></span> <span><span class='nv'>childcare_costs</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://readr.tidyverse.org/reference/read_delim.html'>read_csv</a></span><span class='o'>(</span><span class='s'>'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-05-09/childcare_costs.csv'</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Rows: </span><span style='color: #0000BB;'>34567</span> <span style='font-weight: bold;'>Columns: </span><span style='color: #0000BB;'>61</span></span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>──</span> <span style='font-weight: bold;'>Column specification</span> <span style='color: #00BBBB;'>────────────────────────────────────────────────────────</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Delimiter:</span> ","</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>dbl</span> (61): 
county_fips_code, study_year, unr_16, funr_16, munr_16, unr_20to64...</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use `spec()` to retrieve the full column specification for this data.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Specify the column types or set `show_col_types = FALSE` to quiet this message.</span></span> <span></span><span></span> <span><span class='nv'>childcare_costs</span> <span class='o'>&lt;-</span> <span class='nv'>childcare_costs</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>select</span><span class='o'>(</span><span class='o'>-</span><span class='nf'>matches</span><span class='o'>(</span><span class='s'>"^mc_|^mfc"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>select</span><span class='o'>(</span><span class='o'>-</span><span class='nv'>county_fips_code</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>drop_na</span><span class='o'>(</span><span class='o'>)</span> </span> <span></span> <span><span class='nf'>glimpse</span><span class='o'>(</span><span class='nv'>childcare_costs</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 23,593</span></span> <span><span class='c'>#&gt; Columns: 53</span></span> <span><span class='c'>#&gt; $ study_year <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 2008, 2009, 2010, 2011, 2012, 2013, 2014, 20…</span></span> <span><span class='c'>#&gt; $ unr_16 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 5.42, 5.93, 6.21, 7.55, 8.60, 9.39, 8.50, 7.…</span></span> <span><span class='c'>#&gt; $ funr_16 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 4.41, 5.72, 5.57, 8.13, 8.88, 10.31, 9.18, 8…</span></span> <span><span class='c'>#&gt; $ munr_16 <span style='color: #555555; font-style: 
italic;'>&lt;dbl&gt;</span> 6.32, 6.11, 6.78, 7.03, 8.29, 8.56, 7.95, 6.…</span></span> <span><span class='c'>#&gt; $ unr_20to64 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 4.6, 4.8, 5.1, 6.2, 6.7, 7.3, 6.8, 5.9, 4.4,…</span></span> <span><span class='c'>#&gt; $ funr_20to64 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 3.5, 4.6, 4.6, 6.3, 6.4, 7.6, 6.8, 6.1, 4.6,…</span></span> <span><span class='c'>#&gt; $ munr_20to64 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 5.6, 5.0, 5.6, 6.1, 7.0, 7.0, 6.8, 5.9, 4.3,…</span></span> <span><span class='c'>#&gt; $ flfpr_20to64 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 68.9, 70.8, 71.3, 70.2, 70.6, 70.7, 69.9, 68…</span></span> <span><span class='c'>#&gt; $ flfpr_20to64_under6 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 66.9, 63.7, 67.0, 66.5, 67.1, 67.5, 65.2, 66…</span></span> <span><span class='c'>#&gt; $ flfpr_20to64_6to17 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 79.59, 78.41, 78.15, 77.62, 76.31, 75.91, 75…</span></span> <span><span class='c'>#&gt; $ flfpr_20to64_under6_6to17 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 60.81, 59.91, 59.71, 59.31, 58.30, 58.00, 57…</span></span> <span><span class='c'>#&gt; $ mlfpr_20to64 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 84.0, 86.2, 85.8, 85.7, 85.7, 85.0, 84.2, 82…</span></span> <span><span class='c'>#&gt; $ pr_f <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 8.5, 7.5, 7.5, 7.4, 7.4, 8.3, 9.1, 9.3, 9.4,…</span></span> <span><span class='c'>#&gt; $ pr_p <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 11.5, 10.3, 10.6, 10.9, 11.6, 12.1, 12.8, 12…</span></span> <span><span class='c'>#&gt; $ mhi_2018 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 58462.55, 60211.71, 61775.80, 60366.88, 5915…</span></span> <span><span 
class='c'>#&gt; $ me_2018 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 32710.60, 34688.16, 34740.84, 34564.32, 3432…</span></span> <span><span class='c'>#&gt; $ fme_2018 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 25156.25, 26852.67, 27391.08, 26727.68, 2796…</span></span> <span><span class='c'>#&gt; $ mme_2018 <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 41436.80, 43865.64, 46155.24, 45333.12, 4427…</span></span> <span><span class='c'>#&gt; $ total_pop <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 49744, 49584, 53155, 53944, 54590, 54907, 55…</span></span> <span><span class='c'>#&gt; $ one_race <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 98.1, 98.6, 98.5, 98.5, 98.5, 98.6, 98.7, 98…</span></span> <span><span class='c'>#&gt; $ one_race_w <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 78.9, 79.1, 79.1, 78.9, 78.9, 78.3, 78.0, 77…</span></span> <span><span class='c'>#&gt; $ one_race_b <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 17.7, 17.9, 17.9, 18.1, 18.1, 18.4, 18.6, 18…</span></span> <span><span class='c'>#&gt; $ one_race_i <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.4, 0.4, 0.3, 0.2, 0.3, 0.3, 0.4, 0.4, 0.4,…</span></span> <span><span class='c'>#&gt; $ one_race_a <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.4, 0.6, 0.7, 0.7, 0.8, 1.0, 0.9, 1.0, 0.8,…</span></span> <span><span class='c'>#&gt; $ one_race_h <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1,…</span></span> <span><span class='c'>#&gt; $ one_race_other <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 0.7, 0.7, 0.6, 0.5, 0.4, 0.7, 0.7, 0.9, 1.4,…</span></span> <span><span class='c'>#&gt; $ two_races <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1.9, 1.4, 1.5, 1.5, 1.5, 1.4, 1.3, 
1.6, 2.0,…</span></span> <span><span class='c'>#&gt; $ hispanic <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1.8, 2.0, 2.3, 2.4, 2.4, 2.5, 2.5, 2.6, 2.6,…</span></span> <span><span class='c'>#&gt; $ households <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 18373, 18288, 19718, 19998, 19934, 20071, 20…</span></span> <span><span class='c'>#&gt; $ h_under6_both_work <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1543, 1475, 1569, 1695, 1714, 1532, 1557, 13…</span></span> <span><span class='c'>#&gt; $ h_under6_f_work <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 970, 964, 1009, 1060, 938, 880, 1191, 1258, …</span></span> <span><span class='c'>#&gt; $ h_under6_m_work <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 22, 16, 16, 106, 120, 161, 159, 211, 109, 10…</span></span> <span><span class='c'>#&gt; $ h_under6_single_m <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 995, 1099, 1110, 1030, 1095, 1160, 954, 883,…</span></span> <span><span class='c'>#&gt; $ h_6to17_both_work <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 4900, 5028, 5472, 5065, 4608, 4238, 4056, 40…</span></span> <span><span class='c'>#&gt; $ h_6to17_fwork <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1308, 1519, 1541, 1965, 1963, 1978, 2073, 20…</span></span> <span><span class='c'>#&gt; $ h_6to17_mwork <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 114, 92, 113, 246, 284, 354, 373, 551, 322, …</span></span> <span><span class='c'>#&gt; $ h_6to17_single_m <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1966, 2305, 2377, 2299, 2644, 2522, 2269, 21…</span></span> <span><span class='c'>#&gt; $ emp_m <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 27.40, 29.54, 29.33, 31.17, 32.13, 31.74, 32…</span></span> <span><span class='c'>#&gt; $ memp_m <span style='color: #555555; 
font-style: italic;'>&lt;dbl&gt;</span> 24.41, 26.07, 25.94, 26.97, 28.59, 27.44, 28…</span></span> <span><span class='c'>#&gt; $ femp_m <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 30.68, 33.40, 33.06, 35.96, 36.09, 36.61, 37…</span></span> <span><span class='c'>#&gt; $ emp_service <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 17.06, 15.81, 16.92, 16.18, 16.09, 16.72, 16…</span></span> <span><span class='c'>#&gt; $ memp_service <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 15.53, 14.16, 15.09, 14.21, 14.71, 13.92, 13…</span></span> <span><span class='c'>#&gt; $ femp_service <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 18.75, 17.64, 18.93, 18.42, 17.63, 19.89, 20…</span></span> <span><span class='c'>#&gt; $ emp_sales <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 29.11, 28.75, 29.07, 27.56, 28.39, 27.22, 25…</span></span> <span><span class='c'>#&gt; $ memp_sales <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 15.97, 17.51, 17.82, 17.74, 17.79, 17.38, 15…</span></span> <span><span class='c'>#&gt; $ femp_sales <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 43.52, 41.25, 41.43, 38.76, 40.26, 38.36, 36…</span></span> <span><span class='c'>#&gt; $ emp_n <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 13.21, 11.89, 11.57, 10.72, 9.02, 9.27, 9.38…</span></span> <span><span class='c'>#&gt; $ memp_n <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 22.54, 20.30, 19.86, 18.28, 16.03, 16.79, 17…</span></span> <span><span class='c'>#&gt; $ femp_n <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 2.99, 2.52, 2.45, 2.09, 1.19, 0.77, 0.58, 0.…</span></span> <span><span class='c'>#&gt; $ emp_p <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 13.22, 14.02, 13.11, 14.38, 14.37, 15.04, 16…</span></span> <span><span class='c'>#&gt; $ memp_p <span 
style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 21.55, 21.96, 21.28, 22.80, 22.88, 24.48, 24…</span></span> <span><span class='c'>#&gt; $ femp_p <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 4.07, 5.19, 4.13, 4.77, 4.84, 4.36, 6.07, 7.…</span></span> <span><span class='c'>#&gt; $ mcsa <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 80.92, 83.42, 85.92, 88.43, 90.93, 93.43, 95…</span></span> <span></span></code></pre> </div> <p>Even after omitting rows with missing values, we are left with 23,593 observations. That is plenty to work with! We are likely to get a reliable estimate of the model performance from a validation set without having to fit and evaluate the model multiple times, as with, for example, v-fold cross-validation.</p> <p>We are creating a three-way split of the data into a training, a validation, and a test set with the new <code>initial_validation_split()</code> function. We are stratifying based on our outcome <code>mcsa</code>. 
The default of <code>prop = c(0.6, 0.2)</code> means that 60% of the data gets allocated to the training set and 20% to the validation set, and the remaining 20% goes into the test set.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>123</span><span class='o'>)</span></span> <span><span class='nv'>childcare_split</span> <span class='o'>&lt;-</span> <span class='nv'>childcare_costs</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>initial_validation_split</span><span class='o'>(</span>strata <span class='o'>=</span> <span class='nv'>mcsa</span><span class='o'>)</span></span> <span><span class='nv'>childcare_split</span></span> <span><span class='c'>#&gt; &lt;Training/Validation/Testing/Total&gt;</span></span> <span><span class='c'>#&gt; &lt;14155/4718/4720/23593&gt;</span></span> <span></span></code></pre> </div> <p>You can access the subsets of the data with the familiar <code>training()</code> and <code>testing()</code> as well as the new <code>validation()</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>validation</span><span class='o'>(</span><span class='nv'>childcare_split</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4,718 × 53</span></span></span> <span><span class='c'>#&gt; study_year unr_16 funr_16 munr_16 unr_20to64 funr_20to64 munr_20to64</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; 
font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='text-decoration: underline;'>2</span>013 9.39 10.3 8.56 7.3 7.6 7 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='text-decoration: underline;'>2</span>011 13.0 12.4 13.6 13.2 12.4 13.9</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='text-decoration: underline;'>2</span>008 3.85 4.4 3.43 3.7 3.9 3.6</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='text-decoration: underline;'>2</span>015 8.31 11.8 5.69 7.8 11.7 4.9</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='text-decoration: underline;'>2</span>015 7.67 6.92 8.27 7.6 6.7 8.3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='text-decoration: underline;'>2</span>016 5.95 6.33 5.66 5.7 5.9 5.5</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>2</span>009 10.7 15.9 7.06 8.7 16.8 2.9</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='text-decoration: underline;'>2</span>010 11.2 15.2 7.89 10.9 14.7 7.8</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='text-decoration: underline;'>2</span>013 15.0 17.0 13.4 15.2 18.1 13 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>2</span>014 17.4 16.3 18.2 17.2 17.7 16.9</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 4,708 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 46 more variables: flfpr_20to64 &lt;dbl&gt;, flfpr_20to64_under6 
&lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># flfpr_20to64_6to17 &lt;dbl&gt;, flfpr_20to64_under6_6to17 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># mlfpr_20to64 &lt;dbl&gt;, pr_f &lt;dbl&gt;, pr_p &lt;dbl&gt;, mhi_2018 &lt;dbl&gt;, me_2018 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># fme_2018 &lt;dbl&gt;, mme_2018 &lt;dbl&gt;, total_pop &lt;dbl&gt;, one_race &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># one_race_w &lt;dbl&gt;, one_race_b &lt;dbl&gt;, one_race_i &lt;dbl&gt;, one_race_a &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># one_race_h &lt;dbl&gt;, one_race_other &lt;dbl&gt;, two_races &lt;dbl&gt;, hispanic &lt;dbl&gt;, …</span></span></span> <span></span></code></pre> </div> <p>You may want to extract the training data to do some exploratory data analysis but here we are going to rely on xgboost to figure out patterns in the data so we can breeze straight to tuning a model.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>xgb_spec</span> <span class='o'>&lt;-</span></span> <span> <span class='nf'>boost_tree</span><span class='o'>(</span></span> <span> trees <span class='o'>=</span> <span class='m'>500</span>,</span> <span> min_n <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span>,</span> <span> mtry <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span>,</span> <span> stop_iter <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span>,</span> <span> learn_rate <span class='o'>=</span> <span class='m'>0.01</span></span> <span> <span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span 
class='nf'>set_engine</span><span class='o'>(</span><span class='s'>"xgboost"</span>, validation <span class='o'>=</span> <span class='m'>0.2</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>set_mode</span><span class='o'>(</span><span class='s'>"regression"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>xgb_wf</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>mcsa</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>xgb_spec</span><span class='o'>)</span></span> <span><span class='nv'>xgb_wf</span></span> <span><span class='c'>#&gt; ══ Workflow ════════════════════════════════════════════════════════════════════</span></span> <span><span class='c'>#&gt; <span style='font-style: italic;'>Preprocessor:</span> Formula</span></span> <span><span class='c'>#&gt; <span style='font-style: italic;'>Model:</span> boost_tree()</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; ── Preprocessor ────────────────────────────────────────────────────────────────</span></span> <span><span class='c'>#&gt; mcsa ~ .</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; ── Model ───────────────────────────────────────────────────────────────────────</span></span> <span><span class='c'>#&gt; Boosted Tree Model Specification (regression)</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Main Arguments:</span></span> <span><span class='c'>#&gt; mtry = tune()</span></span> <span><span class='c'>#&gt; trees = 500</span></span> <span><span class='c'>#&gt; min_n = tune()</span></span> <span><span class='c'>#&gt; learn_rate = 0.01</span></span> <span><span class='c'>#&gt; stop_iter = tune()</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Engine-Specific Arguments:</span></span> <span><span class='c'>#&gt; 
validation = 0.2</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Computational engine: xgboost</span></span> <span></span></code></pre> </div> <p>We give this workflow object with the model specification to <code>tune_grid()</code> to try multiple combinations of the hyperparameters we tagged for tuning (<code>min_n</code>, <code>mtry</code>, and <code>stop_iter</code>).</p> <p>During tuning, the model should not have access to the test data, only to the data used to fit the model (the analysis set) and the data used to assess the model (the assessment set). Each pair of analysis and assessment set forms a resample. For 10-fold cross-validation, we&rsquo;d have 10 resamples. With a validation split, we have just one resample with the training set functioning as the analysis set and the validation set as the assessment set. The tidymodels tuning functions all expect a <em>set</em> of resamples (which can be of size one) and the corresponding objects are of class <code>rset</code>.</p> <p>To remove the test data from the initial three-way split and create such an <code>rset</code> object for tuning, use <code>validation_set()</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>234</span><span class='o'>)</span></span> <span><span class='nv'>childcare_set</span> <span class='o'>&lt;-</span> <span class='nf'>validation_set</span><span class='o'>(</span><span class='nv'>childcare_split</span><span class='o'>)</span></span> <span><span class='nv'>childcare_set</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 2</span></span></span> <span><span class='c'>#&gt; splits id </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;split [14155/4718]&gt;</span> validation</span></span> <span></span></code></pre> </div> <p>We are going to try 15 different parameter combinations and pick the one with the smallest RMSE.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>234</span><span class='o'>)</span></span> <span><span class='nv'>xgb_res</span> <span class='o'>&lt;-</span> <span class='nf'>tune_grid</span><span class='o'>(</span><span class='nv'>xgb_wf</span>, <span class='nv'>childcare_set</span>, grid <span class='o'>=</span> <span class='m'>15</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>i</span> <span style='color: #000000;'>Creating pre-processing data to finalize unknown parameter: mtry</span></span></span> <span></span><span><span class='c'>#&gt; Warning in `[.tbl_df`(x, is.finite(x &lt;- as.numeric(x))): NAs introduced by coercion</span></span> <span></span><span><span class='nv'>best_parameters</span> <span class='o'>&lt;-</span> <span class='nf'>select_best</span><span class='o'>(</span><span class='nv'>xgb_res</span>, <span class='s'>"rmse"</span><span class='o'>)</span></span> <span><span class='nv'>childcare_wflow</span> <span class='o'>&lt;-</span> <span class='nf'>finalize_workflow</span><span class='o'>(</span><span class='nv'>xgb_wf</span>, <span class='nv'>best_parameters</span><span class='o'>)</span></span></code></pre> </div> <p><code>last_fit()</code> then lets you fit your model on the training data and calculate performance on the test data. If you provide it with a three-way split, you can choose if you want your model to be fitted on the training data only or on the combination of training and validation set. 
You can specify this with the <code>add_validation_set</code> argument.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>childcare_fit</span> <span class='o'>&lt;-</span> <span class='nf'>last_fit</span><span class='o'>(</span><span class='nv'>childcare_wflow</span>, <span class='nv'>childcare_split</span>, add_validation_set <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span><span class='nf'>collect_metrics</span><span class='o'>(</span><span class='nv'>childcare_fit</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 4</span></span></span> <span><span class='c'>#&gt; .metric .estimator .estimate .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> rmse standard 21.4 Preprocessor1_Model1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> rsq standard 0.610 Preprocessor1_Model1</span></span> <span></span></code></pre> </div> <p>This takes you through the important changes for validation sets in the tidymodels framework!</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Many thanks to 
the people who contributed since the last releases!</p> <p>For rsample: <a href="https://github.com/afrogri37" target="_blank" rel="noopener">@afrogri37</a>, <a href="https://github.com/AngelFelizR" target="_blank" rel="noopener">@AngelFelizR</a>, <a href="https://github.com/bschneidr" target="_blank" rel="noopener">@bschneidr</a>, <a href="https://github.com/erictleung" target="_blank" rel="noopener">@erictleung</a>, <a href="https://github.com/exsell-jc" target="_blank" rel="noopener">@exsell-jc</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, <a href="https://github.com/MasterLuke84" target="_blank" rel="noopener">@MasterLuke84</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/rdavis120" target="_blank" rel="noopener">@rdavis120</a>, <a href="https://github.com/sametsoekel" target="_blank" rel="noopener">@sametsoekel</a>, <a href="https://github.com/Shafi2016" target="_blank" rel="noopener">@Shafi2016</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/trevorcampbell" target="_blank" rel="noopener">@trevorcampbell</a>.</p> <p>For tune: <a href="https://github.com/blechturm" target="_blank" rel="noopener">@blechturm</a>, <a href="https://github.com/cphaarmeyer" target="_blank" rel="noopener">@cphaarmeyer</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/forecastingEDs" target="_blank" rel="noopener">@forecastingEDs</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/kjbeath" 
target="_blank" rel="noopener">@kjbeath</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/rdavis120" target="_blank" rel="noopener">@rdavis120</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> webR 0.2.0 has been released https://www.tidyverse.org/blog/2023/08/webr-0-2-0/ Wed, 16 Aug 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/08/webr-0-2-0/ 
<!-- Initialise webR in the page --> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/codemirror.min.css"> <style> .CodeMirror pre { background-color: unset !important; } .btn-webr { background-color: #EEEEEE; border-bottom-left-radius: 0; border-bottom-right-radius: 0; } </style> <script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/codemirror.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/mode/r/r.js"></script> <script type="module"> import { WebR } from 'https://webr.r-wasm.org/v0.4.2/webr.mjs'; globalThis.webR = new WebR(); await globalThis.webR.init(); await webR.FS.mkdir('/persist'); await webR.FS.mount('IDBFS', {}, '/persist'); await webR.FS.syncfs(true); await webR.evalRVoid("webr::shim_install()"); await webR.evalRVoid("webr::global_prompt_install()", { withHandlers: false }); globalThis.webRCodeShelter = await new globalThis.webR.Shelter(); document.querySelectorAll(".btn-webr").forEach((btn) => { btn.innerText = 'Run code'; btn.disabled = false; }); </script> <!-- Add webr engine for knit --> <div class="highlight"> </div> <p>We&rsquo;re absolutely thrilled to announce the release of <a href="https://docs.r-wasm.org/webr/v0.2.0/" target="_blank" rel="noopener">webR</a> 0.2.0! 
This release gathers together many updates and improvements to webR over the last few months, including improvements to the HTML canvas graphics device, support for Cairo-based bitmap graphics, accessibility and internationalisation improvements, additional Wasm R package support (including Shiny), a new webR REPL app, and various updates to the webR developer API.</p> <p>This blog post will take a deep dive through the major breaking changes and new features available in webR 0.2.0. I also plan to record and release a series of companion videos discussing the new release, so keep an eye out if you&rsquo;re someone who prefers watching and listening over reading long-form articles. I&rsquo;ll update this post with all the links once they&rsquo;re available.</p> <h2 id="webassembly-and-webr">WebAssembly and webR <a href="#webassembly-and-webr"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>My previous <a href="https://www.tidyverse.org/blog/2023/03/webr-0-1-0/" target="_blank" rel="noopener">webR release blog post</a> goes into detail about what WebAssembly is, why people are excited about it, and how it relates to the R community and ecosystem in general through webR. I would recommend it as a good place to start, if the project is new to you<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>.</p> <p>A short explanation is that WebAssembly (also known as Wasm) allows software that&rsquo;s normally compiled for a specific computer system to instead run anywhere, including in web browsers. 
Wasm is the technology that powers <a href="https://pyodide.org" target="_blank" rel="noopener">Pyodide</a> (used by <a href="https://shiny.rstudio.com/py/docs/shinylive.html" target="_blank" rel="noopener">Shinylive for Python</a>), and webR brings this technology to the R world. Using webR it is possible to run R code directly in a web browser<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>, without the need for the traditional supporting R server to execute the code.</p> <p>Running R code directly in a browser opens the door for many new and exciting uses for R on the web. Applications that I&rsquo;m personally excited to see developed include:</p> <ul> <li>Live and interactive R code and graphics in documents &amp; presentations,</li> <li>Tactile educational content for R, with examples that can be remixed on the fly by learners,</li> <li>Reproducible statistics through containerisation and notebook-style literate programming.</li> </ul> <p>Even in these early days, some of this is already being provided by downstream projects such as James Balamuta&rsquo;s <a href="https://github.com/coatless/quarto-webr" target="_blank" rel="noopener">quarto-webr</a> extension, which allows Quarto users to easily embed interactive R code blocks in their documents.</p> <h3 id="interactive-code-blocks">Interactive code blocks <a href="#interactive-code-blocks"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>One of my favourite demonstrations of what webR can do is interactive code blocks for R code. 
After a short loading period while the webR binary is downloaded, a <strong>Run code</strong> button will be enabled below. Using examples like this, R code can be immediately edited and executed &ndash; feel free to experiment! Click the &ldquo;Run code&rdquo; button to see the resulting box plot, change the colour from <code>mediumseagreen</code> to <code>red</code> and run the code again.</p> <div class="highlight"> <button class="btn btn-default btn-webr" disabled type="button" id="webr-run-button-1">Loading webR...</button> <div id="webr-editor-1"></div> <div id="webr-code-output-1"><pre style="visibility: hidden"></pre></div> <script type="module"> const runButton = document.getElementById('webr-run-button-1'); const outputDiv = document.getElementById('webr-code-output-1'); const editorDiv = document.getElementById('webr-editor-1'); const editor = CodeMirror((elt) => { elt.style.border = '1px solid #eee'; elt.style.height = 'auto'; editorDiv.append(elt); },{ value: `colnames(mtcars)\n\nboxplot(\n mpg ~ cyl, data = mtcars,\n col = "mediumseagreen",\n xlab = "Number of Cylinders",\n ylab = "Miles/(US) gallon",\n main = "Motor Trend Car Road Tests",\n sub = "Source: 1974 Motor Trend US magazine"\n)`, lineNumbers: true, mode: 'r', theme: 'light default', viewportMargin: Infinity, }); runButton.onclick = async () => { runButton.disabled = true; let canvas = undefined; await webR.init(); await webR.evalRVoid('webr::canvas(width=504, height=311.472)'); await webR.FS.syncfs(false); const result = await webRCodeShelter.captureR(editor.getValue(), { withAutoprint: true, captureStreams: true, captureConditions: false, captureGraphics: false, env: {}, }); try { await webR.evalRVoid("dev.off()"); const out = result.output.filter( evt => evt.type == 'stdout' || evt.type == 'stderr' ).map((evt) => evt.data).join('\n'); outputDiv.innerHTML = ''; const pre = document.createElement("pre"); if (/\S/.test(out)) { const code = document.createElement("code"); code.innerText = 
out; pre.appendChild(code); } else { pre.style.visibility = 'hidden'; } outputDiv.appendChild(pre); const msgs = await webR.flush(); msgs.forEach(msg => { if (msg.type === 'canvas'){ if (msg.data.event === 'canvasImage') { canvas.getContext('2d').drawImage(msg.data.image, 0, 0); } else if (msg.data.event === 'canvasNewPage') { canvas = document.createElement('canvas'); canvas.setAttribute('width', 2 * 504); canvas.setAttribute('height', 2 * 311.472); canvas.style.width="700px"; canvas.style.display="block"; canvas.style.margin="auto"; const p = document.createElement("p"); p.appendChild(canvas); outputDiv.appendChild(p); } } }); } finally { webRCodeShelter.purge(); runButton.disabled = false; } } </script> </div> <p>It&rsquo;s easy to see the potential teaching benefit examples like this could bring to educational content or R package documentation.</p> <h2 id="the-webr-repl-app">The webR REPL app <a href="#the-webr-repl-app"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>WebR can be loaded into a web page to be used as a part of a wider web application, and ships with a demo application that does just that. The webR REPL app<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup> provides a simple R environment directly in your web browser. 
The app can be accessed at <a href="https://webr.r-wasm.org/v0.2.0/">https://webr.r-wasm.org/v0.2.0/</a> and includes sections for R console input/output, code editing, file management, and graphics device output.</p> <p>With the webR REPL app, a casual user could get up and running with R in seconds, without having to install any software on their machine. It is entirely feasible that they could perform the basics of data science entirely within their web browser!</p> <p>Other than interactive code blocks, like the one in the example earlier, the webR REPL app is perhaps the first thing that users new to webR will interact with. For this reason, we have spent some time working to improve the technical implementation and user experience of the app. The app has been completely rewritten in the React web framework, replacing the older jQuery library. This allows for better component code organisation and more rapid development of features and updates.</p> <p><a href="repl.png"><img alt="A screenshot of the webR REPL app. The code to generate a ggplot, along with its output, is shown in the app." width="95%" src="repl.png"></a></p> <h3 id="code-editor">Code editor <a href="#code-editor"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The app now comes with a tabbed code editor, allowing for easier editing and execution of R code. 
The editor integrates with the webR virtual filesystem (VFS), meaning that multiple R scripts can be opened, edited, and saved, and they will be available to the running Wasm R process.</p> <p>The editor pane is built upon the excellent <a href="https://codemirror.net" target="_blank" rel="noopener">CodeMirror</a> text editor, which provides most of the component&rsquo;s functionality. CodeMirror provides built-in support for syntax highlighting of R code, which is enabled by default when R source files are displayed.</p> <p>The editor is integrated with the currently running R process and automatic code suggestions are shown as you type, provided by R&rsquo;s <a href="https://stat.ethz.ch/R-manual/R-devel/library/utils/html/rcompgen.html" target="_blank" rel="noopener">built-in completion generator</a>. The suggestions are context-sensitive and are aware of package and function names, valid arguments, and even objects that exist in the global environment.</p> <p><a href="completion.png"><img alt="A screenshot of the editor component showing code completion results. One of the suggestions is a data set available in the global environment." width="70%" src="completion.png"></a></p> <p>The running Wasm R process is also configured at initialisation to use the editor component as its display <a href="https://stat.ethz.ch/R-manual/R-devel/library/base/html/file.show.html" target="_blank" rel="noopener">pager mechanism</a>. 
With this configuration in place running commands such as <a href="https://rdrr.io/r/stats/Normal.html" target="_blank" rel="noopener"><code>?rnorm</code></a> in the app automatically opens a new read-only tab in the editor displaying R&rsquo;s built-in documentation.</p> <p><a href="documentation.png"><img alt="A screenshot of the editor component showing built-in R documentation" width="80%" src="documentation.png"></a></p> <h3 id="plotting-pane">Plotting pane <a href="#plotting-pane"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The plotting pane has been updated to take advantage of improvements in webR&rsquo;s HTML canvas graphics device, set as the default device as part of initialisation. In particular, multiple plots are now supported and older plots can be directly accessed using the previous and next buttons in the plotting toolbar. 
You can try this out with R&rsquo;s built-in graphics demos, by running <code>demo(graphics)</code> and/or <code>demo(persp)</code>.</p> <p><a href="plotting.png"><img alt="A screenshot of the plot pane showing a built-in R graphics demo" width="75%" src="plotting.png"></a></p> <h3 id="files-pane">Files pane <a href="#files-pane"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The files pane has been completely redesigned, removing its dependency on jQuery and instead making use of the <a href="https://www.npmjs.com/package/react-accessible-treeview" target="_blank" rel="noopener">react-accessible-treeview</a> package. As well as being a technical improvement, this change means that interacting with the webR filesystem should be more usable for those with web accessibility requirements. We feel it&rsquo;s important that, where possible, everybody is able to use our software.</p> <p><a href="files.png"><img alt="A screenshot of the files pane showing the path /home/web_user/plot_random_numbers.R" width="90%" src="files.png"></a></p> <p>Additional buttons have also been added to this pane, allowing users to easily manipulate the virtual file system visible to the running Wasm R process. 
New files and directories can be created or deleted, and text-based files can be directly opened and modified in the editor pane, removing the need to download, edit and then re-upload files.</p> <h3 id="console-pane">Console pane <a href="#console-pane"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The R console component shown in the lower left portion of the app is powered by the wonderful <a href="https://xtermjs.org" target="_blank" rel="noopener">xterm.js</a> software, which provides a high performance terminal emulator on the web. R output looks at its best when running in this kind of environment, so that <a href="https://en.wikipedia.org/wiki/ANSI_escape_code" target="_blank" rel="noopener">ANSI escape codes</a> can be used to provide a much smoother console experience incorporating cursor placement, box drawing characters, bold text, terminal colours, and more.</p> <p><a href="term.png"><img alt="An example of ANSI escape sequences in R console output while loading the tidyverse package." width="90%" src="term.png"></a></p> <p>An optional accessibility mode is provided by xterm.js so that terminal output is readable by screen reader software, such as <a href="https://support.apple.com/en-gb/guide/voiceover/welcome/mac" target="_blank" rel="noopener">macOS&rsquo;s VoiceOver</a>. 
The webR REPL app now enables this mode by default to improve the accessibility of terminal output.</p> <h2 id="html-canvas-graphics-device">HTML Canvas graphics device <a href="#html-canvas-graphics-device"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The webR support package provides a custom <a href="https://docs.r-wasm.org/webr/latest/api/r.html#graphics-device-for-drawing-to-a-html-canvas-element" target="_blank" rel="noopener"><code>webr::canvas()</code></a> graphics device that renders output using the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API" target="_blank" rel="noopener">Web Canvas API</a>. When the graphics device is used, drawing commands from R are translated into Canvas API calls. 
The browser renders the graphics and the resulting image data is drawn to an HTML <code>&lt;canvas&gt;</code> element on the page.</p> <p>With the release of webR 0.2.0, we have improved the performance of the HTML canvas graphics device and added new features to it.</p> <h3 id="performance-improvements-with-offscreencanvas">Performance improvements with <code>OffscreenCanvas</code> <a href="#performance-improvements-with-offscreencanvas"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Using the Canvas API to draw graphics in a browser is elegant, but presents a problem. R is running via WebAssembly in a JavaScript <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers" target="_blank" rel="noopener">Web Worker</a> thread, but the <code>&lt;canvas&gt;</code> element the plot image data is written to is on the main thread, part of the web page <a href="https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model/Introduction" target="_blank" rel="noopener">DOM</a>. And, unfortunately, JavaScript Web Worker threads have no direct access to the DOM.</p> <p>Previous releases of webR solve this problem in a rather naive way: they simply send the Canvas API calls to the main thread to be executed there. This leads to a few issues:</p> <ul> <li> <p>Canvas API calls are serialised as text to be sent to the main thread. Sufficiently complex plot text must therefore be quoted and escaped.</p> </li> <li> <p>Each API call is sent in a separate message. 
For a complex plot this can be thousands of messages to dispatch and handle.</p> </li> <li> <p>The messaging is one-way, results of useful methods like <a href="https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/measureText" target="_blank" rel="noopener"><code>measureText()</code></a> cannot easily be retrieved.</p> </li> <li> <p>Parsing and executing the API call on the main thread means using JavaScript&rsquo;s <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval" target="_blank" rel="noopener"><code>eval()</code></a> or <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function" target="_blank" rel="noopener"><code>Function()</code></a>, leading to poor performance. These functions should also be avoided when possible in any case, for security reasons.</p> </li> </ul> <p>Solid engineering efforts could be made to improve the situation, e.g. through batching API calls and better encoding, but there is a better way: the <a href="https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas" target="_blank" rel="noopener"><code>OffscreenCanvas</code></a> interface. <code>OffscreenCanvas</code> is designed to solve this exact problem of rendering graphics off-screen, such as in a worker thread. With <code>OffscreenCanvas</code> the Canvas API calls can all be executed on the worker thread, and only a single message containing the completed image data transferred to the main thread when rendering is complete. It is an efficient and technically satisfying solution, except that when webR 0.1.1 was released <code>OffscreenCanvas</code> wasn&rsquo;t supported by the Safari web browser.</p> <p>Today, on the other hand, <code>OffscreenCanvas</code> is <a href="https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas#browser_compatibility" target="_blank" rel="noopener">supported</a> in all major desktop and mobile browsers. 
Safari has supported it since version 16.4, and so with webR 0.2.0 we have rewritten the <a href="https://docs.r-wasm.org/webr/latest/api/r.html#graphics-device-for-drawing-to-a-html-canvas-element" target="_blank" rel="noopener"><code>webr::canvas()</code></a> graphics device to take full advantage of the <code>OffscreenCanvas</code> interface. This has led to a significant performance improvement, particularly when creating plots containing many points. The two videos below show the same plot rendered in webR 0.1.1 and 0.2.0; rendering in 0.2.0 is visibly faster, by an order of magnitude.</p> <video controls loop width="100%" src="plot.mp4" style="border: 2px solid #CCC;"> <source src="plot.mp4"> </video> <div style="text-align: center; font-weight: bold;"> <p>A performance comparison plotting 300000 points in webR 0.1.1 and 0.2.0.</p> </div> <p>A potential downside is that users of less up-to-date browsers without <code>OffscreenCanvas</code> support won&rsquo;t be able to use the <a href="https://docs.r-wasm.org/webr/latest/api/r.html#graphics-device-for-drawing-to-a-html-canvas-element" target="_blank" rel="noopener"><code>webr::canvas()</code></a> graphics device. Such users should instead use the traditional Cairo-based bitmap devices, which webR now also supports. 
The <a href="#built-in-bitmap-graphics-devices">built-in graphics devices section</a> discusses that in more detail.</p> <h3 id="modern-text-rendering-and-internationalisation">Modern text rendering and internationalisation <a href="#modern-text-rendering-and-internationalisation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>With webR 0.1.1, the canvas graphics device had only minimal support for rendering text. The typeface was fixed, the font metrics were estimated with a heuristic, and Unicode characters outside the Basic Latin block often failed to render. It worked most of the time, but it was far from ideal. This area of software engineering is <a href="https://faultlore.com/blah/text-hates-you/" target="_blank" rel="noopener">surprisingly difficult</a> to get right, and even native installations of R can have <a href="https://www.tidyverse.org/blog/2021/02/modern-text-features/" target="_blank" rel="noopener">serious text rendering issues</a>.</p> <p>In comparison, web browser support for text rendering is excellent. Now that we use the <code>OffscreenCanvas</code> interface, we too can take advantage of the years of work behind browsers&rsquo; support for text on the web. 
The example below demonstrates several of the modern text rendering features now supported by <a href="https://docs.r-wasm.org/webr/latest/api/r.html#graphics-device-for-drawing-to-a-html-canvas-element" target="_blank" rel="noopener"><code>webr::canvas()</code></a>.</p> <div class="highlight"> <button class="btn btn-default btn-webr" disabled type="button" id="webr-run-button-2">Loading webR...</button> <div id="webr-editor-2"></div> <div id="webr-code-output-2"><pre style="visibility: hidden"></pre></div> <script type="module"> const runButton = document.getElementById('webr-run-button-2'); const outputDiv = document.getElementById('webr-code-output-2'); const editorDiv = document.getElementById('webr-editor-2'); const editor = CodeMirror((elt) => { elt.style.border = '1px solid #eee'; elt.style.height = 'auto'; editorDiv.append(elt); },{ value: `plot(\n rnorm(1000), rnorm(1000),\n col = rgb(0, 0, 0, 0.5),\n xlim = c(-5, 5), ylim = c(-5, 5),\n main = "This is the title 🚀",\n xlab = "This is the x label",\n ylab = "This is the y label",\n family = "Futura"\n)\ntext(-3.5, 4, "This is English", family = "monospace")\ntext(-3.5, -4, "هذا مكتوب باللغة العربية")\ntext(3.5, 4, "これは日本語です")\ntext(3.5, -4, "זה כתוב בעברית")`, lineNumbers: true, mode: 'r', theme: 'light default', viewportMargin: Infinity, }); runButton.onclick = async () => { runButton.disabled = true; let canvas = undefined; await webR.init(); await webR.evalRVoid('webr::canvas(width=504, height=311.472)'); await webR.FS.syncfs(false); const result = await webRCodeShelter.captureR(editor.getValue(), { withAutoprint: true, captureStreams: true, captureConditions: false, captureGraphics: false, env: {}, }); try { await webR.evalRVoid("dev.off()"); const out = result.output.filter( evt => evt.type == 'stdout' || evt.type == 'stderr' ).map((evt) => evt.data).join('\n'); outputDiv.innerHTML = ''; const pre = document.createElement("pre"); if (/\S/.test(out)) { const code = document.createElement("code"); 
code.innerText = out; pre.appendChild(code); } else { pre.style.visibility = 'hidden'; } outputDiv.appendChild(pre); const msgs = await webR.flush(); msgs.forEach(msg => { if (msg.type === 'canvas'){ if (msg.data.event === 'canvasImage') { canvas.getContext('2d').drawImage(msg.data.image, 0, 0); } else if (msg.data.event === 'canvasNewPage') { canvas = document.createElement('canvas'); canvas.setAttribute('width', 2 * 504); canvas.setAttribute('height', 2 * 311.472); canvas.style.width="700px"; canvas.style.display="block"; canvas.style.margin="auto"; const p = document.createElement("p"); p.appendChild(canvas); outputDiv.appendChild(p); } } }); } finally { webRCodeShelter.purge(); runButton.disabled = false; } } </script> </div> <p>Any system font available to the web browser can now be used<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup>. As well as a nice-to-have, this also provides improved accessibility. For example, there are fonts designed specifically for use by readers with dyslexia and other similar reading barriers<sup id="fnref:5"><a href="#fn:5" class="footnote-ref" role="doc-noteref">5</a></sup> that could be used for drawing text in plots.</p> <p>Font metrics are now exact, using <a href="https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/measureText" target="_blank" rel="noopener"><code>measureText()</code></a>, rather than estimating the width and height of Latin glyphs using heuristics. This gives more accurate positioning of rendered text and improves the general quality of resulting plots.</p> <p>Support for Unicode, font glyph fallback, complex ligatures, and right-to-left (RTL) text have all been improved. This vastly improves results when rendering text for international users, particularly for non-Latin RTL scripts such as the Arabic and Hebrew text in the example above.</p> <p>Also, colour emoji can now be added to plots. 
😃</p> <h3 id="paths-and-winding-rules">Paths and winding rules <a href="#paths-and-winding-rules"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Additional support for the drawing and filling of paths and polygons, including with different <a href="https://oreillymedia.github.io/Using_SVG/extras/ch06-fill-rule.html" target="_blank" rel="noopener">winding rules</a>, has been added to the webR canvas graphics device. An area where this new functionality makes a world of difference is plotting spatial features and maps. Previously broken R code for plotting maps with the <code>ggplot2</code> and <code>sf</code> packages now works well with webR 0.2.0.</p> <p><a href="paths.png"><img alt="A screenshot of R plotting code testing paths with winding settings and map plotting. Output on the left for webR 0.1.1 is broken. 
Output on the right for webR 0.2.0 works correctly" width="95%" src="paths.png"></a></p> <h3 id="output-messages-from-the-canvas-graphics-device">Output messages from the canvas graphics device <a href="#output-messages-from-the-canvas-graphics-device"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>As a result of the changes to the HTML canvas graphics device, the structure of output messages communicated to the main thread has been redesigned. This is a breaking change and existing webR applications will need to be updated to listen for the new output messaging format.</p> <p> <a href="https://docs.r-wasm.org/webr/latest/plotting.html" target="_blank" rel="noopener">A Plotting section</a> has been added to the webR documentation describing how plotting works with the <a href="https://docs.r-wasm.org/webr/latest/api/r.html#graphics-device-for-drawing-to-a-html-canvas-element" target="_blank" rel="noopener"><code>webr::canvas()</code></a> device, and how to handle the output messages in your own web applications.</p> <p>A <code>'canvas'</code> type output message with an <code>event</code> property of <code>'canvasNewPage'</code> indicates the start of a new plot,</p> <div class="highlight"><pre class="chroma"><code class="language-typescript" data-lang="typescript"><span class="p">{</span> <span class="nx">type</span><span class="o">:</span> <span class="s1">&#39;canvas&#39;</span><span class="p">,</span> <span class="nx">data</span><span class="o">:</span> <span class="p">{</span> <span class="nx">event</span><span class="o">:</span> <span class="s1">&#39;canvasNewPage&#39;</span> <span 
class="p">}</span> <span class="p">}</span> </code></pre></div><p>An output message with an <code>event</code> property of <code>'canvasImage'</code> indicates that there is some graphics data ready to be drawn,</p> <div class="highlight"><pre class="chroma"><code class="language-typescript" data-lang="typescript"><span class="p">{</span> <span class="nx">type</span><span class="o">:</span> <span class="s1">&#39;canvas&#39;</span><span class="p">,</span> <span class="nx">data</span><span class="o">:</span> <span class="p">{</span> <span class="nx">event</span><span class="o">:</span> <span class="s1">&#39;canvasImage&#39;</span><span class="p">,</span> <span class="nx">image</span>: <span class="kt">ImageBitmap</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div><p>The <code>image</code> property in the message data contains a JavaScript <a href="https://developer.mozilla.org/en-US/docs/Web/API/ImageBitmap" target="_blank" rel="noopener"><code>ImageBitmap</code></a> object. This can be drawn to a HTML <code>&lt;canvas&gt;</code> element using the <a href="https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/drawImage" target="_blank" rel="noopener"><code>drawImage()</code></a> method.</p> <h2 id="built-in-bitmap-graphics-devices">Built-in bitmap graphics devices <a href="#built-in-bitmap-graphics-devices"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Not all environments where webR could be running support plotting to a HTML <code>&lt;canvas&gt;</code> element. 
Older browsers may not support the required <code>OffscreenCanvas</code> interface, webR might be running server-side in Node.js, or webR might be running more traditional R code or packages that are unaware of the <a href="https://docs.r-wasm.org/webr/latest/api/r.html#graphics-device-for-drawing-to-a-html-canvas-element" target="_blank" rel="noopener"><code>webr::canvas()</code></a> graphics device.</p> <p>To support these use cases, the built-in bitmap graphics devices can now be used with webR 0.2.0, writing their output to the webR virtual filesystem (VFS). This includes the <a href="https://rdrr.io/r/grDevices/png.html" target="_blank" rel="noopener"><code>png()</code></a>, <a href="https://rdrr.io/r/grDevices/png.html" target="_blank" rel="noopener"><code>bmp()</code></a>, <a href="https://rdrr.io/r/grDevices/png.html" target="_blank" rel="noopener"><code>jpeg()</code></a>, and <a href="https://rdrr.io/r/grDevices/png.html" target="_blank" rel="noopener"><code>tiff()</code></a> devices, and potentially others implemented using the Cairo graphics library.</p> <p>In the example below, webR is loaded into a JavaScript environment and plotting is done using the built-in <a href="https://rdrr.io/r/grDevices/png.html" target="_blank" rel="noopener"><code>png()</code></a> graphics device. 
The resulting image is written to the virtual filesystem and its contents can then be obtained using webR&rsquo;s <a href="https://docs.r-wasm.org/webr/latest/api/js/classes/WebR.WebR.html#fs" target="_blank" rel="noopener"><code>FS</code></a> interface, designed to be similar to <a href="https://emscripten.org/docs/api_reference/Filesystem-API.html" target="_blank" rel="noopener">Emscripten&rsquo;s filesystem API</a>.</p> <div class="highlight"><pre class="chroma"><code class="language-typescript" data-lang="typescript"><span class="kr">import</span> <span class="p">{</span> <span class="nx">WebR</span> <span class="p">}</span> <span class="nx">from</span> <span class="s1">&#39;webr&#39;</span><span class="p">;</span> <span class="kr">const</span> <span class="nx">webR</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">WebR</span><span class="p">();</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">init</span><span class="p">();</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalRVoid</span><span class="p">(</span><span class="sb">` </span><span class="sb"> png(&#39;/tmp/Rplot.png&#39;, width = 800, height = 800, res = 144) </span><span class="sb"> hist(rnorm(1000)) </span><span class="sb"> dev.off() </span><span class="sb">`</span><span class="p">);</span> <span class="kr">const</span> <span class="nx">plotImageData</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">FS</span><span class="p">.</span><span class="nx">readFile</span><span class="p">(</span><span class="s1">&#39;/tmp/Rplot.png&#39;</span><span class="p">);</span> </code></pre></div><p>The image data is contained in the <code>plotImageData</code> variable as a JavaScript <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array" 
target="_blank" rel="noopener"><code>Uint8Array</code></a>. Once obtained from the VFS, the image can be served to the end user as a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Blob" target="_blank" rel="noopener"><code>Blob</code></a> file download, displayed on a web page, or, if running webR server-side, returned over the network.</p> <h3 id="text-rendering-and-font-support">Text rendering and font support <a href="#text-rendering-and-font-support"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>As with the <a href="https://docs.r-wasm.org/webr/latest/api/r.html#graphics-device-for-drawing-to-a-html-canvas-element" target="_blank" rel="noopener"><code>webr::canvas()</code></a> improvements described in the previous section, we feel it is important that the built-in R graphics devices provide a high level of support for text rendering in webR. Here, however, the approach is different. The built-in graphics devices render image data entirely within the WebAssembly environment, so we can no longer rely on the web browser for high quality text!</p> <p>The built-in graphics devices are powered by the Cairo graphics library, which can now optionally be compiled for Wasm as part of the webR build process. 
In addition, when Cairo is enabled, various other libraries are compiled for Wasm to improve the quality of text rendering:</p> <ul> <li> <a href="https://pango.gnome.org" target="_blank" rel="noopener">pango</a></li> <li> <a href="http://fribidi.org" target="_blank" rel="noopener">fribidi</a></li> <li> <a href="https://harfbuzz.github.io" target="_blank" rel="noopener">harfbuzz</a></li> <li> <a href="https://freetype.org" target="_blank" rel="noopener">freetype</a></li> <li> <a href="https://www.freedesktop.org/wiki/Software/fontconfig/" target="_blank" rel="noopener">fontconfig</a></li> </ul> <p>Public releases of webR distributed via GitHub and CDN will be built with these libraries all enabled and included.</p> <h4 id="font-files-on-the-vfs">Font files on the VFS <a href="#font-files-on-the-vfs"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>When plotting with the built-in bitmap graphics devices, fonts must be accessible to the Cairo library through the webR VFS. A minimal selection of <a href="https://fonts.google.com/noto" target="_blank" rel="noopener">Google&rsquo;s Noto fonts</a> is bundled with webR when Cairo graphics is enabled.</p> <p>The fontconfig library is also configured to search the VFS directory <code>/home/web_user/fonts</code> for additional fonts. Users who wish to use custom fonts, or alternative writing systems, may do so by uploading font files to this directory. 
In the case of international scripts or non-Latin Unicode such as emoji, fontconfig will automatically use font fallback to select reasonable fonts containing the required glyphs.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">png</span><span class="p">(</span><span class="n">width</span> <span class="o">=</span> <span class="m">1200</span><span class="p">,</span> <span class="n">height</span> <span class="o">=</span> <span class="m">800</span><span class="p">,</span> <span class="n">res</span> <span class="o">=</span> <span class="m">180</span><span class="p">)</span> <span class="nf">plot</span><span class="p">(</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">),</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">1000</span><span class="p">),</span> <span class="n">col</span> <span class="o">=</span> <span class="nf">rgb</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">0.5</span><span class="p">),</span> <span class="n">xlim</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">-5</span><span class="p">,</span> <span class="m">5</span><span class="p">),</span> <span class="n">ylim</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">-5</span><span class="p">,</span> <span class="m">5</span><span class="p">),</span> <span class="n">main</span> <span class="o">=</span> <span class="s">&#34;This is the title 🚀&#34;</span><span class="p">,</span> <span class="n">xlab</span> <span class="o">=</span> <span class="s">&#34;This is the x label&#34;</span><span class="p">,</span> <span class="n">ylab</span> <span class="o">=</span> <span class="s">&#34;This is the y label&#34;</span> <span class="p">)</span> <span 
class="nf">text</span><span class="p">(</span><span class="m">-3.5</span><span class="p">,</span> <span class="m">4</span><span class="p">,</span> <span class="s">&#34;This is English&#34;</span><span class="p">)</span> <span class="nf">text</span><span class="p">(</span><span class="m">-3.5</span><span class="p">,</span> <span class="m">-4</span><span class="p">,</span> <span class="s">&#34;هذا مكتوب باللغة العربية&#34;</span><span class="p">)</span> <span class="nf">text</span><span class="p">(</span><span class="m">3.5</span><span class="p">,</span> <span class="m">4</span><span class="p">,</span> <span class="s">&#34;これは日本語です&#34;</span><span class="p">)</span> <span class="nf">text</span><span class="p">(</span><span class="m">3.5</span><span class="p">,</span> <span class="m">-4</span><span class="p">,</span> <span class="s">&#34;זה כתוב בעברית&#34;</span><span class="p">)</span> <span class="nf">dev.off</span><span class="p">()</span> </code></pre></div><p>This is essentially the same example as in the previous section, demonstrating a selection of advanced font functionality. In this example we are rendering a PNG file using the built-in <a href="https://rdrr.io/r/grDevices/png.html" target="_blank" rel="noopener"><code>png()</code></a> graphics device. We can see that by uploading appropriate fonts to the VFS, the same set of advanced text rendering features that are provided by the browser can also be used with R&rsquo;s built-in bitmap graphics devices.</p> <p><a href="textplot.png"><img alt="A screenshot showing the output of the above plotting code is shown on the left. The additional fonts uploaded to the VFS are listed on the right." 
width="100%" src="textplot.png"></a></p> <h2 id="lazy-virtual-filesystem">Lazy virtual filesystem <a href="#lazy-virtual-filesystem"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>All of the additional features I&rsquo;ve written about so far come with a price: increased Wasm binary and data download size. Consider the fonts in the previous section - each font file bundled with webR is going to increase the total size of the default webR filesystem by around 500KB.</p> <p>This is a high price to pay in time and bandwidth when not every user is going to need every feature. A similar principle also applies to other files included with R by default. It&rsquo;s nice that all the default R documentation, examples, and datasets are available on the VFS, but we don&rsquo;t necessarily need those files downloaded every time to every client machine.</p> <p>With webR 0.2.0 a &ldquo;lazy&rdquo; virtual filesystem mechanism, powered by <a href="https://emscripten.org/docs/porting/files/Synchronous-Virtual-XHR-Backed-File-System-Usage.html" target="_blank" rel="noopener">a feature of Emscripten&rsquo;s FS API</a>, is introduced. With this, only the files required to launch R and use the default packages are downloaded at initialisation time. 
Additional files provided on the VFS are still available for use, but they are only downloaded from the remote server when they are requested in some way by the running Wasm R process.</p> <p>With the introduction of the lazy virtual filesystem, along with other efficiency improvements, the initial download size for webR is now much smaller, a great improvement.</p> <table> <thead> <tr> <th>Component</th> <th>0.1.1</th> <th>0.2.0</th> <th>(% of previous)</th> </tr> </thead> <tbody> <tr> <td><code>R.bin.data</code></td> <td>25.3MB</td> <td>5.2MB</td> <td>20.6%</td> </tr> <tr> <td><code>R.bin.wasm</code></td> <td>12.8MB</td> <td>1.7MB</td> <td>7.5%</td> </tr> <tr> <td><strong>Total for the webR REPL app</strong></td> <td>40.2MB</td> <td>9.5MB</td> <td>23.6%</td> </tr> </tbody> </table> <h2 id="r-packages">R packages <a href="#r-packages"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Since initial release, webR has supported loading R packages by first installing them to the Emscripten VFS using the helper function <a href="https://docs.r-wasm.org/webr/latest/api/r.html#install-one-or-more-packages-from-a-webr-binary-package-repo" target="_blank" rel="noopener"><code>webr::install()</code></a> or by manually placing R packages in the VFS at <code>/usr/lib/R/library</code>. 
We find that pure R packages usually work well, but R packages with underlying C (or Fortran, or otherwise&hellip;) code must be compiled from source for Wasm.</p> <p>We host a public CRAN-like R package repository containing packages built for Wasm in this way, so that there exists a subset of useful and supported R packages that can be used with webR. The public repository is hosted at <a href="https://repo.r-wasm.org">https://repo.r-wasm.org</a> and this repo URL is used by default when running <a href="https://docs.r-wasm.org/webr/latest/api/r.html#install-one-or-more-packages-from-a-webr-binary-package-repo" target="_blank" rel="noopener"><code>webr::install()</code></a> to install a Wasm R package.</p> <p>It remains the case that building custom R packages for Wasm is not well documented, but we do hope to improve the situation over time as our package build infrastructure develops and matures. In the future, we plan to provide a Wasm R package build system as a set of Docker containers, so that users are able to build their own packages for webR using a container environment.</p> <h3 id="webassembly-system-libraries-for-r-packages">WebAssembly system libraries for R packages <a href="#webassembly-system-libraries-for-r-packages"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Many R packages require linking with system libraries to build and run. 
When building such R packages for WebAssembly, not only does the package code require compiling for Wasm, but also any system libraries that code depends on.</p> <p>To expand support for R packages, webR 0.2.0 ships with <a href="https://github.com/r-wasm/webr/tree/main/libs/recipes" target="_blank" rel="noopener">additional recipes</a> to build system libraries from source for Wasm. The libraries consist of a selection of utility, database, graphics, text rendering, geometry, and geospatial support packages, with specific libraries chosen for how feasibly they can be compiled for Wasm as well as the number of R packages relying on them. I expect that the number of system libraries supported will continue to grow over time as we attempt to build more R packages for Wasm.</p> <p>As of webR 0.1.1, <strong>219</strong> packages were available to install through our public Wasm R package repo. With the release of webR 0.2.0 and its additional system libraries, the number of available packages is now <strong>10324</strong> (approximately 51% of CRAN packages). It should be noted, though, that these packages have not been tested in detail. 
Here, &ldquo;available&rdquo; just means that the Emscripten compiler successfully built the R package for Wasm, along with its prerequisite packages.</p> <h3 id="public-wasm-r-packages-dashboard">Public Wasm R packages dashboard <a href="#public-wasm-r-packages-dashboard"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>While available R packages can be listed using <a href="https://rdrr.io/r/utils/available.packages.html" target="_blank" rel="noopener"><code>available.packages()</code></a> with our CRAN-like Wasm R package repo, it&rsquo;s not the smoothest experience for users simply wanting to check if a given package is available. A dashboard has been added to the <a href="https://repo.r-wasm.org" target="_blank" rel="noopener">repo index page</a> which lists the available packages compiled for Wasm in an interactive table. The table also lists package dependencies, noting which prerequisite packages, if any, are still missing.</p> <p><a href="repo.png"><img alt="A screenshot of the webR binary R package repository index page. 
A table of available R packages is shown, along with their prerequisites" width="95%" src="repo.png"></a></p> <p>It might be interesting to note that this dashboard itself is running under webR, through a fully client-side Shiny app.</p> <h2 id="running-httpuv--shiny-under-webr">Running httpuv &amp; Shiny under webR <a href="#running-httpuv--shiny-under-webr"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Using features new to webR 0.2.0, a <a href="https://github.com/r-wasm/httpuv" target="_blank" rel="noopener">httpuv webR package shim</a> has been created that provides the functionality usually provided by the <a href="https://cran.r-project.org/web/packages/httpuv/index.html" target="_blank" rel="noopener">httpuv</a> R package. The package enables R to handle HTTP and WebSocket traffic, and is a prerequisite for the R Shiny package.</p> <p>The shim works by taking advantage of the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API" target="_blank" rel="noopener">JavaScript Service Worker API</a>. Normally Service Workers are used to implement fast offline caching of web content, but they can also be used as a general network proxy. The httpuv shim makes use of a Service Worker to intercept network traffic from a running Shiny web client, and forward that traffic to be handled by an instance of webR.</p> <p>From the Shiny server&rsquo;s point of view, it is communicating with the usual httpuv package using its R API. From the point of view of the Shiny web client, it is talking to a Shiny server over the network. 
Between the two, the JavaScript Service Worker and webR work together to act as a network proxy and handle the traffic entirely within the client<sup id="fnref:6"><a href="#fn:6" class="footnote-ref" role="doc-noteref">6</a></sup>.</p> <p><a href="httpuv.png"><img alt="A block diagram showing how the httpuv shim, webR worker thread, and Shiny work together. See the preceding diagram for an explanation of how the blocks interact" width="90%" src="httpuv.png"></a></p> <p>The httpuv shim package is still in the experimental stage, but it is currently available for testing and is included in our public webR package repository.</p> <h3 id="an-example-shiny-app">An example shiny app <a href="#an-example-shiny-app"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p><a href="shiny.png"><img alt="A screenshot of the webR Shiny demo. The shiny app is shown in the top section of the screenshot, an input slider and an output histogram plot. The lower section shows the normally server-side Shiny package console output when tracing is enabled." width="90%" src="shiny.png"></a></p> <p>An example Shiny app, making use of the httpuv shim and running fully client-side, is available at <a href="https://shiny-standalone-webr-demo.netlify.app">https://shiny-standalone-webr-demo.netlify.app</a>.</p> <p>Once the app has loaded in your browser, it&rsquo;s possible to confirm that the app is running entirely client-side by observing the Shiny server trace output at the bottom of the screen. 
You should even be able to disconnect completely from the internet and continue to use the app offline.</p> <p>The source code for the demo, which includes some information describing how to set up a webR Shiny server in this way, can be found at <a href="https://github.com/georgestagg/shiny-standalone-webr-demo">georgestagg/shiny-standalone-webr-demo</a>. Note that this repository is targeted towards advanced web developers with prior experience of development with JavaScript Web Workers. It is intended as a demonstration of the technology, rather than a tutorial.</p> <p>A coming-soon version of Shinylive for R will provide a much better user experience for getting fully client-side R Shiny apps up and running, without requiring advanced knowledge of JavaScript&rsquo;s Worker API. I believe Shinylive with webR integration will pave the way for providing a user-friendly method to build and deploy containerised R Shiny apps, running on WebAssembly.</p> <h2 id="changes-to-the-webr-developer-api">Changes to the webR developer API <a href="#changes-to-the-webr-developer-api"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>It&rsquo;s possible for webR to be used in isolation, but it&rsquo;s likely that developers will want to interface webR with other JavaScript frameworks and tools. The dynamism and interconnectivity of the web is one of its great strengths, and we&rsquo;d like the same to be true of webR. 
This section describes changes to webR&rsquo;s developer API, used to interact with the running R session from the JavaScript environment.</p> <h3 id="performance-improvements-with-messagepack-protocol">Performance improvements with MessagePack protocol <a href="#performance-improvements-with-messagepack-protocol"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>When working to integrate webR into a wider application, at some point we will need to move data into the running R process, and later return results back to JavaScript. It&rsquo;s possible to move data into R by evaluating R code directly, but the webR library also provides <a href="https://docs.r-wasm.org/webr/latest/convert-js-to-r.html" target="_blank" rel="noopener">other ways to transfer raw data to R</a>.</p> <p>Consider the example below. Data is transferred from JavaScript into the running R process by binding <code>jsData</code> to an R variable in the global environment using <a href="https://docs.r-wasm.org/webr/latest/convert-js-to-r.html#binding-objects-to-an-r-environment" target="_blank" rel="noopener"><code>webR.objs.globalEnv.bind()</code></a>. Next, some computation on the data is done, represented as evaluating the <code>do_analysis()</code> R function. 
Finally, the result is returned to JavaScript: first as a reference to an R object, and then by transferring the result data back to the JavaScript environment using <a href="https://docs.r-wasm.org/webr/latest/convert-r-to-js.html#serialising-r-objects" target="_blank" rel="noopener"><code>toJs()</code></a>.</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="kr">const</span> <span class="nx">jsData</span> <span class="o">=</span> <span class="p">[...</span> <span class="nx">some</span> <span class="nx">large</span> <span class="nx">JavaScript</span> <span class="nx">dataset</span> <span class="p">...];</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">objs</span><span class="p">.</span><span class="nx">globalEnv</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="s1">&#39;data&#39;</span><span class="p">,</span> <span class="nx">jsData</span><span class="p">);</span> <span class="kr">const</span> <span class="nx">ret</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="s2">&#34;do_analysis(data)&#34;</span><span class="p">);</span> <span class="kr">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">ret</span><span class="p">.</span><span class="nx">toJs</span><span class="p">();</span> </code></pre></div><p>It&rsquo;s easy to see how this workflow could be useful as part of a wider application, enabling complex data manipulation or statistical modelling in R that would otherwise be awkward to perform directly in JavaScript.</p> <p>Behind the scenes, we&rsquo;ve done work to ensure that data is transferred efficiently to and from the R environment, and in webR 0.2.0 the <a href="https://msgpack.org/index.html" 
target="_blank" rel="noopener">MessagePack</a> protocol is now used as the main way that data is serialised and transferred, replacing JSON encoding.</p> <p>This change provides a significant performance improvement. <a href="https://github.com/r-wasm/webr/pull/204" target="_blank" rel="noopener">Initial testing</a> shows an order of magnitude speed boost when transferring large sets of data from the JavaScript environment into R. Thanks to <a href="https://github.com/r-wasm/webr/issues/203" target="_blank" rel="noopener">@jeroen</a> for prompting me to look into it!</p> <h3 id="the-typing-of-r-object-references">The typing of R object references <a href="#the-typing-of-r-object-references"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>When working with webR in TypeScript it is important to keep track of R object types. All references to R objects are instances of the <a href="https://docs.r-wasm.org/webr/latest/objects.html" target="_blank" rel="noopener"><code>RObject</code></a> class, and various subclasses implement specific features for each fundamental R data type.</p> <p>In this example, an <a href="https://docs.r-wasm.org/webr/latest/api/js/classes/RWorker.RDouble.html" target="_blank" rel="noopener"><code>RDouble</code></a> object is returned at runtime, but <code>webR.evalR()</code> is typed to return a generic <code>RObject</code>. 
Notice that the <a href="https://docs.r-wasm.org/webr/latest/api/js/classes/RWorker.RDouble.html#tonumber" target="_blank" rel="noopener"><code>.toNumber()</code></a> method exists on <code>RDouble</code>, but not on the <code>RObject</code> superclass. So while this example runs with no problem once compiled to JavaScript, it gives an error under TypeScript!</p> <div class="highlight"><pre class="chroma"><code class="language-typescript" data-lang="typescript"><span class="kr">const</span> <span class="nx">obj</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="s1">&#39;1.23456&#39;</span><span class="p">);</span> <span class="kr">const</span> <span class="nx">data</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">obj</span><span class="p">.</span><span class="nx">toJs</span><span class="p">();</span> <span class="kr">const</span> <span class="nx">num</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">obj</span><span class="p">.</span><span class="nx">toNumber</span><span class="p">();</span> <span class="c1">// An error under TypeScript! </span></code></pre></div><p>One solution is to use the <a href="https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#type-assertions" target="_blank" rel="noopener"><code>as</code></a> keyword to assert a specific type of <code>RObject</code> subclass. Alternatively, webR also provides <a href="https://docs.r-wasm.org/webr/latest/evaluating.html#returning-javascript-values-when-evaluating-r-code" target="_blank" rel="noopener">variants of the <code>evalR()</code> function</a> that return and convert results to a specific type of JavaScript object.</p> <p>In many cases these methods will work well, <em>but they require you to know for sure what type of R object has been returned</em>. 
Additional support has been added in webR 0.2.0 to better handle typing when it is not entirely clear what type of <code>RObject</code> you have.</p> <h4 id="type-predicate-functions">Type predicate functions <a href="#type-predicate-functions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>TypeScript supports a kind of return type known as a <a href="https://www.typescriptlang.org/docs/handbook/2/narrowing.html#using-type-predicates" target="_blank" rel="noopener">type predicate</a>. These return types can be used to create user-defined type guards, functions that take an object argument and return a boolean indicating if the object is of a compatible type. With this, TypeScript is able to automatically <a href="https://www.typescriptlang.org/docs/handbook/2/narrowing.html" target="_blank" rel="noopener">narrow</a> types based on the return value from the type predicate function.</p> <p>WebR 0.2.0 ships with a selection of <a href="https://docs.r-wasm.org/webr/latest/objects.html#type-predicate-functions" target="_blank" rel="noopener">type predicate functions for each fundamental R data type</a> supported by webR. In the following example, the TypeScript error described above is dealt with by using the function <a href="https://docs.r-wasm.org/webr/latest/api/js/modules/RMain.html#isrdouble" target="_blank" rel="noopener"><code>isRDouble()</code></a>. 
Inside the branch, TypeScript narrows the object type to an <code>RDouble</code>, resolving the issue.</p> <div class="highlight"><pre class="chroma"><code class="language-typescript" data-lang="typescript"><span class="kr">import</span> <span class="p">{</span> <span class="nx">isRDouble</span> <span class="p">}</span> <span class="nx">from</span> <span class="s1">&#39;webr&#39;</span><span class="p">;</span> <span class="kr">const</span> <span class="nx">obj</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="s1">&#39;1.23456&#39;</span><span class="p">);</span> <span class="k">try</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="nx">isRDouble</span><span class="p">(</span><span class="nx">obj</span><span class="p">))</span> <span class="p">{</span> <span class="c1">// In this branch, TypeScript narrows the type of `obj` to an `RDouble` </span><span class="c1"></span> <span class="kr">const</span> <span class="nx">num</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">obj</span><span class="p">.</span><span class="nx">toNumber</span><span class="p">();</span> <span class="c1">// Do something with `num` ... 
</span><span class="c1"></span> <span class="p">}</span> <span class="p">}</span> <span class="k">finally</span> <span class="p">{</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">destroy</span><span class="p">(</span><span class="nx">obj</span><span class="p">);</span> <span class="p">}</span> </code></pre></div> <h3 id="handling-errors-with-webrerror">Handling errors with <code>WebRError</code> <a href="#handling-errors-with-webrerror"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>When executing R code with webR&rsquo;s <code>evalR()</code> family of functions, by default any error condition from R is converted into a JavaScript <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Error" target="_blank" rel="noopener"><code>Error</code></a> and thrown. 
This feature can be very useful, because it allows developers to catch issues while executing R code in the native JavaScript environment.</p> <p>However, consider the following example,</p> <div class="highlight"><pre class="chroma"><code class="language-typescript" data-lang="typescript"><span class="k">try</span> <span class="p">{</span> <span class="kr">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="s1">&#39;some_R_code()&#39;</span><span class="p">);</span> <span class="nx">doSomethingWith</span><span class="p">(</span><span class="nx">result</span><span class="p">);</span> <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// Handle some error that occurred </span><span class="c1"></span><span class="p">}</span> </code></pre></div><p>If an error is thrown, how can we tell if the error came from R or from some issue inside the JavaScript function <code>doSomethingWith()</code>? Nested <code>try</code>/<code>catch</code> could be used, but this becomes unwieldy quickly. Parsing the error message text is another option, though not so elegant.</p> <p>With webR 0.2.0, any errors that occur in R code executed using <code>evalR()</code>, or any internal webR issues, are thrown as instances of <a href="https://docs.r-wasm.org/webr/latest/api/js/classes/WebR.WebRError.html" target="_blank" rel="noopener"><code>WebRError</code></a>. 
With this change, the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/instanceof" target="_blank" rel="noopener"><code>instanceof</code></a> operator can be used to differentiate between errors occurring in R and errors in JavaScript code.</p> <div class="highlight"><pre class="chroma"><code class="language-typescript" data-lang="typescript"><span class="kr">import</span> <span class="p">{</span> <span class="nx">WebRError</span> <span class="p">}</span> <span class="nx">from</span> <span class="s1">&#39;webr&#39;</span><span class="p">;</span> <span class="k">try</span> <span class="p">{</span> <span class="kr">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="s1">&#39;some_R_code()&#39;</span><span class="p">);</span> <span class="nx">doSomethingWith</span><span class="p">(</span><span class="nx">result</span><span class="p">);</span> <span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="nx">e</span> <span class="k">instanceof</span> <span class="nx">WebRError</span><span class="p">)</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="s2">&#34;An error occurred executing R code&#34;</span><span class="p">);</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="nx">console</span><span class="p">.</span><span class="nx">error</span><span class="p">(</span><span class="s2">&#34;An error occurred in JavaScript&#34;</span><span class="p">);</span> <span class="p">}</span> <span class="k">throw</span> <span class="nx">e</span><span class="p">;</span> <span 
class="p">}</span> </code></pre></div> <h3 id="safely-handling-webr-termination">Safely handling webR termination <a href="#safely-handling-webr-termination"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Consider the following <code>async</code> loop, a useful pattern to continuously handle webR output messages,</p> <div class="highlight"><pre class="chroma"><code class="language-typescript" data-lang="typescript"><span class="nx">async</span> <span class="kd">function</span> <span class="nx">run() {</span> <span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span> <span class="kr">const</span> <span class="nx">output</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">read</span><span class="p">();</span> <span class="k">switch</span> <span class="p">(</span><span class="nx">output</span><span class="p">.</span><span class="nx">type</span><span class="p">)</span> <span class="p">{</span> <span class="k">case</span> <span class="s1">&#39;stdout&#39;</span><span class="o">:</span> <span class="k">case</span> <span class="s1">&#39;stderr&#39;</span><span class="o">:</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">output</span><span class="p">.</span><span class="nx">data</span><span class="p">);</span> <span class="k">break</span><span class="p">;</span> <span class="k">default</span><span class="o">:</span> <span class="nx">console</span><span class="p">.</span><span class="nx">warn</span><span 
class="p">(</span><span class="sb">`Unhandled output type: </span><span class="si">${</span><span class="nx">output</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb">.`</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div><p>Here <code>await webR.read()</code> waits asynchronously for output messages from webR&rsquo;s communication channel. For example, a running R process might print results between long computational delays. Such occasional printed output might be received as messages with a <code>type</code> property of <code>'stdout'</code>.</p> <p>After a message is received, it is handled in a <code>switch</code> statement and then the loop continues around to wait for another output message. This works well while webR is running, but what happens when it is terminated with <a href="https://docs.r-wasm.org/webr/latest/api/js/classes/WebR.WebR.html#close" target="_blank" rel="noopener"><code>webR.close()</code></a>? The R worker thread is stopped and destroyed, but the loop continues to wait for a message that will never come.</p> <p>With webR 0.2.0, a new type of message is issued when webR is terminated using <code>webR.close()</code>. After the webR worker thread has been destroyed, a message is emitted on the usual output channel with a <code>type</code> property of <code>'closed'</code>, with no associated <code>data</code> property. 
The implication is that once this message has been emitted, that particular instance of webR has terminated and the async loop is no longer needed.</p> <p>With this change, exiting the loop once webR has terminated could be as simple as adding an extra <code>case</code> statement,</p> <div class="highlight"><pre class="chroma"><code class="language-typescript" data-lang="typescript"><span class="nx">async</span> <span class="kd">function</span> <span class="nx">run() {</span> <span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span> <span class="kr">const</span> <span class="nx">output</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">read</span><span class="p">();</span> <span class="k">switch</span> <span class="p">(</span><span class="nx">output</span><span class="p">.</span><span class="nx">type</span><span class="p">)</span> <span class="p">{</span> <span class="k">case</span> <span class="s1">&#39;stdout&#39;</span><span class="o">:</span> <span class="k">case</span> <span class="s1">&#39;stderr&#39;</span><span class="o">:</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">output</span><span class="p">.</span><span class="nx">data</span><span class="p">);</span> <span class="k">break</span><span class="p">;</span> <span class="k">case</span> <span class="s1">&#39;closed&#39;</span><span class="o">:</span> <span class="k">return</span><span class="p">;</span> <span class="k">default</span><span class="o">:</span> <span class="nx">console</span><span class="p">.</span><span class="nx">warn</span><span class="p">(</span><span class="sb">`Unhandled output type: </span><span class="si">${</span><span class="nx">output</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb">.`</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span 
class="p">}</span> <span class="p">}</span> </code></pre></div> <h2 id="installation-and-next-steps">Installation and next steps <a href="#installation-and-next-steps"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Developers can integrate webR in their own JavaScript or TypeScript projects by installing the <a href="https://www.npmjs.com/package/webr" target="_blank" rel="noopener">webR npm package</a>, or by directly importing webR from CDN. Issues and PRs are accepted and welcome on the main <a href="https://github.com/r-wasm/webr" target="_blank" rel="noopener">r-wasm/webr</a> GitHub repository.</p> <h3 id="npm">npm <a href="#npm"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>With this release, the webR npm package name has been updated, simplified from the original <code>@r-wasm/webr</code> package name to simply <code>webr</code>.</p> <div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">npm i webr </code></pre></div><p>The original namespaced package <code>@r-wasm/webr</code> will be deprecated, and from v0.2.0 onwards npm will display a message pointing to the new package name.</p> <h3 id="cdn-url">CDN URL <a href="#cdn-url"> <svg class="anchor-symbol" aria-hidden="true" 
height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Alternatively, webR can be imported directly as a module from CDN.</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="kr">import</span> <span class="p">{</span> <span class="nx">WebR</span> <span class="p">}</span> <span class="nx">from</span> <span class="s2">&#34;https://webr.r-wasm.org/v0.2.0/webr.mjs&#34;</span> </code></pre></div> <h3 id="binary-release-packages">Binary release packages <a href="#binary-release-packages"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Finally, binary webR packages can be downloaded from GitHub on the releases page of the <a href="https://github.com/r-wasm/webr" target="_blank" rel="noopener">r-wasm/webr</a> repo.</p> <h3 id="documentation">Documentation <a href="#documentation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The next step of 
integrating webR into your own software should be to visit the documentation pages, provided at <a href="https://docs.r-wasm.org/webr/v0.2.0/">https://docs.r-wasm.org/webr/v0.2.0/</a>. My previous <a href="https://www.tidyverse.org/blog/2023/03/webr-0-1-0/" target="_blank" rel="noopener">webR release blog post</a> also briefly explains how to get started, though the docs go into much more detail.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thank you to all of webR&rsquo;s early adopters, experimenting with the system and providing feedback in the form of GitHub Issues and PRs.</p> <p> <a href="https://github.com/Anurodhyadav" target="_blank" rel="noopener">@Anurodhyadav</a>, <a href="https://github.com/arkraieski" target="_blank" rel="noopener">@arkraieski</a>, <a href="https://github.com/averissimo" target="_blank" rel="noopener">@averissimo</a>, <a href="https://github.com/awconway" target="_blank" rel="noopener">@awconway</a>, <a href="https://github.com/bahadzie" target="_blank" rel="noopener">@bahadzie</a>, <a href="https://github.com/ceciliacsilva" target="_blank" rel="noopener">@ceciliacsilva</a>, <a href="https://github.com/DanielEWeeks" target="_blank" rel="noopener">@DanielEWeeks</a>, <a href="https://github.com/eteitelbaum" target="_blank" rel="noopener">@eteitelbaum</a>, <a href="https://github.com/fortunewalla" target="_blank" rel="noopener">@fortunewalla</a>, <a href="https://github.com/gedw99" target="_blank" rel="noopener">@gedw99</a>, <a href="https://github.com/gwd-at" target="_blank" 
rel="noopener">@gwd-at</a>, <a href="https://github.com/hatemhosny" target="_blank" rel="noopener">@hatemhosny</a>, <a href="https://github.com/hrbrmstr" target="_blank" rel="noopener">@hrbrmstr</a>, <a href="https://github.com/ivelasq" target="_blank" rel="noopener">@ivelasq</a>, <a href="https://github.com/JeremyPasco" target="_blank" rel="noopener">@JeremyPasco</a>, <a href="https://github.com/jeroen" target="_blank" rel="noopener">@jeroen</a>, <a href="https://github.com/jooyoungseo" target="_blank" rel="noopener">@jooyoungseo</a>, <a href="https://github.com/jpjais" target="_blank" rel="noopener">@jpjais</a>, <a href="https://github.com/kforner" target="_blank" rel="noopener">@kforner</a>, <a href="https://github.com/lauritowal" target="_blank" rel="noopener">@lauritowal</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/matthiasbirkich" target="_blank" rel="noopener">@matthiasbirkich</a>, <a href="https://github.com/neocarto" target="_blank" rel="noopener">@neocarto</a>, <a href="https://github.com/noamross" target="_blank" rel="noopener">@noamross</a>, <a href="https://github.com/Polkas" target="_blank" rel="noopener">@Polkas</a>, <a href="https://github.com/qiushiyan" target="_blank" rel="noopener">@qiushiyan</a>, <a href="https://github.com/ries9112" target="_blank" rel="noopener">@ries9112</a>, <a href="https://github.com/SugarRayLua" target="_blank" rel="noopener">@SugarRayLua</a>, <a href="https://github.com/timelyportfolio" target="_blank" rel="noopener">@timelyportfolio</a>, <a href="https://github.com/WebReflection" target="_blank" rel="noopener">@WebReflection</a>, and <a href="https://github.com/WillemSleegers" target="_blank" rel="noopener">@WillemSleegers</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>In addition, <a href="https://blog.djnavarro.net/posts/2023-04-09_webr/" target="_blank" rel="noopener">Danielle Navarro&rsquo;s 
webR blog post</a> is very good and Bob Rudis&rsquo;s <a href="https://rud.is/webr-experiments/" target="_blank" rel="noopener">webR experiments</a> are well worth exploring, along with his recent <a href="https://youtu.be/inpwcTUmBDY" target="_blank" rel="noopener">NY R conference talk</a>. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>Also other JavaScript/Wasm environments, such as Node.js. For example, <a href="https://ropensci.org/r-universe/" target="_blank" rel="noopener">ROpenSci&rsquo;s r-universe</a> package platform provides download links for datasets contained in R packages, in a variety of formats, <a href="https://fosstodon.org/@jeroenooms/110299179903212170" target="_blank" rel="noopener">powered by running webR server-side in Node.js</a>. <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>REPL stands for &ldquo;Read, Eval, Print, Loop&rdquo;, and is another name for the R console that you&rsquo;re probably familiar with. The application is named the &ldquo;webR REPL app&rdquo; because the original version simply provided the user with a fullscreen R console in their web browser. <a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:4" role="doc-endnote"> <p>This also includes the world of CSS web fonts, but it is a little tricky. <a href="https://stackoverflow.com/a/53808942" target="_blank" rel="noopener">Extra work</a> must be done so that the font is available to the Web Worker. Probably this can be handled better in a future release of <a href="https://docs.r-wasm.org/webr/latest/api/r.html#graphics-device-for-drawing-to-a-html-canvas-element" target="_blank" rel="noopener"><code>webr::canvas()</code></a>. 
<a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:5" role="doc-endnote"> <p> <a href="https://www.dyslexiefont.com" target="_blank" rel="noopener">Dyslexie</a>, <a href="https://opendyslexic.org" target="_blank" rel="noopener">Open Dyslexic</a>. Results of research in this area are mixed, but even if these fonts don&rsquo;t improve the speed of text comprehension, some users may simply prefer or feel more comfortable with them. <a href="#fnref:5" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:6" role="doc-endnote"> <p> <a href="https://shiny.posit.co/py/docs/shinylive.html" target="_blank" rel="noopener">Shinylive for Python</a> also uses a JavaScript Service Worker scheme to serve fully client-side apps. <a href="#fnref:6" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> Teaching the tidyverse in 2023 https://www.tidyverse.org/blog/2023/08/teach-tidyverse-23/ Mon, 07 Aug 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/08/teach-tidyverse-23/ <p>Another year, another roundup of tidyverse updates, through the lens of an educator. 
As with previous <a href="https://www.tidyverse.org/blog/2021/08/teach-tidyverse-2021/">teaching the tidyverse posts</a>, much of what is discussed in this blog post has already been covered in package update posts; however, the goal of this roundup is to summarize the highlights that are most relevant to teaching data science with the tidyverse, particularly to new learners.</p> <p>Specifically, I&rsquo;ll discuss:</p> <ul> <li> <a href="#resource-refresh">Resource refresh</a></li> <li> <a href="#nine-core-packages-in-tidyverse-200">Nine core packages in tidyverse 2.0.0</a></li> <li> <a href="#conflict-resolution-in-the-tidyverse">Conflict resolution in the tidyverse</a></li> <li> <a href="#improved-and-expanded-_join-functionality">Improved and expanded <code>*_join()</code> functionality</a></li> <li> <a href="#per-operation-grouping">Per operation grouping</a></li> <li> <a href="#quality-of-life-improvements-to-case_when-and-if_else">Quality of life improvements to <code>case_when()</code> and <code>if_else()</code></a></li> <li> <a href="#new-syntax-for-separating-columns">New syntax for separating columns</a></li> <li> <a href="#new-argument-for-line-geoms-linewidth">New argument for line geoms: linewidth</a></li> <li> <a href="#other-highlights">Other highlights</a></li> <li> <a href="#coming-up">Coming up</a></li> </ul> <p>And unlike previous posts on this topic, this one comes with a video! 
If you&rsquo;d like a live demo of the code examples, and a few more additional tips along the way, you can watch the video below.</p> <center> <iframe width="560" height="315" src="https://www.youtube.com/embed/KsBBRHAgAhM" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe> </center> <p>Throughout this blog post you&rsquo;ll encounter some code chunks with the comment <code>previously</code>, indicating what you used to do in the tidyverse. Often these will be coupled with chunks with the comment <code>now, optionally</code>, indicating what you <em>can</em> now do with the tidyverse. And rarely, they will be coupled with chunks with the comment <code>now</code>, indicating what you <em>should</em> do instead now with the tidyverse.</p> <p>Let&rsquo;s get started with the obligatory&hellip;</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyverse.tidyverse.org'>tidyverse</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Attaching core tidyverse packages</span> ──────────────────────── tidyverse 2.0.0 ──</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dplyr </span> 1.1.2 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>readr </span> 2.1.4</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>forcats </span> 1.0.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>stringr </span> 1.5.0</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>ggplot2 </span> 3.4.2 <span style='color: 
#00BB00;'>✔</span> <span style='color: #0000BB;'>tibble </span> 3.2.1</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>lubridate</span> 1.9.2 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tidyr </span> 1.3.0</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>purrr </span> 1.0.1 </span></span> <span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Conflicts</span> ────────────────────────────────────────── tidyverse_conflicts() ──</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: #0000BB;'>stats</span>::filter()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag()</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use the conflicted package (<span style='color: #0000BB; font-style: italic;'>&lt;http://conflicted.r-lib.org/&gt;</span>) to force all conflicts to become errors</span></span> <span></span></code></pre> </div> <p>And, let&rsquo;s also load the <a href="https://allisonhorst.github.io/palmerpenguins/" target="_blank" rel="noopener">palmerpenguins</a> package that we will use in examples.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://allisonhorst.github.io/palmerpenguins/'>palmerpenguins</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="resource-refresh">Resource refresh <a href="#resource-refresh"> <svg class="anchor-symbol" aria-hidden="true" height="26" 
width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>R for Data Science, 2nd Edition is out! <a href="https://www.tidyverse.org/blog/2023/07/r4ds-2e/">This blog post</a> (and the <a href="https://r4ds.hadley.nz/preface-2e.html" target="_blank" rel="noopener">book&rsquo;s preface</a>) outlines updates since the first edition. Updates to the book served as the motivation for many of the changes mentioned in the remainder of this post as well as on the Tidyverse blog over the last year. Now that the book is out, you can expect the pace of change to slow down again for a while, which means plenty of time for phasing these changes into your teaching materials.</p> <p>One change in the 2nd Edition that will most likely affect almost all of your teaching materials is the use of the native R pipe (<code>|&gt;</code>) instead of the magrittr pipe (<code>%&gt;%</code>). If you&rsquo;re not familiar with the similarities and differences between these operators, I recommend reading <a href="https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/" target="_blank" rel="noopener">this comparison blog post</a>. 
And I strongly recommend making this update since it will allow students to perform piped operations with any R function, and hence allow them to keep their data pipeline workflows regardless of whether the next package they learn is from the tidyverse (or a package that uses tidyverse principles) or not.</p> <h2 id="nine-core-packages-in-tidyverse-200">Nine core packages in tidyverse 2.0.0 <a href="#nine-core-packages-in-tidyverse-200"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The main update in tidyverse 2.0.0, which was released in March 2023, is that <a href="https://lubridate.tidyverse.org/" target="_blank" rel="noopener">lubridate</a>, the package that makes it easier to do the things R does with date-times, is now a core tidyverse package. 
So, while many of your scripts in the past may have started with</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># previously</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyverse.tidyverse.org'>tidyverse</a></span><span class='o'>)</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://lubridate.tidyverse.org'>lubridate</a></span><span class='o'>)</span></span></code></pre> </div> <p>you can now just do</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># now</span></span>
<span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyverse.tidyverse.org'>tidyverse</a></span><span class='o'>)</span></span></code></pre> </div> <p>and the lubridate package will be loaded as well.</p> <p>If you, like me, use a graphic like the one below that maps the core tidyverse packages to phases of the data science cycle, here is an updated graphic including lubridate.</p> <p><img src="images/data-science.png" data-fig-alt="Data science cycle: import, tidy, transform, visualize, model, communicate. Packages readr and tibble are for import. Packages tidyr and purrr are for tidy and transform. Packages dplyr, stringr, forcats, and lubridate are for transform. Package ggplot2 is for visualize." 
/></p> <h2 id="conflict-resolution-in-the-tidyverse">Conflict resolution in the tidyverse <a href="#conflict-resolution-in-the-tidyverse"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>You may have also noticed that the package loading message for the tidyverse has been updated as well, and now advertises the <a href="https://conflicted.r-lib.org/" target="_blank" rel="noopener">conflicted</a> package.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Conflicts</span> ────────────────────────────────────────── tidyverse_conflicts() ──</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: #0000BB;'>stats</span>::filter()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag()</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use the conflicted package (<span style='color: #0000BB; font-style: italic;'>&lt;http://conflicted.r-lib.org/&gt;</span>) to force all conflicts to become errors</span></span> <span></span></code></pre> </div> <p>Conflict resolution in R, i.e., what to do if multiple packages that are loaded in a session have functions with the same name, can get tricky, and the conflicted package is designed to help with that. 
R&rsquo;s default conflict resolution gives precedence to the most recently loaded package. For example, if you use the filter function before loading the tidyverse, R will use <a href="https://rdrr.io/r/stats/filter.html" target="_blank" rel="noopener"><code>stats::filter()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>species</span> <span class='o'>==</span> <span class='s'>"Adelie"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Error in eval(expr, envir, enclos): object 'species' not found</span></span> <span></span></code></pre> </div> <p>However, after loading the tidyverse, when you call <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a>, R will <em>silently</em> choose <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>dplyr::filter()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>species</span> <span class='o'>==</span> <span class='s'>"Adelie"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 152 × 8</span></span></span> <span><span class='c'>#&gt; species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie Torgersen 39.1 18.7 181 <span style='text-decoration: underline;'>3</span>750</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie Torgersen 39.5 17.4 186 <span style='text-decoration: underline;'>3</span>800</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie Torgersen 40.3 18 195 <span style='text-decoration: underline;'>3</span>250</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie Torgersen <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie Torgersen 36.7 19.3 193 <span style='text-decoration: underline;'>3</span>450</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie Torgersen 39.3 20.6 190 <span style='text-decoration: underline;'>3</span>650</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie Torgersen 38.9 17.8 181 <span style='text-decoration: underline;'>3</span>625</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie Torgersen 39.2 19.6 195 <span style='text-decoration: underline;'>4</span>675</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie Torgersen 34.1 18.1 193 <span style='text-decoration: underline;'>3</span>475</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie Torgersen 42 20.2 190 <span style='text-decoration: 
underline;'>4</span>250</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 142 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 2 more variables: sex &lt;fct&gt;, year &lt;int&gt;</span></span></span> <span></span></code></pre> </div> <p>This silent conflict resolution approach works fine until it doesn&rsquo;t, and then it can be very frustrating to debug. The conflicted package does not allow for silent conflict resolution:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://conflicted.r-lib.org/'>conflicted</a></span><span class='o'>)</span></span> <span> </span> <span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>species</span> <span class='o'>==</span> <span class='s'>"Adelie"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'>:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> <span style='color: #555555;'>[conflicted]</span> <span style='font-weight: bold;'>filter</span> found in 2 packages.</span></span> <span><span class='c'>#&gt; Either pick the one you want with `::`:</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> <span style='color: #0000BB;'>dplyr</span>::filter</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> <span style='color: #0000BB;'>stats</span>::filter</span></span> <span><span class='c'>#&gt; Or declare a preference with `conflicts_prefer()`:</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> 
`conflicts_prefer(dplyr::filter)`</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> `conflicts_prefer(stats::filter)`</span></span> <span></span></code></pre> </div> <p>You can, of course, use <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>dplyr::filter()</code></a> but if you have a bunch of data wrangling pipelines, which is likely the case if you&rsquo;re teaching data wrangling, it can get pretty busy.</p> <p>Instead, with conflicted, you can explicitly declare which <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a> you want to use at the beginning (of a session, of a script, or of an R Markdown or Quarto file) with <a href="https://conflicted.r-lib.org/reference/conflicts_prefer.html" target="_blank" rel="noopener"><code>conflicts_prefer()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://conflicted.r-lib.org/reference/conflicts_prefer.html'>conflicts_prefer</a></span><span class='o'>(</span><span class='nf'>dplyr</span><span class='nf'>::</span><span class='nv'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[conflicted]</span> Will prefer <span style='color: #0000BB; font-weight: bold;'>dplyr</span>::filter over any other package.</span></span> <span></span><span> </span> <span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>species</span> <span class='o'>==</span> <span class='s'>"Adelie"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 152 × 8</span></span></span> <span><span 
class='c'>#&gt; species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie Torgersen 39.1 18.7 181 <span style='text-decoration: underline;'>3</span>750</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie Torgersen 39.5 17.4 186 <span style='text-decoration: underline;'>3</span>800</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie Torgersen 40.3 18 195 <span style='text-decoration: underline;'>3</span>250</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie Torgersen <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie Torgersen 36.7 19.3 193 <span style='text-decoration: underline;'>3</span>450</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie Torgersen 39.3 20.6 190 <span style='text-decoration: underline;'>3</span>650</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie Torgersen 38.9 17.8 181 <span style='text-decoration: underline;'>3</span>625</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie Torgersen 39.2 19.6 195 <span style='text-decoration: underline;'>4</span>675</span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie Torgersen 34.1 18.1 193 <span style='text-decoration: underline;'>3</span>475</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie Torgersen 42 20.2 190 <span style='text-decoration: underline;'>4</span>250</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 142 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 2 more variables: sex &lt;fct&gt;, year &lt;int&gt;</span></span></span> <span></span></code></pre> </div> <p>Getting back to the package loading message&hellip; It can be tempting to hide startup messages, particularly in a teaching scenario, particularly with an audience of new learners, and particularly if you teach with slides, where messages take up valuable slide real estate. Still, I would urge you not to hide startup messages in teaching materials. Instead, address them early on to:</p> <ol> <li> <p>Encourage reading and understanding messages, warnings, and errors &ndash; teaching people to read error messages is hard enough, and it&rsquo;s going to be even harder if you&rsquo;re not modeling that for them.</p> </li> <li> <p>Help during hard-to-debug situations resulting from base R&rsquo;s silent conflict resolution &ndash; because, let&rsquo;s face it, someone in your class, if not you during a live-coding session, will see that pesky &ldquo;object not found&rdquo; error at some point when using <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a>.</p> </li> </ol> <h2 id="improved-and-expanded-_join-functionality">Improved and expanded <code>*_join()</code> functionality <a href="#improved-and-expanded-_join-functionality"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 
5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The <a href="https://dplyr.tidyverse.org/" target="_blank" rel="noopener">dplyr</a> package has long had the <a href="https://dplyr.tidyverse.org/articles/two-table.html" target="_blank" rel="noopener"><code>*_join()</code> family of functions</a> for joining data frames. dplyr 1.1.0 introduced a <a href="https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/" target="_blank" rel="noopener">bunch of extensions</a> that bring joins closer to the power available in other systems like SQL and <code>data.table</code>.</p> <h3 id="join_by"><code>join_by()</code> <a href="#join_by"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>New functionality for join functions includes a new <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a> function for the <code>by</code> argument. 
So, while in the past your code may have looked like the following:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'># previously
*_join(
  x,
  y,
  by = c("&lt;x var&gt;" = "&lt;y var&gt;")
)
</code></pre> </div> <p>you can now do:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'># now, optionally
*_join(
  x,
  y,
  by = join_by(&lt;x var&gt; == &lt;y var&gt;)
)
</code></pre> </div> <p>For example, suppose you have the following information on the three islands we have penguins from:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>islands</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tribble.html'>tribble</a></span><span class='o'>(</span></span>
<span>  <span class='o'>~</span><span class='nv'>name</span>, <span class='o'>~</span><span class='nv'>coordinates</span>,</span>
<span>  <span class='s'>"Torgersen"</span>, <span class='s'>"64°46′S 64°5′W"</span>,</span>
<span>  <span class='s'>"Biscoe"</span>, <span class='s'>"65°26′S 65°30′W"</span>,</span>
<span>  <span class='s'>"Dream"</span>, <span class='s'>"64°44′S 64°14′W"</span></span>
<span><span class='o'>)</span></span>
<span></span>
<span><span class='nv'>islands</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span>
<span><span class='c'>#&gt; name coordinates </span></span>
<span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Torgersen 64°46′S 64°5′W </span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Biscoe 65°26′S 65°30′W</span></span>
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span> Dream 64°44′S 64°14′W</span></span>
<span></span></code></pre> </div> 
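<p>Before putting <code>islands</code> to use, here is a minimal sketch that checks the two ways of writing <code>by</code> against each other on a tiny made-up example. The <code>orders</code> and <code>customers</code> data frames and their column names are hypothetical and invented purely for illustration; the sketch assumes dplyr 1.1.0 or later is installed.</p>
<div class="highlight">
<pre class='chroma'><code class='language-r' data-lang='r'>library(dplyr)

# Hypothetical toy data, invented for illustration only
orders &lt;- tibble(order_id = 1:3, customer = c("a", "b", "c"))
customers &lt;- tibble(id = c("a", "b"), city = c("Oslo", "Lima"))

# Named-vector form: the name refers to a column in x, the string to a column in y
old_style &lt;- left_join(orders, customers, by = c("customer" = "id"))

# join_by() form: bare column names compared with ==
new_style &lt;- left_join(orders, customers, by = join_by(customer == id))

# Compare the two results
all.equal(old_style, new_style)
</code></pre>
</div>
<p>Both calls are expected to return the same three rows, with a missing <code>city</code> for the one customer that has no match in the lookup table. The <code>islands</code> data frame defined above is a lookup table of the same kind.</p>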
<p>You can join this to the penguins data frame by matching the <code>island</code> column in the penguins data frame to the <code>name</code> column in the islands data frame:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span></span> <span> <span class='nv'>islands</span>, </span> <span> by <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>island</span> <span class='o'>==</span> <span class='nv'>name</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>species</span>, <span class='nv'>island</span>, <span class='nv'>coordinates</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 3</span></span></span> <span><span class='c'>#&gt; species island coordinates </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Adelie Torgersen 64°46′S 64°5′W</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Adelie Torgersen 64°46′S 64°5′W</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Adelie Torgersen 64°46′S 64°5′W</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Adelie Torgersen 64°46′S 
64°5′W</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Adelie Torgersen 64°46′S 64°5′W</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Adelie Torgersen 64°46′S 64°5′W</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Adelie Torgersen 64°46′S 64°5′W</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Adelie Torgersen 64°46′S 64°5′W</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Adelie Torgersen 64°46′S 64°5′W</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Adelie Torgersen 64°46′S 64°5′W</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span> <span></span></code></pre> </div> <p>While <code>by = c(&quot;island&quot; = &quot;name&quot;)</code> would still work, I would recommend teaching <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a> over <code>by</code> so that:</p> <ol> <li>You can read it out loud as &ldquo;where x is equal to y&rdquo;, just like in other logical statements where <code>==</code> is pronounced as &ldquo;is equal to&rdquo;.</li> <li>You don&rsquo;t have to worry about <code>by = c(x = y)</code> (which is invalid) vs. <code>by = c(x = &quot;y&quot;)</code> (which is valid) vs. 
<code>by = c(&quot;x&quot; = &quot;y&quot;)</code> (which is also valid).</li> </ol> <p>In fact, for succinctness, you might avoid the argument name and express this as:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>islands</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>island</span> <span class='o'>==</span> <span class='nv'>name</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <h3 id="handling-various-matches">Handling various matches <a href="#handling-various-matches"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The <code>*_join()</code> functions now have additional arguments for handling <code>multiple</code> matches and <code>unmatched</code> rows as well as for specifying the <code>relationship</code> between the two data frames.</p> <p>So, while in the past your code may have looked like the following:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'># previously *_join( x, y, by ) </code></pre> </div> <p>you can now do:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'># now, optionally *_join( x, y, by, multiple = "all", unmatched = "drop", relationship = NULL ) </code></pre> </div> <p>Let&rsquo;s set up three data 
frames to demonstrate the new functionality:</p> <ul> <li>Information about three penguins, one row per <code>samp_id</code>:</li> </ul> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>three_penguins</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tribble.html'>tribble</a></span><span class='o'>(</span></span> <span> <span class='o'>~</span><span class='nv'>samp_id</span>, <span class='o'>~</span><span class='nv'>species</span>, <span class='o'>~</span><span class='nv'>island</span>,</span> <span> <span class='m'>1</span>, <span class='s'>"Adelie"</span>, <span class='s'>"Torgersen"</span>,</span> <span> <span class='m'>2</span>, <span class='s'>"Gentoo"</span>, <span class='s'>"Biscoe"</span>,</span> <span> <span class='m'>3</span>, <span class='s'>"Chinstrap"</span>, <span class='s'>"Dream"</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>three_penguins</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; samp_id species island </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 Adelie Torgersen</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 Gentoo Biscoe </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 Chinstrap Dream</span></span> <span></span></code></pre> </div> <ul> <li>Information about weight measurements of these penguins, one row per <code>samp_id</code>, <code>meas_id</code> combination:</li> </ul> <div class="highlight"> <pre class='chroma'><code class='language-r' 
data-lang='r'><span><span class='nv'>weight_measurements</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tribble.html'>tribble</a></span><span class='o'>(</span></span> <span> <span class='o'>~</span><span class='nv'>samp_id</span>, <span class='o'>~</span><span class='nv'>meas_id</span>, <span class='o'>~</span><span class='nv'>body_mass_g</span>,</span> <span> <span class='m'>1</span>, <span class='m'>1</span>, <span class='m'>3220</span>,</span> <span> <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3250</span>,</span> <span> <span class='m'>2</span>, <span class='m'>1</span>, <span class='m'>4730</span>,</span> <span> <span class='m'>2</span>, <span class='m'>2</span>, <span class='m'>4725</span>,</span> <span> <span class='m'>3</span>, <span class='m'>1</span>, <span class='m'>4000</span>,</span> <span> <span class='m'>3</span>, <span class='m'>2</span>, <span class='m'>4050</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>weight_measurements</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 3</span></span></span> <span><span class='c'>#&gt; samp_id meas_id body_mass_g</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 1 <span style='text-decoration: underline;'>3</span>220</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 2 <span style='text-decoration: underline;'>3</span>250</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 1 <span style='text-decoration: underline;'>4</span>730</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 2 <span 
style='text-decoration: underline;'>4</span>725</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 1 <span style='text-decoration: underline;'>4</span>000</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 2 <span style='text-decoration: underline;'>4</span>050</span></span> <span></span></code></pre> </div> <ul> <li>Information about flipper measurements of these penguins, one row per <code>samp_id</code>, <code>meas_id</code> combination:</li> </ul> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>flipper_measurements</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tribble.html'>tribble</a></span><span class='o'>(</span></span> <span> <span class='o'>~</span><span class='nv'>samp_id</span>, <span class='o'>~</span><span class='nv'>meas_id</span>, <span class='o'>~</span><span class='nv'>flipper_length_mm</span>,</span> <span> <span class='m'>1</span>, <span class='m'>1</span>, <span class='m'>193</span>,</span> <span> <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>195</span>,</span> <span> <span class='m'>2</span>, <span class='m'>1</span>, <span class='m'>214</span>,</span> <span> <span class='m'>2</span>, <span class='m'>2</span>, <span class='m'>216</span>,</span> <span> <span class='m'>3</span>, <span class='m'>1</span>, <span class='m'>203</span>,</span> <span> <span class='m'>3</span>, <span class='m'>2</span>, <span class='m'>203</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>flipper_measurements</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 3</span></span></span> <span><span class='c'>#&gt; samp_id meas_id flipper_length_mm</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 1 193</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 2 195</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 1 214</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 2 216</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 1 203</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 2 203</span></span> <span></span></code></pre> </div> <p>One-to-many relationships don&rsquo;t require extra care, they just work:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>three_penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>weight_measurements</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>samp_id</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 5</span></span></span> <span><span class='c'>#&gt; samp_id species island meas_id body_mass_g</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 Adelie Torgersen 1 <span style='text-decoration: 
underline;'>3</span>220</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 Adelie Torgersen 2 <span style='text-decoration: underline;'>3</span>250</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 Gentoo Biscoe 1 <span style='text-decoration: underline;'>4</span>730</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 Gentoo Biscoe 2 <span style='text-decoration: underline;'>4</span>725</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 Chinstrap Dream 1 <span style='text-decoration: underline;'>4</span>000</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 Chinstrap Dream 2 <span style='text-decoration: underline;'>4</span>050</span></span> <span></span></code></pre> </div> <p>However, many-to-many relationships require some extra care. For example, if we join the <code>weight_measurements</code> data frame to the <code>flipper_measurements</code> data frame by <code>samp_id</code> alone, we get a warning:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>weight_measurements</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>flipper_measurements</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>samp_id</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning in left_join(weight_measurements, flipper_measurements, join_by(samp_id)): Detected an unexpected many-to-many relationship between `x` and `y`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Row 1 of `x` matches multiple rows in `y`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Row 1 
of `y` matches multiple rows in `x`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> If a many-to-many relationship is expected, set `relationship =</span></span> <span><span class='c'>#&gt; "many-to-many"` to silence this warning.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 12 × 5</span></span></span> <span><span class='c'>#&gt; samp_id meas_id.x body_mass_g meas_id.y flipper_length_mm</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 1 1 <span style='text-decoration: underline;'>3</span>220 1 193</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 1 1 <span style='text-decoration: underline;'>3</span>220 2 195</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 1 2 <span style='text-decoration: underline;'>3</span>250 1 193</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 1 2 <span style='text-decoration: underline;'>3</span>250 2 195</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 2 1 <span style='text-decoration: underline;'>4</span>730 1 214</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 2 1 <span style='text-decoration: underline;'>4</span>730 2 216</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 2 2 <span style='text-decoration: underline;'>4</span>725 1 214</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 2 2 <span 
style='text-decoration: underline;'>4</span>725 2 216</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 3 1 <span style='text-decoration: underline;'>4</span>000 1 203</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 3 1 <span style='text-decoration: underline;'>4</span>000 2 203</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> 3 2 <span style='text-decoration: underline;'>4</span>050 1 203</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> 3 2 <span style='text-decoration: underline;'>4</span>050 2 203</span></span> <span></span></code></pre> </div> <p>We get a warning about unexpected many-to-many relationships (unexpected because we didn&rsquo;t specify this type of relationship in our join call), and the warning suggests setting <code>relationship = &quot;many-to-many&quot;</code>. And note that we went from 6 rows (measurements) to 12, which is also unexpected.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>weight_measurements</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>flipper_measurements</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>samp_id</span><span class='o'>)</span>, relationship <span class='o'>=</span> <span class='s'>"many-to-many"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 12 × 5</span></span></span> <span><span class='c'>#&gt; samp_id meas_id.x body_mass_g meas_id.y flipper_length_mm</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 1 1 <span style='text-decoration: underline;'>3</span>220 1 193</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 1 1 <span style='text-decoration: underline;'>3</span>220 2 195</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 1 2 <span style='text-decoration: underline;'>3</span>250 1 193</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 1 2 <span style='text-decoration: underline;'>3</span>250 2 195</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 2 1 <span style='text-decoration: underline;'>4</span>730 1 214</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 2 1 <span style='text-decoration: underline;'>4</span>730 2 216</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 2 2 <span style='text-decoration: underline;'>4</span>725 1 214</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 2 2 <span style='text-decoration: underline;'>4</span>725 2 216</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 3 1 <span style='text-decoration: underline;'>4</span>000 1 203</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 3 1 <span style='text-decoration: underline;'>4</span>000 2 203</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> 3 2 <span style='text-decoration: underline;'>4</span>050 1 203</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> 3 2 <span style='text-decoration: underline;'>4</span>050 2 
203</span></span> <span></span></code></pre> </div> <p>With <code>relationship = &quot;many-to-many&quot;</code>, we no longer get a warning. However, the &ldquo;explosion of rows&rdquo; issue is still there. Addressing that requires rethinking what we join the two data frames by:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>weight_measurements</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>flipper_measurements</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>samp_id</span>, <span class='nv'>meas_id</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 4</span></span></span> <span><span class='c'>#&gt; samp_id meas_id body_mass_g flipper_length_mm</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 1 <span style='text-decoration: underline;'>3</span>220 193</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 2 <span style='text-decoration: underline;'>3</span>250 195</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 1 <span style='text-decoration: underline;'>4</span>730 214</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 2 <span style='text-decoration: underline;'>4</span>725 216</span></span> <span><span class='c'>#&gt; <span style='color: 
#555555;'>5</span> 3 1 <span style='text-decoration: underline;'>4</span>000 203</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 2 <span style='text-decoration: underline;'>4</span>050 203</span></span> <span></span></code></pre> </div> <p>We can see that while the warning nudged us towards setting <code>relationship = &quot;many-to-many&quot;</code>, it turns out the correct way to address the problem was to join by both <code>samp_id</code> and <code>meas_id</code>.</p> <p>We&rsquo;ll wrap up our discussion of joins with the new functionality for handling <code>unmatched</code> rows. We&rsquo;ll create one more data frame (<code>four_penguins</code>) to exemplify this:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>four_penguins</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tribble.html'>tribble</a></span><span class='o'>(</span></span> <span> <span class='o'>~</span><span class='nv'>samp_id</span>, <span class='o'>~</span><span class='nv'>species</span>, <span class='o'>~</span><span class='nv'>island</span>,</span> <span> <span class='m'>1</span>, <span class='s'>"Adelie"</span>, <span class='s'>"Torgersen"</span>,</span> <span> <span class='m'>2</span>, <span class='s'>"Gentoo"</span>, <span class='s'>"Biscoe"</span>,</span> <span> <span class='m'>3</span>, <span class='s'>"Chinstrap"</span>, <span class='s'>"Dream"</span>,</span> <span> <span class='m'>4</span>, <span class='s'>"Adelie"</span>, <span class='s'>"Biscoe"</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>four_penguins</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span></span> <span><span class='c'>#&gt; samp_id species island </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; 
font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 Adelie Torgersen</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 Gentoo Biscoe </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 Chinstrap Dream </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 4 Adelie Biscoe</span></span> <span></span></code></pre> </div> <p>If we just join <code>weight_measurements</code> to <code>four_penguins</code>, the unmatched fourth penguin silently disappears, which is less than ideal, particularly in a more realistic scenario with many more observations:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>weight_measurements</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>four_penguins</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>samp_id</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 5</span></span></span> <span><span class='c'>#&gt; samp_id meas_id body_mass_g species island </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 1 <span 
style='text-decoration: underline;'>3</span>220 Adelie Torgersen</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 2 <span style='text-decoration: underline;'>3</span>250 Adelie Torgersen</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 1 <span style='text-decoration: underline;'>4</span>730 Gentoo Biscoe </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 2 <span style='text-decoration: underline;'>4</span>725 Gentoo Biscoe </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 1 <span style='text-decoration: underline;'>4</span>000 Chinstrap Dream </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 2 <span style='text-decoration: underline;'>4</span>050 Chinstrap Dream</span></span> <span></span></code></pre> </div> <p>Setting <code>unmatched = &quot;error&quot;</code> protects you from accidentally dropping rows:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>weight_measurements</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>four_penguins</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>samp_id</span><span class='o'>)</span>, unmatched <span class='o'>=</span> <span class='s'>"error"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `left_join()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Each row of `y` must be matched by `x`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Row 4 of `y` was not 
matched.</span></span> <span></span></code></pre> </div> <p>Once you see the error message, you can decide how to handle the unmatched rows, e.g., explicitly drop them.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>weight_measurements</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>four_penguins</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>samp_id</span><span class='o'>)</span>, unmatched <span class='o'>=</span> <span class='s'>"drop"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 5</span></span></span> <span><span class='c'>#&gt; samp_id meas_id body_mass_g species island </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 1 <span style='text-decoration: underline;'>3</span>220 Adelie Torgersen</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 2 <span style='text-decoration: underline;'>3</span>250 Adelie Torgersen</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 1 <span style='text-decoration: underline;'>4</span>730 Gentoo Biscoe </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 2 <span style='text-decoration: underline;'>4</span>725 Gentoo Biscoe </span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'>5</span> 3 1 <span style='text-decoration: underline;'>4</span>000 Chinstrap Dream </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 2 <span style='text-decoration: underline;'>4</span>050 Chinstrap Dream</span></span> <span></span></code></pre> </div> <p>There are many more developments related to <code>*_join()</code> functions (e.g., <a href="https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/#inequality-joins">inequality joins</a> and <a href="https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/#rolling-joins">rolling joins</a>), but many of these likely wouldn&rsquo;t come up in an introductory course so we won&rsquo;t get into their details. A good place to read more about them is <a href="https://r4ds.hadley.nz/joins.html#sec-non-equi-joins" target="_blank" rel="noopener">R for Data Science, 2nd edition</a>.</p> <p>Exploding joins (i.e., joins that result in a larger number of rows than either of the data frames being joined) can be hard for students to debug! It helps to teach them the tools to diagnose whether the join they performed, which may not have given an error, is indeed the one they wanted to perform. Did they lose any cases? Did they gain an unexpected number of cases? Did they perform a join without thinking and take down the entire teaching server? 
These things happen, particularly if students are working with their own data for an open-ended project!</p> <h2 id="per-operation-grouping">Per operation grouping <a href="#per-operation-grouping"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>To calculate grouped summary statistics, you previously needed to do something like this:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># previously</span></span> <span><span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarize</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>y</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <p>Now, an alternative approach is to pass the groups directly in the <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarize()</code></a> call:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># now, optionally</span></span> <span><span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarize</a></span><span 
class='o'>(</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>y</span><span class='o'>)</span>, </span> <span> .by <span class='o'>=</span> <span class='nv'>x</span></span> <span> <span class='o'>)</span></span></code></pre> </div> <p>Let&rsquo;s take a look at the differences between these two approaches before making a recommendation for one over the other. <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> can result in groups that persist in the output, particularly when grouping by multiple variables. For example, in the following pipeline we group the penguins data frame by <code>species</code> and <code>sex</code>, find mean body weights for each resulting species / sex combination, and then show the first observation in the output with <code>slice_head(n = 1)</code>. Since the output is grouped by species, this results in one summary statistic per species.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/drop_na.html'>drop_na</a></span><span class='o'>(</span><span class='nv'>sex</span>, <span class='nv'>body_mass_g</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>species</span>, <span class='nv'>sex</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarize</a></span><span class='o'>(</span>mean_bw <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span 
class='nv'>body_mass_g</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/slice.html'>slice_head</a></span><span class='o'>(</span>n <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; `summarise()` has grouped output by 'species'. You can override using the</span></span> <span><span class='c'>#&gt; `.groups` argument.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># Groups: species [3]</span></span></span> <span><span class='c'>#&gt; species sex mean_bw</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Adelie female <span style='text-decoration: underline;'>3</span>369.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Chinstrap female <span style='text-decoration: underline;'>3</span>527.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> Gentoo female <span style='text-decoration: underline;'>4</span>680.</span></span> <span></span></code></pre> </div> <p>If we explicitly drop the groups in the <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarize()</code></a> call, so that the output is no longer grouped, we get just one row in our output.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a 
href='https://tidyr.tidyverse.org/reference/drop_na.html'>drop_na</a></span><span class='o'>(</span><span class='nv'>sex</span>, <span class='nv'>body_mass_g</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>species</span>, <span class='nv'>sex</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarize</a></span><span class='o'>(</span>mean_bw <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>body_mass_g</span><span class='o'>)</span>, .groups <span class='o'>=</span> <span class='s'>"drop"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/slice.html'>slice_head</a></span><span class='o'>(</span>n <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 3</span></span></span> <span><span class='c'>#&gt; species sex mean_bw</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Adelie female <span style='text-decoration: underline;'>3</span>369.</span></span> <span></span></code></pre> </div> <p>This pair of examples shows that whether your output is grouped or not can affect downstream results, and if you&rsquo;re a <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> user, you&rsquo;ve probably 
been burnt by this once or twice.</p> <p>Per-operation grouping allows you to define groups in a <code>.by</code> argument, and these groups don&rsquo;t persist. So, regardless of whether you group by one or two variables, the resulting data frame after calculating a summary statistic is not grouped.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># group by 1 variable</span></span> <span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/drop_na.html'>drop_na</a></span><span class='o'>(</span><span class='nv'>sex</span>, <span class='nv'>body_mass_g</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarize</a></span><span class='o'>(</span></span> <span> mean_bw <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>body_mass_g</span><span class='o'>)</span>, </span> <span> .by <span class='o'>=</span> <span class='nv'>species</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; species mean_bw</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Adelie <span style='text-decoration: underline;'>3</span>706.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Gentoo <span style='text-decoration: underline;'>5</span>092.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> Chinstrap <span style='text-decoration: 
underline;'>3</span>733.</span></span> <span></span><span></span> <span><span class='c'># group by 2 variables</span></span> <span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/drop_na.html'>drop_na</a></span><span class='o'>(</span><span class='nv'>sex</span>, <span class='nv'>body_mass_g</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarize</a></span><span class='o'>(</span></span> <span> mean_bw <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>body_mass_g</span><span class='o'>)</span>, </span> <span> .by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>species</span>, <span class='nv'>sex</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 3</span></span></span> <span><span class='c'>#&gt; species sex mean_bw</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Adelie male <span style='text-decoration: underline;'>4</span>043.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Adelie female <span style='text-decoration: underline;'>3</span>369.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> Gentoo female <span style='text-decoration: underline;'>4</span>680.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> Gentoo male <span 
style='text-decoration: underline;'>5</span>485.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> Chinstrap female <span style='text-decoration: underline;'>3</span>527.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> Chinstrap male <span style='text-decoration: underline;'>3</span>939.</span></span> <span></span></code></pre> </div> <p>So, when teaching grouped operations, you now have the option to choose between these two approaches. The most important teaching tip I can give, particularly for teaching new learners, is to choose one method and stick to it. The <code>.by</code> method will result in fewer outputs that are unintentionally grouped, and hence may be easier for new learners. And while this approach is mentioned in R for Data Science, 2nd edition, the <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> approach is described in more detail.</p> <p>On the other hand, 
for more experienced learners, particularly those learning to design their own functions and packages, the evolution of grouping in the tidyverse can be an interesting subject to review.</p> <h2 id="quality-of-life-improvements-to-case_when-and-if_else">Quality of life improvements to <code>case_when()</code> and <code>if_else()</code> <a href="#quality-of-life-improvements-to-case_when-and-if_else"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2> <h3 id="case_when"><code>case_when()</code> <a href="#case_when"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Previously, when writing a <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> statement, you had to use <code>TRUE</code> to indicate &ldquo;all else&rdquo;. 
Additionally, <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> has historically been strict about the types on the right-hand side, e.g., requiring <code>NA_character_</code> when other right-hand side values are characters, and not letting you get away with just <code>NA</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'># previously
df |&gt;
  mutate(
    x = case_when(
      &lt;condition 1&gt; ~ "value 1",
      &lt;condition 2&gt; ~ "value 2",
      &lt;condition 3&gt; ~ "value 3",
      TRUE ~ NA_character_
    )
  )
</code></pre> </div> <p>Now, optionally, you can define &ldquo;all else&rdquo; in a <code>.default</code> argument of <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> and you no longer need to worry about the type of <code>NA</code> you use on the right-hand side.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'># now, optionally
df |&gt;
  mutate(
    x = case_when(
      &lt;condition 1&gt; ~ "value 1",
      &lt;condition 2&gt; ~ "value 2",
      &lt;condition 3&gt; ~ "value 3",
      .default = NA
    )
  )
</code></pre> </div> <p>For example, you can now do something like the following when creating a categorical version of a numerical variable that has some <code>NA</code>s.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span></span> <span> bm_cat <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/case_when.html'>case_when</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>body_mass_g</span><span class='o'>)</span> <span class='o'>~</span> <span 
class='kc'>NA</span>,</span> <span> <span class='nv'>body_mass_g</span> <span class='o'>&lt;</span> <span class='m'>3550</span> <span class='o'>~</span> <span class='s'>"Small"</span>,</span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/between.html'>between</a></span><span class='o'>(</span><span class='nv'>body_mass_g</span>, <span class='m'>3550</span>, <span class='m'>4750</span><span class='o'>)</span> <span class='o'>~</span> <span class='s'>"Medium"</span>,</span> <span> .default <span class='o'>=</span> <span class='s'>"Large"</span></span> <span> <span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/relocate.html'>relocate</a></span><span class='o'>(</span><span class='nv'>body_mass_g</span>, <span class='nv'>bm_cat</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 9</span></span></span> <span><span class='c'>#&gt; body_mass_g bm_cat species island bill_length_mm bill_depth_mm</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='text-decoration: underline;'>3</span>750 Medium Adelie Torgersen 39.1 18.7</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='text-decoration: underline;'>3</span>800 Medium Adelie Torgersen 39.5 17.4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span 
style='text-decoration: underline;'>3</span>250 Small Adelie Torgersen 40.3 18 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> Adelie Torgersen <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='text-decoration: underline;'>3</span>450 Small Adelie Torgersen 36.7 19.3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='text-decoration: underline;'>3</span>650 Medium Adelie Torgersen 39.3 20.6</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>3</span>625 Medium Adelie Torgersen 38.9 17.8</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='text-decoration: underline;'>4</span>675 Medium Adelie Torgersen 39.2 19.6</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='text-decoration: underline;'>3</span>475 Small Adelie Torgersen 34.1 18.1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>4</span>250 Medium Adelie Torgersen 42 20.2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 3 more variables: flipper_length_mm &lt;int&gt;, sex &lt;fct&gt;, year &lt;int&gt;</span></span></span> <span></span></code></pre> </div> <h3 id="if_else"><code>if_else()</code> <a href="#if_else"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 
5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Similarly, <a href="https://dplyr.tidyverse.org/reference/if_else.html" target="_blank" rel="noopener"><code>if_else()</code></a> is no longer as strict about typed missing values either.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span></span> <span> bm_unit <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/if_else.html'>if_else</a></span><span class='o'>(</span><span class='o'>!</span><span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>body_mass_g</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste</a></span><span class='o'>(</span><span class='nv'>body_mass_g</span>, <span class='s'>"g"</span><span class='o'>)</span>, <span class='kc'>NA</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/relocate.html'>relocate</a></span><span class='o'>(</span><span class='nv'>body_mass_g</span>, <span class='nv'>bm_unit</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 344 × 9</span></span></span> <span><span class='c'>#&gt; body_mass_g bm_unit species island bill_length_mm bill_depth_mm</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span 
style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='text-decoration: underline;'>3</span>750 3750 g Adelie Torgersen 39.1 18.7</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='text-decoration: underline;'>3</span>800 3800 g Adelie Torgersen 39.5 17.4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='text-decoration: underline;'>3</span>250 3250 g Adelie Torgersen 40.3 18 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> Adelie Torgersen <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='text-decoration: underline;'>3</span>450 3450 g Adelie Torgersen 36.7 19.3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='text-decoration: underline;'>3</span>650 3650 g Adelie Torgersen 39.3 20.6</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>3</span>625 3625 g Adelie Torgersen 38.9 17.8</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='text-decoration: underline;'>4</span>675 4675 g Adelie Torgersen 39.2 19.6</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='text-decoration: underline;'>3</span>475 3475 g Adelie Torgersen 34.1 18.1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>4</span>250 4250 g Adelie Torgersen 42 
20.2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 334 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 3 more variables: flipper_length_mm &lt;int&gt;, sex &lt;fct&gt;, year &lt;int&gt;</span></span></span> <span></span></code></pre> </div> <p>While these may seem like small improvements, I think they have huge benefits for teaching and learning. It&rsquo;s a blessing to not have to introduce <code>NA_character_</code> and friends as early as introducing <a href="https://dplyr.tidyverse.org/reference/if_else.html" target="_blank" rel="noopener"><code>if_else()</code></a> and <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a>! Different types of <code>NA</code>s are a good topic for a course on R as a programming language, statistical computing, etc., but they are unnecessarily complex for an introductory course.</p> <h2 id="new-syntax-for-separating-columns">New syntax for separating columns <a href="#new-syntax-for-separating-columns"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The following table summarizes the new tidyr functions for separating columns, which supersede <a href="https://tidyr.tidyverse.org/reference/extract.html" target="_blank" rel="noopener"><code>extract()</code></a>, <a href="https://tidyr.tidyverse.org/reference/separate.html" target="_blank" rel="noopener"><code>separate()</code></a>, and <a href="https://tidyr.tidyverse.org/reference/separate_rows.html" target="_blank" rel="noopener"><code>separate_rows()</code></a>. 
These updates are motivated by the goal of achieving a set of functions that have more consistent names and arguments, have better performance, and provide a new approach for handling problems:</p> <table> <thead> <tr> <th align="left"></th> <th align="left"><strong>MAKE COLUMNS</strong></th> <th align="left"><strong>MAKE ROWS</strong></th> </tr> </thead> <tbody> <tr> <td align="left">Separate with delimiter</td> <td align="left"> <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_delim()</code></a></td> <td align="left"> <a href="https://tidyr.tidyverse.org/reference/separate_longer_delim.html" target="_blank" rel="noopener"><code>separate_longer_delim()</code></a></td> </tr> <tr> <td align="left">Separate by position</td> <td align="left"> <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_position()</code></a></td> <td align="left"> <a href="https://tidyr.tidyverse.org/reference/separate_longer_delim.html" target="_blank" rel="noopener"><code>separate_longer_position()</code></a></td> </tr> <tr> <td align="left">Separate with regular expression</td> <td align="left"> <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_regex()</code></a></td> <td align="left"></td> </tr> </tbody> </table> <p>Here is an example of using some of these functions. 
Let&rsquo;s suppose we have data on three penguins with their descriptions.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>three_penguin_descriptions</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tribble.html'>tribble</a></span><span class='o'>(</span></span> <span> <span class='o'>~</span><span class='nv'>id</span>, <span class='o'>~</span><span class='nv'>description</span>,</span> <span> <span class='m'>1</span>, <span class='s'>"Species: Adelie, Island - Torgersen"</span>,</span> <span> <span class='m'>2</span>, <span class='s'>"Species: Gentoo, Island - Biscoe"</span>,</span> <span> <span class='m'>3</span>, <span class='s'>"Species: Chinstrap, Island - Dream"</span>,</span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>three_penguin_descriptions</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; id description </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 Species: Adelie, Island - Torgersen</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 Species: Gentoo, Island - Biscoe </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 Species: Chinstrap, Island - Dream</span></span> <span></span></code></pre> </div> <p>We can separate the description column into <code>species</code> and <code>island</code> with <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_delim()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span 
class='nv'>three_penguin_descriptions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/separate_wider_delim.html'>separate_wider_delim</a></span><span class='o'>(</span></span> <span> cols <span class='o'>=</span> <span class='nv'>description</span>,</span> <span> delim <span class='o'>=</span> <span class='s'>", "</span>,</span> <span> names <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"species"</span>, <span class='s'>"island"</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; id species island </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 Species: Adelie Island - Torgersen</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 Species: Gentoo Island - Biscoe </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 Species: Chinstrap Island - Dream</span></span> <span></span></code></pre> </div> <p>Or we can do so with regular expressions:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>three_penguin_descriptions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/separate_wider_delim.html'>separate_wider_regex</a></span><span class='o'>(</span></span> <span> cols <span class='o'>=</span> <span class='nv'>description</span>,</span> <span> patterns <span class='o'>=</span> <span class='nf'><a 
href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span></span> <span> <span class='s'>"Species: "</span>, species <span class='o'>=</span> <span class='s'>"[^,]+"</span>, </span> <span> <span class='s'>", "</span>, </span> <span> <span class='s'>"Island - "</span>, island <span class='o'>=</span> <span class='s'>".*"</span></span> <span> <span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; id species island </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 Adelie Torgersen</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 Gentoo Biscoe </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 Chinstrap Dream</span></span> <span></span></code></pre> </div> <p>If teaching folks coming from doing data manipulation in spreadsheets, leverage that to motivate different types of <code>separate_*()</code> functions, and show the benefits of programming over point-and-click software for more advanced operations like separating longer and separating with regular expressions.</p> <h2 id="new-argument-for-line-geoms-linewidth">New argument for line geoms: <code>linewidth</code> <a href="#new-argument-for-line-geoms-linewidth"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 
3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>If you, like me, have a bunch of scatterplots with smooth lines overlaid on them, you might run into the following warning.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># previously</span></span> <span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/drop_na.html'>drop_na</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>flipper_length_mm</span>, y <span class='o'>=</span> <span class='nv'>body_mass_g</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_smooth.html'>geom_smooth</a></span><span class='o'>(</span>size <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use `linewidth` instead.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>This warning is displayed once every 8 hours.</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>Call `lifecycle::last_lifecycle_warnings()` to see where this warning was</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>generated.</span></span></span> <span></span><span><span class='c'>#&gt; `geom_smooth()` using method = 'loess' and formula = 'y ~ x'</span></span> 
<span></span></code></pre> <p><img src="figs/unnamed-chunk-42-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Instead of <code>size</code>, you should now be using <code>linewidth</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># now</span></span> <span><span class='nv'>penguins</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/drop_na.html'>drop_na</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>flipper_length_mm</span>, y <span class='o'>=</span> <span class='nv'>body_mass_g</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_smooth.html'>geom_smooth</a></span><span class='o'>(</span>linewidth <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span> <span><span class='c'>#&gt; `geom_smooth()` using method = 'loess' and formula = 'y ~ x'</span></span> <span></span></code></pre> <p><img src="figs/unnamed-chunk-43-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>The teaching tip should be obvious here&hellip; Check the output of your old teaching materials thoroughly to not make a fool of yourself when teaching! 
🤣</p> <h2 id="other-highlights">Other highlights <a href="#other-highlights"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><ul> <li> <p>purrr 1.0.0: While purrr is likely not a common topic in introductory data science curricula, if you do teach iteration with purrr, you&rsquo;ll want to check out the <a href="https://www.tidyverse.org/blog/2022/12/purrr-1-0-0/" target="_blank" rel="noopener">purrr 1.0.0 blog post</a>. I also highly recommend <a href="https://youtu.be/EGAs7zuRutY" target="_blank" rel="noopener">Hadley&rsquo;s purrr video</a> to those who are new to purrr as well as those who want to quickly review the most recent updates to it.</p> </li> <li> <p>webR 0.1.0: webR provides a framework for creating websites where users can run R code directly within the web browser, without R installed on their device or a supporting computational R server. This is hugely exciting for writing educational materials, like interactive lesson notes, and there&rsquo;s already a Quarto extension that allows you to do this: <a href="https://github.com/coatless/quarto-webr">https://github.com/coatless/quarto-webr</a>. 
I think this is an important space to watch for educators!</p> </li> </ul> <h2 id="coming-up">Coming up <a href="#coming-up"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>I will be teaching a &ldquo;Teaching Data Science Masterclass&rdquo; at posit::conf(2023), with a module specifically on teaching the Tidyverse. <a href="https://youtu.be/5TVd_whxUus" target="_blank" rel="noopener">Watch the course trailer</a> and <a href="https://reg.conf.posit.co/flow/posit/positconf23/attendee-portal/page/sessioncatalog?search=%22Teaching%20Data%20Science%20Masterclass%22&amp;search.sessiontype=1675316728702001wr6r" target="_blank" rel="noopener">read the full course description</a> if you&rsquo;d like to find out more and sign up!</p> Solutions for R4DS, 2e with Data Trail https://www.tidyverse.org/blog/2023/08/data-trail/ Wed, 02 Aug 2023 00:00:00 +0000
<p>Last year at rstudio::conf(2022), Jeff Leek <a href="https://youtu.be/Vf301YCxP1Q" target="_blank" rel="noopener">spoke about the Data Trail program</a>.</p> <blockquote> <p>DataTrail is a no-cost, paid 14-week educational initiative for young-adult, high school and GED-graduates. DataTrail aims to equip members of underserved communities with the necessary skills and support required to work in the booming field of data science. DataTrail is a fresh take on workforce development that focuses on training both Black, Indigenous, and other people of color (BIPOC) interested in the data science industry <em><strong>and</strong></em> their potential employers.</p> </blockquote> <p>We were excited, then, to have the opportunity to work with two Data Trail interns this year! <a href="https://jabirghaffar.quarto.pub/jabir/" target="_blank" rel="noopener">Jabir Ghaffar</a> went through the Data Trail program in June 2022; <a href="https://www.linkedin.com/in/davon-person-1ba973194/" target="_blank" rel="noopener">Davon Person</a> went through it in 2019 and now works as a Data Programming Specialist with the project. Jabir and Davon worked on solutions for the R for Data Science book, explored some Tidy Tuesday datasets, and created their own Quarto websites. Their perspectives helped us learn more about how our tools and documentation can better support emerging data scientists.</p> <p>Jabir&rsquo;s primary project was to work on the <a href="https://mine-cetinkaya-rundel.github.io/r4ds-solutions/" target="_blank" rel="noopener">R for Data Science solutions</a>. R for Data Science, 2nd Edition was released in June 2023. This edition has a lot of new content, along with revised and additional exercises. 
We saw that <a href="https://jrnold.github.io/r4ds-exercise-solutions/" target="_blank" rel="noopener">Jeffrey Arnold&rsquo;s solutions to the 1st edition</a> have been a very useful resource for the community. Therefore, this project aimed both to create a similar resource for the 2nd edition and to serve as an educational resource for the interns to help sharpen their tidyverse and general data science skills.</p> <p>Some learning highlights for Jabir included faceting (as a solution to overplotting), consistent code styling (which helps make the code more pleasing to read), and making maps! Specifically, Jabir mentioned that he has always wondered &ldquo;how did they do that?!&rdquo; with maps and found them to be quite intimidating, so it was especially satisfying to create his first heatmap of US states showing the chance of getting a tornado (<a href="https://jabirghaffar.quarto.pub/jabir/posts/tornado_mapping_exploration">https://jabirghaffar.quarto.pub/jabir/posts/tornado_mapping_exploration</a>). This was not just a satisfying visualization exercise, but also a great opportunity to dig into unfamiliar data wrangling functions like <a href="https://dplyr.tidyverse.org/reference/recode.html" target="_blank" rel="noopener"><code>recode()</code></a> for converting 2-letter state abbreviations to state names.</p> <p>The exercises in R4DS range from quick, almost obvious drills to ones that can really make you think and leave you spinning your wheels for a bit. One such exercise for Jabir was the one on changing the display of presidential terms from <a href="https://r4ds.hadley.nz/communication.html#exercises-2" target="_blank" rel="noopener">Section 12.4.6</a>. 
The exercise asks:</p> <blockquote> <p>Change the display of the presidential terms by:</p> <ul> <li>Combining the two variants that customize colors and x axis breaks.</li> <li>Improving the display of the y axis.</li> <li>Labelling each term with the name of the president.</li> <li>Adding informative plot labels.</li> <li>Placing breaks every 4 years (this is trickier than it seems!).</li> </ul> </blockquote> <p>The starting points for the exercise are the following plots from the text:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyverse.tidyverse.org'>tidyverse</a></span><span class='o'>)</span></span> <span></span> <span><span class='nv'>presidential</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>id <span class='o'>=</span> <span class='m'>33</span> <span class='o'>+</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/row_number.html'>row_number</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>start</span>, y <span class='o'>=</span> <span class='nv'>id</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/geom_segment.html'>geom_segment</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>xend <span class='o'>=</span> <span class='nv'>end</span>, yend <span class='o'>=</span> <span class='nv'>id</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_date.html'>scale_x_date</a></span><span class='o'>(</span>name <span class='o'>=</span> <span class='kc'>NULL</span>, breaks <span class='o'>=</span> <span class='nv'>presidential</span><span class='o'>$</span><span class='nv'>start</span>, date_labels <span class='o'>=</span> <span class='s'>"'%y"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>presidential</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>id <span class='o'>=</span> <span class='m'>33</span> <span class='o'>+</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/row_number.html'>row_number</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>start</span>, y <span class='o'>=</span> <span class='nv'>id</span>, color <span class='o'>=</span> <span class='nv'>party</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span 
class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_segment.html'>geom_segment</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>xend <span class='o'>=</span> <span class='nv'>end</span>, yend <span class='o'>=</span> <span class='nv'>id</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_manual.html'>scale_color_manual</a></span><span class='o'>(</span>values <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span>Republican <span class='o'>=</span> <span class='s'>"#E81B23"</span>, Democratic <span class='o'>=</span> <span class='s'>"#00AEF3"</span><span class='o'>)</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/presidential-terms-start-1.png" width="50%" style="display: block; margin: auto;" /><img src="figs/presidential-terms-start-2.png" width="50%" style="display: block; margin: auto;" /></p> </div> <p>Jabir says &ldquo;This question completely puzzled me. I&rsquo;d say good luck, and if you&rsquo;re new and you get to this question, I recommend you look at the solution manual.&rdquo;</p> <p>The first challenge was identifying where in the text the original plot was developed, and the code associated with it. And then, the most challenging part of this exercise was labeling the y-axis with the names of presidents. It took lots of Googling, but ultimately Jabir used suggestions from ChatGPT to get this over the finish line. 
And, perhaps, the frustrating and satisfying part is that the answer was pretty obvious in hindsight:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>presidential</span> <span class='o'>&lt;-</span> <span class='nv'>presidential</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>id <span class='o'>=</span> <span class='m'>33</span> <span class='o'>+</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/row_number.html'>row_number</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>presidential</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>start</span>, y <span class='o'>=</span> <span class='nv'>id</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>color <span class='o'>=</span> <span class='nv'>party</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_segment.html'>geom_segment</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>xend <span class='o'>=</span> <span class='nv'>end</span>, yend <span class='o'>=</span> <span class='nv'>id</span>, color <span class='o'>=</span> 
<span class='nv'>party</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_text.html'>geom_text</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>label <span class='o'>=</span> <span class='nv'>name</span><span class='o'>)</span>, hjust <span class='o'>=</span> <span class='m'>0</span>, vjust <span class='o'>=</span> <span class='m'>0</span>, nudge_y <span class='o'>=</span> <span class='m'>0.1</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_manual.html'>scale_color_manual</a></span><span class='o'>(</span>values <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span>Republican <span class='o'>=</span> <span class='s'>"#E81B23"</span>, Democratic <span class='o'>=</span> <span class='s'>"#00AEF3"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_date.html'>scale_x_date</a></span><span class='o'>(</span></span> <span> name <span class='o'>=</span> <span class='s'>"Term"</span>,</span> <span> breaks <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span>from <span class='o'>=</span> <span class='nf'><a href='https://lubridate.tidyverse.org/reference/ymd.html'>ymd</a></span><span class='o'>(</span><span class='s'>"1953-01-20"</span><span class='o'>)</span>, to <span class='o'>=</span> <span class='nf'><a href='https://lubridate.tidyverse.org/reference/ymd.html'>ymd</a></span><span class='o'>(</span><span class='s'>"2021-01-20"</span><span class='o'>)</span>, by <span class='o'>=</span> <span class='s'>"4 years"</span><span 
class='o'>)</span>,</span> <span> date_labels <span class='o'>=</span> <span class='s'>"'%y"</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_y_continuous</a></span><span class='o'>(</span>breaks <span class='o'>=</span> <span class='m'>34</span><span class='o'>:</span><span class='m'>45</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span></span> <span> panel.grid.minor <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_blank</a></span><span class='o'>(</span><span class='o'>)</span>,</span> <span> axis.ticks.y <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_blank</a></span><span class='o'>(</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span></span> <span> x <span class='o'>=</span> <span class='s'>"Term"</span>,</span> <span> y <span class='o'>=</span> <span class='s'>"President"</span>,</span> <span> title <span class='o'>=</span> <span class='s'>"Terms of US Presidents"</span>,</span> <span> subtitle <span class='o'>=</span> <span class='s'>"Eisenhower (34th) to Trump (45th)"</span>,</span> <span> color <span class='o'>=</span> <span class='s'>"Party"</span></span> <span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/presidential-terms-end-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Jabir felt like moments when he knew what to do and how to answer each question were very satisfying, but the moments where he felt stuck and went into rabbit holes looking for 
answers made him question whether he wanted to continue becoming a data scientist. Ultimately, though, the project was enjoyable, and not just a great learning experience for Jabir, but also a very meaningful one because it created a resource that can help future data scientists.</p> <p>We felt that Jabir and Davon advanced both their data science skills and their familiarity with working in open source throughout the project. It was particularly exciting to see Jabir create his own data science portfolio as a Quarto website with posts on the Tidy Tuesday datasets, and we really appreciated his work on the R4DS Solutions. In working through them, he helped refine the book and created a resource that so many people are going to be able to use and learn from. We could see that he not only learned data science but also grew as a leader who will continue to support others in their learning as he moves on to work as a developer with the Data Trail program. Davon, too, focused not just on data science: he was a part of our team and the Data Trail team, providing mentorship to Jabir and bringing teaching approaches, like using Tidy Tuesday datasets, to his community. We are so grateful to have had the opportunity to work with them both, and see this as the beginning of continued collaborations.</p> <p>Like most open-source projects, <a href="https://mine-cetinkaya-rundel.github.io/r4ds-solutions/" target="_blank" rel="noopener">R4DS Solutions</a> is a living, breathing project and still a work in progress. We would welcome any community contributions! 
All perspectives are important here, and it&rsquo;s a great project if you&rsquo;re a first-time contributor.</p> Q2 2023 tidymodels digest https://www.tidyverse.org/blog/2023/07/tidymodels-2023-q2/ Wed, 19 Jul 2023 00:00:00 +0000 <p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <p>Since the beginning of 2021, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a> here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. 
You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a> to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like the <a href="https://www.tidyverse.org/blog/2023/05/desirability2/" target="_blank" rel="noopener">post</a> on the release of the new desirability2 package.</p> <p>Since <a href="https://www.tidyverse.org/blog/2023/04/tidymodels-2023-q1/" target="_blank" rel="noopener">our last roundup post</a>, there have been CRAN releases of 7 tidymodels packages. Here are links to their NEWS files:</p> <div class="highlight"> <ul> <li>agua <a href="https://agua.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.1.3)</a></li> <li>broom <a href="https://broom.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.5)</a></li> <li>desirability2 <a href="https://desirability2.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.0.1)</a></li> <li>embed <a href="https://embed.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.1)</a></li> <li>probably <a href="https://probably.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.2)</a></li> <li>spatialsample <a href="https://spatialsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.4.0)</a></li> <li>tidymodels <a href="https://tidymodels.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a></li> </ul> </div> <p>We&rsquo;ll highlight a few especially notable changes below: a new package with data for modeling, nearest neighbor distance matching cross-validation for spatial data, and a website refresh.</p> <p>First, loading the collection of packages:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a 
href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="modeldatatoo">modeldatatoo <a href="#modeldatatoo"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Many of the datasets used in tidymodels examples are available in the modeldata package. The new modeldatatoo package now extends the collection by several bigger datasets. To allow for the bigger size, the package does not contain those datasets directly but rather provides functions to access them, prefixed with <code>data_</code>. For example:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/modeldatatoo'>modeldatatoo</a></span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://tidymodels.github.io/modeldatatoo/reference/data_animals.html'>data_animals</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 610 × 48</span></span></span> <span><span class='c'>#&gt; text colour lifespan weight kingdom class phylum diet conservation_status</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span 
style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='color: #555555;'>"</span>Aardv… Brown… 23 years 60kg … Animal… Mamm… Chord… Omni… Least Concern </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='color: #555555;'>"</span>Abyss… Fawn,… <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='color: #555555;'>"</span>Adeli… Black… 10 - 20… 3kg -… Animal… Aves Chord… Carn… Least Concern </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='color: #555555;'>"</span>Affen… Black… <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='color: #555555;'>"</span>Afgha… Black… <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> </span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='color: #555555;'>"</span>Afric… Grey,… 60 - 70… 3,600… Animal… Mamm… Chord… Herb… Threatened </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='color: #555555;'>"</span>Afric… Black… 15 - 20… 1.4kg… Animal… Mamm… Chord… Omni… Least Concern </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='color: #555555;'>"</span>Afric… Brown… 8 - 15 … 25g -… Animal… Amph… Chord… Carn… Least Concern </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='color: #555555;'>"</span>Afric… Grey,… 60 - 70… 900kg… Animal… Mamm… Chord… Herb… Endangered </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='color: #555555;'>"</span>Afric… Black… 15 - 20… 1.4kg… Animal… Mamm… Chord… Omni… Least Concern </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 600 more rows</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 39 more variables: order &lt;chr&gt;, scientific_name &lt;chr&gt;, skin_type &lt;chr&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># habitat &lt;chr&gt;, predators &lt;chr&gt;, family &lt;chr&gt;, lifestyle &lt;chr&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># average_litter_size &lt;chr&gt;, genus &lt;chr&gt;, top_speed &lt;chr&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># favourite_food &lt;chr&gt;, main_prey &lt;chr&gt;, type &lt;chr&gt;, common_name &lt;chr&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># group &lt;chr&gt;, size &lt;chr&gt;, distinctive_features &lt;chr&gt;, size_l &lt;chr&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># origin &lt;chr&gt;, special_features &lt;chr&gt;, 
location &lt;chr&gt;, …</span></span></span> <span></span></code></pre> </div> <p>The new datasets are:</p> <ul> <li> <a href="https://tidymodels.github.io/modeldatatoo/reference/data_animals.html" target="_blank" rel="noopener"><code>data_animals()</code></a> contains a long-form description of the animal (in the <code>text</code> column) as well as quite a bit of missing data and malformed fields.</li> <li> <a href="https://tidymodels.github.io/modeldatatoo/reference/data_chimiometrie_2019.html" target="_blank" rel="noopener"><code>data_chimiometrie_2019()</code></a> contains spectra measured at 550 (unknown) wavelengths, published as the challenge at the Chimiometrie 2019 conference.</li> <li> <a href="https://tidymodels.github.io/modeldatatoo/reference/data_elevators.html" target="_blank" rel="noopener"><code>data_elevators()</code></a> contains information on a subset of the elevators in New York City.</li> </ul> <p>Because those datasets are stored online, accessing them requires an active internet connection. We plan on using those datasets mostly for workshops and websites. The datasets in the modeldata package are part of the package directly, so they can be used everywhere (regardless of an active internet connection). 
We typically use them for package documentation.</p> <h2 id="spatialsample">spatialsample <a href="#spatialsample"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>spatialsample is a package for spatial resampling, extending the rsample framework to help create spatial extrapolation between your analysis and assessment data sets.</p> <p>The latest release of spatialsample includes nearest neighbor distance matching (NNDM) cross-validation via <a href="https://spatialsample.tidymodels.org/reference/spatial_nndm_cv.html" target="_blank" rel="noopener"><code>spatial_nndm_cv()</code></a>. NNDM is a variant of leave-one-out cross-validation which assigns each observation to a single assessment fold, and then attempts to remove data from each analysis fold until the nearest neighbor distance distribution between assessment and analysis folds matches the nearest neighbor distance distribution between training data and the locations a model will be used to predict. <a href="https://doi.org/10.1111/2041-210X.13851" target="_blank" rel="noopener">Proposed by Milà et al. (2022)</a>, this method aims to provide accurate estimates of how well models will perform in the locations they will actually be predicting. 
This method was originally implemented in the CAST package and can now be used with spatialsample as well.</p> <p>Let&rsquo;s use the Ames housing data and turn it from a regular tibble into a <code>sf</code> object for spatial data.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/spatialsample'>spatialsample</a></span><span class='o'>)</span></span> <span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>ames</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>ames_sf</span> <span class='o'>&lt;-</span> <span class='nf'>sf</span><span class='nf'>::</span><span class='nf'><a href='https://r-spatial.github.io/sf/reference/st_as_sf.html'>st_as_sf</a></span><span class='o'>(</span><span class='nv'>ames</span>, coords <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Longitude"</span>, <span class='s'>"Latitude"</span><span class='o'>)</span>, crs <span class='o'>=</span> <span class='m'>4326</span><span class='o'>)</span></span></code></pre> </div> <p>Let&rsquo;s assume that we are building a model to predict observations similar to this subset of the data:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>ames_prediction_sites</span> <span class='o'>&lt;-</span> <span class='nv'>ames_sf</span><span class='o'>[</span><span class='m'>2001</span><span class='o'>:</span><span class='m'>2100</span>, <span class='o'>]</span></span></code></pre> </div> <p>Let&rsquo;s create NNDM cross-validation folds from a reduced training set as an example, just to keep things light.</p> 
<div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>ames_folds</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://spatialsample.tidymodels.org/reference/spatial_nndm_cv.html'>spatial_nndm_cv</a></span><span class='o'>(</span><span class='nv'>ames_sf</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>100</span>, <span class='o'>]</span>, <span class='nv'>ames_prediction_sites</span><span class='o'>)</span></span></code></pre> </div> <p>The resulting <code>rset</code> contains 100 splits of the data, always keeping 1 of the 100 data points in the assessment set.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>ames_folds</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 100 × 2</span></span></span> <span><span class='c'>#&gt; splits id </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='color: #555555;'>&lt;split [50/1]&gt;</span> Fold001</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='color: #555555;'>&lt;split [83/1]&gt;</span> Fold002</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='color: #555555;'>&lt;split [50/1]&gt;</span> Fold003</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='color: #555555;'>&lt;split [50/1]&gt;</span> Fold004</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='color: #555555;'>&lt;split [50/1]&gt;</span> Fold005</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='color: #555555;'>&lt;split 
[50/1]&gt;</span> Fold006</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='color: #555555;'>&lt;split [50/1]&gt;</span> Fold007</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='color: #555555;'>&lt;split [76/1]&gt;</span> Fold008</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='color: #555555;'>&lt;split [86/1]&gt;</span> Fold009</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='color: #555555;'>&lt;split [88/1]&gt;</span> Fold010</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 90 more rows</span></span></span> <span></span></code></pre> </div> <p>Starting with all 99 other points in the analysis set, points are excluded until the distribution of nearest neighbor distances from the analysis set to the assessment set matches that of nearest neighbor distances from the training set to the prediction sites.</p> <p>Looking at one of the splits, we can see the single assessment point, the points included in the analysis set, and the points excluded as the buffer.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>get_rsplit</span><span class='o'>(</span><span class='nv'>ames_folds</span>, <span class='m'>3</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-10-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>The <code>ames_folds</code> object can then be used with functions from the tune package as usual.</p> <h2 id="tidymodelsorg">tidymodels.org <a href="#tidymodelsorg"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" 
xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The tidymodels website, <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels.org</a>, has been updated to use <a href="https://quarto.org/" target="_blank" rel="noopener">Quarto</a>. Things largely look the same as before but this change simplifies the build system which should make it easier for more people to contribute.</p> <p>This change to Quarto has also allowed us to improve the search functionality of the website. The tables for finding parsnip models, recipe steps, and broom tidiers at <a href="https://www.tidymodels.org/find/">https://www.tidymodels.org/find/</a> now all list objects across all CRAN packages, not just tidymodels packages. 
This should make it much easier to find the right extension for your task, even if not implemented within tidymodels!</p> <p>And if it does not exist yet, open an issue on GitHub or browse the <a href="https://www.tidymodels.org/learn/#category=developer%20tools" target="_blank" rel="noopener">developer documentation for extending tidymodels</a>!</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to extend our thanks to all of the contributors to tidymodels in the last quarter:</p> <div class="highlight"> <ul> <li>agua: <a href="https://github.com/gvelasq" target="_blank" rel="noopener">@gvelasq</a>.</li> <li>broom: <a href="https://github.com/awcm0n" target="_blank" rel="noopener">@awcm0n</a>, <a href="https://github.com/gregmacfarlane" target="_blank" rel="noopener">@gregmacfarlane</a>, <a href="https://github.com/jwilliman" target="_blank" rel="noopener">@jwilliman</a>, <a href="https://github.com/mccarthy-m-g" target="_blank" rel="noopener">@mccarthy-m-g</a>, <a href="https://github.com/RoyalTS" target="_blank" rel="noopener">@RoyalTS</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/ste-tuf" target="_blank" rel="noopener">@ste-tuf</a>.</li> <li>desirability2: <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>embed: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, and <a href="https://github.com/naveranoc" target="_blank" 
rel="noopener">@naveranoc</a>.</li> <li>probably: <a href="https://github.com/agormp" target="_blank" rel="noopener">@agormp</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>spatialsample: <a href="https://github.com/jamesgrecian" target="_blank" rel="noopener">@jamesgrecian</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, and <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>.</li> <li>tidymodels: <a href="https://github.com/forecastingEDs" target="_blank" rel="noopener">@forecastingEDs</a>, <a href="https://github.com/JosiahParry" target="_blank" rel="noopener">@JosiahParry</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> </ul> </div> R for Data Science, 2nd edition https://www.tidyverse.org/blog/2023/07/r4ds-2e/ Tue, 11 Jul 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/07/r4ds-2e/ 
<p>We&rsquo;re thrilled to announce the publication of the 2nd edition of <a href="https://r4ds.hadley.nz/" target="_blank" rel="noopener">R for Data Science</a>.</p> <p>The second edition is a major reworking of the first edition, removing material we no longer think is useful, adding material we wish we included in the first edition, and generally updating the text and code to reflect changes in best practices.</p> <p>You can read the book online for free at <a href="https://r4ds.hadley.nz/" class="uri"><a href="https://r4ds.hadley.nz">https://r4ds.hadley.nz</a></a>, or <a href="https://www.amazon.com/dp/1492097403?&amp;tag=hadlwick-20" target="_blank" rel="noopener">buy a physical copy</a>.</p> <p>Read below to find out what&rsquo;s new and what&rsquo;s gone compared to the first edition.</p> <h2 id="whats-new">What&rsquo;s new? <a href="#whats-new"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We have renamed the first part of the book to <a href="https://r4ds.hadley.nz/whole-game.html" target="_blank" rel="noopener">&ldquo;Whole game&rdquo;</a>, with the goal of giving you the rough details of the &quot;whole game&quot; of data science, including data visualization, transformation, tidying, and import, before we dive into the details. 
The data visualization chapter has gained a new section written with the <a href="https://datasciencebox.org/01-design-principles.html" target="_blank" rel="noopener">&ldquo;cake first&rdquo;</a> approach, which starts with the final visualization you will learn to make, and then builds up to it layer-by-layer. The data tidying chapter introduces the basics of lengthening and widening data, and the data import chapter introduces reading tabular data.</p> <p>The second part of the book is <a href="https://r4ds.hadley.nz/visualize.html" target="_blank" rel="noopener">&quot;Visualize&quot;</a>, which gives data visualization tools and best practices more thorough coverage than the first edition did.</p> <p>The third part of the book is now called <a href="https://r4ds.hadley.nz/transform.html" target="_blank" rel="noopener">&quot;Transform&quot;</a> and gains new chapters on numbers, logical vectors, and missing values. Much of this content was previously part of the data transformation chapter. In this edition we have expanded these topics to cover all the details.</p> <p>The fourth part of the book is called <a href="https://r4ds.hadley.nz/import.html" target="_blank" rel="noopener">&quot;Import&quot;</a>; it's a new set of chapters that goes beyond reading flat text files to working with spreadsheets (Excel and Google Sheets), databases, and big data (with Arrow), as well as rectangling hierarchical data and scraping data from web sites.</p> <p>The <a href="https://r4ds.hadley.nz/program.html" target="_blank" rel="noopener">&quot;Program&quot;</a> part has been rewritten from scratch to focus on the most important parts of function writing and iteration. Function writing now includes details on how to wrap tidyverse functions (dealing with the challenges of tidy evaluation), since this has become much easier and more important over the last few years. 
We have also added a new chapter on important base R functions that you're likely to see in wild-caught R code.</p> <p>Finally, the <a href="https://r4ds.hadley.nz/communicate.html" target="_blank" rel="noopener">&quot;Communicate&quot;</a> part remains, but has been thoroughly updated to feature <a href="https://quarto.org/" target="_blank" rel="noopener">Quarto</a> instead of R Markdown. This edition of the book has been written in Quarto, and it's clearly the tool of the future.</p> <h2 id="whats-gone">What&rsquo;s gone? <a href="#whats-gone"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The first edition of the book featured a part on modeling, which has now been removed. We never had enough room to fully do modeling justice, and there are now much better resources available. 
We generally recommend using the <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> packages and reading <a href="https://www.tmwr.org/" target="_blank" rel="noopener">Tidy Modeling with R</a> by Max Kuhn and Julia Silge.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This book isn't just the product of Hadley, Mine, and Garrett, but is the result of many conversations (in person and online) that we've had with many people in the R community. Huge thanks to <a href="https://r4ds.hadley.nz/intro.html#acknowledgments" target="_blank" rel="noopener">all contributors</a> for the conversations, issues, and pull requests. And, as always, feedback and suggestions are welcome on the <a href="https://github.com/hadley/r4ds/" target="_blank" rel="noopener">book repository</a>.</p> gmailr 2.0.0 https://www.tidyverse.org/blog/2023/06/gmailr-2-0-0/ Thu, 29 Jun 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/06/gmailr-2-0-0/ <p>We&rsquo;re chuffed to announce the release of <a href="https://gmailr.r-lib.org/" target="_blank" rel="noopener">gmailr</a> 2.0.0. 
gmailr exposes the <a href="https://developers.google.com/gmail/api/guides" target="_blank" rel="noopener">Gmail API</a> from R.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"gmailr"</span><span class='o'>)</span></span></code></pre> </div> <p>The main goal of version 2.0.0 is to improve the ergonomics around auth. There is less need for fussy code around configuring an OAuth client and it&rsquo;s easier to use gmailr in a non-interactive or deployed setting. There is also a major advance in the process of replacing legacy functions with versions that have a <code>gm_</code> prefix. The legacy functions still exist, but are now hard deprecated. Finally, gmailr no longer re-exports <code>%&gt;%</code>, the magrittr pipe, now that we have <code>|&gt;</code> in base R.</p> <p>You can see a full list of changes in the <a href="https://github.com/r-lib/gmailr/releases/tag/v2.0.0" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://gmailr.r-lib.org'>gmailr</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Attaching package: 'gmailr'</span></span> <span></span><span><span class='c'>#&gt; The following object is masked from 'package:utils':</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; history</span></span> <span></span><span><span class='c'>#&gt; The following objects are masked from 'package:base':</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; body, date, labels, 
message</span></span> <span></span></code></pre> </div> <p>😬 <em>Ouch! These name collisions are <strong>exactly</strong> why gmailr added a universal <code>gm_</code> prefix to all of its functions starting in v1.0.0. One day in the not-too-distant future we can remove the troublesome legacy functions.</em></p> <h2 id="oauth-client">OAuth client <a href="#oauth-client"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The Gmail API is more challenging to wrap, in terms of auth, than the APIs for Sheets, Drive, or BigQuery. That&rsquo;s because the scopes (think: &ldquo;permissions&rdquo;) needed for the Gmail API are <a href="https://developers.google.com/gmail/api/auth/scopes#scopes" target="_blank" rel="noopener">regarded as extremely sensitive</a>, as well they should be. If a bad actor gains the ability to read and send email as you, that is considerably more damaging than them being able to modify your spreadsheets (which is also bad, to be sure, but considerably less bad). Email is particularly important, because most other services allow you to reset your password via email; if someone gets access to your email, they can quickly use that to access every other service you have a login for.</p> <p>The heightened security around the Gmail API means that a wrapper package like gmailr can&rsquo;t make auth &ldquo;just work&rdquo; as easily as we can in other packages, such as googledrive. In particular, R users who want to use gmailr absolutely must provide their own <em>OAuth client</em>. 
In other packages, we can make this optional.</p> <p>gmailr v2.0.0 includes new features and documentation to reduce the pain around the OAuth client as much as possible:</p> <ul> <li> <p> <a href="https://gmailr.r-lib.org/articles/oauth-client.html" target="_blank" rel="noopener">Set up an OAuth client</a> is a new article with detailed instructions for creating and configuring an OAuth client. You might even say this provides an excruciating level of detail, but this process has proven to be tricky for many users.</p> </li> <li> <p>There is now a default location for the JSON file that represents the OAuth client. It&rsquo;s the location returned by <code>rappdirs::user_data_dir(&quot;gmailr&quot;)</code>. If you put the JSON file there, gmailr will find it automagically.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>rappdirs</span><span class='nf'>::</span><span class='nf'><a href='https://rappdirs.r-lib.org/reference/user_data_dir.html'>user_data_dir</a></span><span class='o'>(</span><span class='s'>"gmailr"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/list.files.html'>list.files</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "client_secret_xxx-yyy.apps.googleusercontent.com.json"</span></span> <span></span> <span><span class='nf'><a href='https://gmailr.r-lib.org/reference/gmailr-configuration.html'>gm_default_oauth_client</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "/Users/jenny/Library/Application Support/gmailr/client_secret_xxx-yyy.apps.googleusercontent.com.json"</span></span></code></pre> </div> <p> <a href="https://gmailr.r-lib.org/reference/gmailr-configuration.html" target="_blank" rel="noopener"><code>gm_default_oauth_client()</code></a> is the new function that implements this new feature as well as 
pre-existing support for providing this path via an environment variable.</p> </li> <li> <p>If the OAuth client is configured for auto-discovery, it is no longer necessary to call <a href="https://gmailr.r-lib.org/reference/gm_auth_configure.html" target="_blank" rel="noopener"><code>gm_auth_configure()</code></a> explicitly. That is taken care of internally, inside <a href="https://gmailr.r-lib.org/reference/gm_auth.html" target="_blank" rel="noopener"><code>gm_auth()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>> library(gmailr) > > # 😲 OMG no more need to call gm_auth_configure() here! 🎉 > > gm_threads() The gmailr package is requesting access to your Google account. Enter '1' to start a new auth process or select a pre-authorized account. 1: Send me to the browser for a new auth process. 2: [email protected] Selection: 2 </code></pre> </div> </li> <li> <p>Conversely, if you happen to be providing an explicit user or service account token, <code>gm_auth(token =)</code> and <code>gm_auth(path =, subject =)</code><sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> no longer error if the OAuth client is not configured.</p> </li> </ul> <h2 id="auth-in-a-deployed-or-other-non-interactive-setting">Auth in a deployed or other non-interactive setting <a href="#auth-in-a-deployed-or-other-non-interactive-setting"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The Gmail API is primarily intended for use on behalf of a regular Google user account. 
The gmailr package is designed to guide an interactive R user through a process in which they authenticate themselves to Google and authorize Gmail activities initiated from R. This is sometimes referred to as the &ldquo;OAuth dance&rdquo;.<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup></p> <p>But what about settings where there is no interactive user sitting around to do this dance, i.e. when gmailr-using code is deployed to a remote server or otherwise runs unattended? For most Google APIs, the standard advice is &ldquo;use a service account&rdquo;. But the Gmail API is special. To use a service account with the Gmail API basically requires that the service account has been delegated domain-wide authority. This is tricky for at least two reasons. First, this is only possible within a Google Workspace, i.e. it&rsquo;s not available to personal Google accounts. Second, most Google Workspace admins will refuse to do this, for security reasons.</p> <p>Therefore, if you want to deploy a data product that uses gmailr, it&rsquo;s extremely likely that you really do need to use a user token. 
This workflow has gotten dramatically easier in gmailr v2.0.0:</p> <ul> <li> <a href="https://gmailr.r-lib.org/articles/deploy-a-token.html" target="_blank" rel="noopener">Deploy a token</a> is a new article describing how to capture a token interactively, then use it later, non-interactively.</li> <li> <a href="https://gmailr.r-lib.org/reference/gm_token_write.html" target="_blank" rel="noopener"><code>gm_token_write()</code></a> + <a href="https://gmailr.r-lib.org/reference/gm_token_write.html" target="_blank" rel="noopener"><code>gm_token_read()</code></a> is a new matched pair of functions that facilitate writing an obfuscated token to disk then reloading that token in a deployed data product or in CI.</li> <li>gmailr ships with <a href="https://github.com/r-lib/gmailr/tree/main/inst/deployed-token-demo" target="_blank" rel="noopener">example code</a> that uses this technique in a small Shiny app that sends email from a specific user account. See the contents of <code>system.file(&quot;deployed-token-demo&quot;, package = &quot;gmailr&quot;)</code>.</li> </ul> <p>The heart of this approach is to first capture a token in an interactive session:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://gmailr.r-lib.org/reference/gm_auth.html'>gm_auth</a></span><span class='o'>(</span><span class='s'>"[email protected]"</span>, cache <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span> <span><span class='c'># interactive OAuth dance in the browser happens HERE</span></span> <span><span class='nf'><a href='https://gmailr.r-lib.org/reference/gm_token_write.html'>gm_token_write</a></span><span class='o'>(</span></span> <span> path <span class='o'>=</span> <span class='s'>".secrets/gmailr-token.rds"</span>,</span> <span> key <span class='o'>=</span> <span class='s'>"SUPER_SECRET_ENCRYPTION_KEY"</span></span> <span><span class='o'>)</span></span></code></pre> </div> 
<p>then reload it in a subsequent non-interactive session:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://gmailr.r-lib.org/reference/gm_auth.html'>gm_auth</a></span><span class='o'>(</span>token <span class='o'>=</span> <span class='nf'><a href='https://gmailr.r-lib.org/reference/gm_token_write.html'>gm_token_read</a></span><span class='o'>(</span></span> <span> <span class='s'>".secrets/gmailr-token.rds"</span>,</span> <span> key <span class='o'>=</span> <span class='s'>"SUPER_SECRET_ENCRYPTION_KEY"</span></span> <span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <h2 id="progress-on-the-great-renaming">Progress on The Great Renaming <a href="#progress-on-the-great-renaming"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Unfortunately, there is considerable overlap between some obvious function names in an email-related package (e.g. &ldquo;body&rdquo;, &ldquo;date&rdquo;, &ldquo;message&rdquo;) and pre-existing functions in base R (e.g.  <a href="https://gmailr.r-lib.org/reference/gmailr-deprecated.html" target="_blank" rel="noopener"><code>body()</code></a>, <a href="https://gmailr.r-lib.org/reference/gmailr-deprecated.html" target="_blank" rel="noopener"><code>date()</code></a>, <a href="https://gmailr.r-lib.org/reference/gmailr-deprecated.html" target="_blank" rel="noopener"><code>message()</code></a>). 
From very early on, gmailr exported several functions with regrettable name collisions, as evidenced at the beginning of this post when we called <a href="https://gmailr.r-lib.org" target="_blank" rel="noopener"><code>library(gmailr)</code></a>.</p> <p>In version 1.0.0 (released 2019-08-30), the process of addressing this problem kicked off. At that time, gmailr adopted a universal <code>gm_</code> prefix for its functions and soft deprecated the legacy functions. Here&rsquo;s an indicative sample of the function replacements:</p> <ul> <li> <a href="https://gmailr.r-lib.org/reference/gmailr-deprecated.html" target="_blank" rel="noopener"><code>body()</code></a> ➡️ <a href="https://gmailr.r-lib.org/reference/gm_body.html" target="_blank" rel="noopener"><code>gm_body()</code></a></li> <li> <a href="https://gmailr.r-lib.org/reference/gmailr-deprecated.html" target="_blank" rel="noopener"><code>date()</code></a> ➡️ <a href="https://gmailr.r-lib.org/reference/accessors.html" target="_blank" rel="noopener"><code>gm_date()</code></a></li> <li> <a href="https://gmailr.r-lib.org/reference/gmailr-deprecated.html" target="_blank" rel="noopener"><code>message()</code></a> ➡️ <a href="https://gmailr.r-lib.org/reference/gm_message.html" target="_blank" rel="noopener"><code>gm_message()</code></a></li> </ul> <p>In version 2.0.0, the legacy functions are hard deprecated and you should expect them to be removed in the next release of gmailr. I don&rsquo;t expect there to be much (any?) 
surviving usage of these functions, but it&rsquo;s definitely time to eliminate any remaining usage.</p> <h2 id="use-the-native-pipe">Use the native pipe <a href="#use-the-native-pipe"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>gmailr is designed to be very pipe friendly and it leads to very natural code that builds up a message from its parts:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>msg</span> <span class='o'>&lt;-</span></span> <span> <span class='nf'><a href='https://gmailr.r-lib.org/reference/gm_mime.html'>gm_mime</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://gmailr.r-lib.org/reference/accessors.html'>gm_to</a></span><span class='o'>(</span><span class='s'>"[email protected]"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://gmailr.r-lib.org/reference/accessors.html'>gm_from</a></span><span class='o'>(</span><span class='s'>"[email protected]"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://gmailr.r-lib.org/reference/accessors.html'>gm_subject</a></span><span class='o'>(</span><span class='s'>"Hello, world!"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://gmailr.r-lib.org/reference/gm_mime.html'>gm_text_body</a></span><span class='o'>(</span><span class='s'>"I come in peace."</span><span class='o'>)</span></span></code></pre> </div> <p>gmailr predates 
the introduction of the native pipe, in R 4.1, and therefore, historically, it has re-exported <code>%&gt;%</code>, the magrittr pipe, for user convenience. The magrittr pipe also featured heavily in gmailr&rsquo;s documentation.</p> <p>In the v2.0.0 release, I&rsquo;ve removed the magrittr dependency and now use the native pipe operator <code>|&gt;</code> in all documentation (gmailr never used the pipe internally). The purrr package pioneered this maneuver, within the tidyverse, and gmailr uses the same techniques to resolve the tension between the new usage of the base pipe and the tidyverse policy of supporting older R versions. You can learn more about the pipe transition in the blog post <a href="https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/" target="_blank" rel="noopener">Differences between the base R and magrittr pipes</a>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thank you to all those who have contributed to gmailr since the v1.0.0 release:</p> <p> <a href="https://github.com/absuag" target="_blank" rel="noopener">@absuag</a>, <a href="https://github.com/aeburger" target="_blank" rel="noopener">@aeburger</a>, <a href="https://github.com/andresxmv" target="_blank" rel="noopener">@andresxmv</a>, <a href="https://github.com/batpigandme" target="_blank" rel="noopener">@batpigandme</a>, <a href="https://github.com/beib" target="_blank" rel="noopener">@beib</a>, <a href="https://github.com/careercoachme" target="_blank" rel="noopener">@careercoachme</a>, <a 
href="https://github.com/chuagh74" target="_blank" rel="noopener">@chuagh74</a>, <a href="https://github.com/cstangor" target="_blank" rel="noopener">@cstangor</a>, <a href="https://github.com/EeethB" target="_blank" rel="noopener">@EeethB</a>, <a href="https://github.com/enricodata" target="_blank" rel="noopener">@enricodata</a>, <a href="https://github.com/FJCC" target="_blank" rel="noopener">@FJCC</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/HadyShaaban" target="_blank" rel="noopener">@HadyShaaban</a>, <a href="https://github.com/ismayc" target="_blank" rel="noopener">@ismayc</a>, <a href="https://github.com/j450h1" target="_blank" rel="noopener">@j450h1</a>, <a href="https://github.com/janebunr" target="_blank" rel="noopener">@janebunr</a>, <a href="https://github.com/jcheng5" target="_blank" rel="noopener">@jcheng5</a>, <a href="https://github.com/JeffreyCHoover" target="_blank" rel="noopener">@JeffreyCHoover</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jimhester" target="_blank" rel="noopener">@jimhester</a>, <a href="https://github.com/jonmichael-caldwell" target="_blank" rel="noopener">@jonmichael-caldwell</a>, <a href="https://github.com/jreid88" target="_blank" rel="noopener">@jreid88</a>, <a href="https://github.com/Karlheinzniebuhr" target="_blank" rel="noopener">@Karlheinzniebuhr</a>, <a href="https://github.com/kputschko" target="_blank" rel="noopener">@kputschko</a>, <a href="https://github.com/KryeKuzhinieri" target="_blank" rel="noopener">@KryeKuzhinieri</a>, <a href="https://github.com/laurenmarietta" target="_blank" rel="noopener">@laurenmarietta</a>, <a href="https://github.com/lwjohnst86" target="_blank" rel="noopener">@lwjohnst86</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/majazaloznik" target="_blank" rel="noopener">@majazaloznik</a>, <a 
href="https://github.com/maticabgd" target="_blank" rel="noopener">@maticabgd</a>, <a href="https://github.com/MCOtto" target="_blank" rel="noopener">@MCOtto</a>, <a href="https://github.com/meheszlev" target="_blank" rel="noopener">@meheszlev</a>, <a href="https://github.com/Mr-Hadoop-Hotshot" target="_blank" rel="noopener">@Mr-Hadoop-Hotshot</a>, <a href="https://github.com/Niekuba" target="_blank" rel="noopener">@Niekuba</a>, <a href="https://github.com/norcalbiostat" target="_blank" rel="noopener">@norcalbiostat</a>, <a href="https://github.com/Patrikios" target="_blank" rel="noopener">@Patrikios</a>, <a href="https://github.com/pschloss" target="_blank" rel="noopener">@pschloss</a>, <a href="https://github.com/pythiantech" target="_blank" rel="noopener">@pythiantech</a>, <a href="https://github.com/randy3k" target="_blank" rel="noopener">@randy3k</a>, <a href="https://github.com/ratnexa" target="_blank" rel="noopener">@ratnexa</a>, <a href="https://github.com/sanjmeh" target="_blank" rel="noopener">@sanjmeh</a>, <a href="https://github.com/sdisav" target="_blank" rel="noopener">@sdisav</a>, <a href="https://github.com/sommerhd-royals" target="_blank" rel="noopener">@sommerhd-royals</a>, <a href="https://github.com/statnmap" target="_blank" rel="noopener">@statnmap</a>, <a href="https://github.com/tariuk" target="_blank" rel="noopener">@tariuk</a>, <a href="https://github.com/tvroylandt" target="_blank" rel="noopener">@tvroylandt</a>, <a href="https://github.com/vinaybugz" target="_blank" rel="noopener">@vinaybugz</a>, and <a href="https://github.com/VincentGuyader" target="_blank" rel="noopener">@VincentGuyader</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>The <code>subject</code> argument of <a href="https://gmailr.r-lib.org/reference/gm_auth.html" target="_blank" rel="noopener"><code>gm_auth()</code></a> is also new and facilitates the use of a service account to impersonate a user. 
<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>The full OAuth dance is not necessary in subsequent R sessions, though, by default gmailr is very conservative and asks for permission to use and refresh an existing token. This is, of course, configurable. <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> Package spring cleaning https://www.tidyverse.org/blog/2023/06/spring-cleaning-2023/ Wed, 07 Jun 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/06/spring-cleaning-2023/ <style> p.caption { color:#696969; font-style: italic; } </style> <p>When Spring arrives in the Northern hemisphere, the sun&rsquo;s rays reach into the dark corners and illuminate the dust that has been gathering over the winter. This is when our thoughts start to turn to Spring cleaning &mdash; a time to clear out the clutter that has accumulated over the past year. It represents a fresh start and a new beginning, and leaves us feeling rejuvenated and ready to take on the rest of the year. 
This applies not only to our homes, but also to the code that we maintain &mdash; there are often bits and pieces that we know need attention but never seem to make it to the top of the priority list.</p> <p>Doing this kind of work isn&rsquo;t necessarily only about adopting good practices or increasing the quality of your code &mdash; it can also be about adding value through standardization. Most developers only work sporadically on a particular package. For some it&rsquo;s because they work on a lot of packages, while for many it&rsquo;s because package development is not their main job. When you return to a package after a long gap, there is potential for a lot of friction (and dread/procrastination) as you get re-oriented to its idiosyncrasies. Making the occasional pass through your packages and looking for opportunities to adopt current, shared practices can make it easier to dip in and out of different packages.</p> <p>The tidyverse team at Posit has a practice of tackling Spring Cleaning together: we set aside a week every year to work in a semi-structured way to efficiently take care of a common list of package maintenance tasks. We find that setting a time for them and doing them all together during one week is an effective, and more fun, way to get them done. We recently completed our 2023 Spring Cleaning and thought it might be fun to share our process.</p> <p>I&rsquo;ll also show off a new feature we&rsquo;ve built into the <a href="https://usethis.r-lib.org/news/index.html#usethis-220" target="_blank" rel="noopener">latest version of usethis</a> that will help you organize your own Spring Cleaning. 
Feel free to <a href="#spring-cleaning-and-you">jump straight there</a> if you want to skip the back story (don&rsquo;t you wish recipe blogs had this feature?).</p> <h2 id="preparation">Preparation <a href="#preparation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Early in the new year, we set aside the time in our calendars for Spring Cleaning &mdash; this way everyone knows that it&rsquo;s coming up and can make sure they have cleared the space in their schedules (and their minds) to focus on it.</p> <p>We prepare for the week by creating a list of things we want to take care of in our packages. Rather than adding features or fixing bugs, these tasks are usually about bringing things up to current standards or best practices, and include things like updating tests to the latest testthat version, updating pkgdown templates, and adding alt-text to images in pkgdown sites. Not surprisingly, this year a lot of the upkeep was related to our <a href="https://posit.co/blog/rstudio-is-now-posit/" target="_blank" rel="noopener">recent rebrand from RStudio to Posit</a> &mdash; things like updating the copyright holder and author email addresses, and using updated logos without the old rstudio.com website on them.</p> <p>We start off the week with a kickoff meeting on Monday morning. We go through the checklist with everybody and refine what&rsquo;s in it, making sure everybody has had input. 
Because we maintain so many packages, we have a spreadsheet where we keep track of the packages that are undergoing spring cleaning, and people can assign themselves to packages and mark them as completed when they&rsquo;re done.</p> <h2 id="checklists-checklists-checklists">Checklists, checklists, checklists <a href="#checklists-checklists-checklists"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We formalize these tasks into a checklist (<a href="https://atulgawande.com/book/the-checklist-manifesto/" target="_blank" rel="noopener">who doesn&rsquo;t love checklists</a>?) via a tidyverse-focused function in usethis called <code>use_tidy_upkeep_issue()</code>. If you&rsquo;re a package developer and you use <a href="https://usethis.r-lib.org/reference/use_release_issue.html" target="_blank" rel="noopener"><code>use_release_issue()</code></a>, this will look familiar: it opens an issue in the package&rsquo;s GitHub repository with a checklist of tasks to guide us through what needs to be done to bring it up to current tidyverse standards. We update the function with the current year&rsquo;s checklist just prior to starting (and sometimes during) Spring Cleaning.</p> <p>Package maintainers then install the development version of usethis to get the current checklist, and call <code>usethis::use_tidy_upkeep_issue()</code> in their package to create the issue. If there are any tasks that aren&rsquo;t relevant to that particular repo, it&rsquo;s easy to just edit the issue and remove them. 
To be really meta, here is the 2023 Spring Cleaning <a href="https://github.com/r-lib/usethis/issues/1791" target="_blank" rel="noopener">upkeep issue for usethis</a>, created by usethis:</p> <div class="highlight"> <div class="figure" style="text-align: center"> <p><a href="https://github.com/r-lib/usethis/issues/1791" target="_blank"><img src="img/usethis-upkeep-issue.png" alt="2023 Upkeep Issue for usethis" width="700px" /></a></p> <p class="caption"> 2023 Upkeep Issue for usethis </p> </div> </div> <p>We separated the tasks into &ldquo;Necessary&rdquo; and &ldquo;Optional&rdquo;. The necessary tasks were those we needed to complete for all of our packages, and also were simple enough that we could be sure we would be able to complete them. The optional items were those that were nice to have, and/or would take longer to complete. We try to complete the work, including reviewing and merging any related <a href="https://github.com/tidymodels/dials/pull/275" target="_blank" rel="noopener">pull requests</a>, all within the week, with the intention of closing the upkeep issue by Friday.</p> <h2 id="wrapup">Wrapup <a href="#wrapup"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Finally, we end the week with a wrap-up meeting: we do a retrospective on what worked, what didn&rsquo;t, and what we would change for next time. For example, we found that a couple of items on this year&rsquo;s checklist were too complex to complete within the week, especially across many repos. 
So we decided to start a practice of converting those &ldquo;too big&rdquo; tasks into issues of their own &mdash; you can see an example in the <a href="https://github.com/r-lib/testthat/issues/1749" target="_blank" rel="noopener">testthat upkeep issue</a>. This makes it more likely that we can cleanly complete the checklist but still flag those lingering things we would like to finish.</p> <p>We also try to have a little fun during the wrap-up meeting! I made a small R package called <a href="https://github.com/ateucher/chatrbox" target="_blank" rel="noopener">chatrbox</a> that uses <a href="https://openai.com/blog/chatgpt" target="_blank" rel="noopener">ChatGPT</a> to generate R-themed Spring Cleaning text snippets. And Tracy Teal used <a href="https://quarto.org/" target="_blank" rel="noopener">quarto</a> to make certificates of achievement for each of us, complete with inspirational messages made with chatrbox!</p> <div class="highlight"> <p><img src="img/george-certificate.png" alt="A certificate of excellence in Spring Cleaning for George Stagg, with AI-generated text in the form of a tweet about software licensing in the style of Shakespeare. The generated text says: &quot;Of software fair, be wary and take heed, For licensing terms doth often mislead. Choose wisely, lest thou shouldst freely bruise.&quot; #SoftwareLicensing #ShakespeareanTweets" width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="spring-cleaning-and-you">Spring cleaning and you! 
<a href="#spring-cleaning-and-you"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In <a href="https://usethis.r-lib.org/news/index.html#usethis-220" target="_blank" rel="noopener">version 2.2.0 of usethis</a>, we have created a general purpose <a href="https://usethis.r-lib.org/reference/use_upkeep_issue.html" target="_blank" rel="noopener"><code>use_upkeep_issue()</code></a> function for package authors to use if they wish to do a Spring Cleaning of their own. It is a fairly opinionated list of tasks but we believe taking care of them will generally make your package better, easier to maintain, and more enjoyable for your users. Some of the tasks are meant to be performed only once (and once completed shouldn&rsquo;t show up in subsequent lists), and some should be reviewed periodically. If you want to include additional tasks, you can add an (unexported) function named <code>upkeep_bullets()</code> to your own package that returns a character vector of tasks. These will be added to your upkeep checklist.</p> <p>Here is an example of an upkeep issue I created for my package rmapshaper. 
I created an internal function <code>upkeep_bullets()</code> in the package, with an extra bullet I wanted to add to the upkeep issue:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>upkeep_bullets</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='o'>)</span> <span class='s'>"Update bundled mapshaper node library."</span></span></code></pre> </div> <p>And then called <code>use_upkeep_issue()</code> in my rmapshaper package directory:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>devtools</span><span class='nf'>::</span><span class='nf'><a href='https://devtools.r-lib.org/reference/load_all.html'>load_all</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; ℹ Loading rmapshaper</span></span> <span><span class='nf'>usethis</span><span class='nf'>::</span><span class='nf'><a href='https://usethis.r-lib.org/reference/use_upkeep_issue.html'>use_upkeep_issue</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; ✔ Setting active project to '/Users/andyteucher/dev/ateucher/rmapshaper'</span></span> <span><span class='c'>#&gt; • Open URL 'https://github.com/ateucher/rmapshaper/issues/160'</span></span></code></pre> </div> <div class="highlight"> <div class="figure" style="text-align: center"> <p><a href="https://github.com/ateucher/rmapshaper/issues/160" target="_blank"><img src="img/rmapshaper-upkeep-issue.png" alt="Upkeep issue for rmapshaper" width="700px" /></a></p> <p class="caption"> Upkeep issue for rmapshaper </p> </div> </div> <p>In a fun confluence of events, while working on this post I attended an <a href="https://ropensci.org/events/coworking-2023-05/" target="_blank" rel="noopener">rOpenSci coworking session</a> where the topic of the day was spring cleaning! 
We chatted about the benefits of regular upkeep, and what types of tasks make good spring cleaning issues. It was really inspiring and validating to connect with other people tackling maintenance like this.</p> <p>We hope that this will provide a starting point, and motivate you to take care of those nagging maintenance issues, whether it be in the Spring (whenever that is in your part of the world), or any other time of the year. We&rsquo;d love to hear if you find this helpful. If there&rsquo;s a way that it could be better, please <a href="https://github.com/r-lib/usethis/issues" target="_blank" rel="noopener">let us know</a>.</p> Tidyteam code review principles https://www.tidyverse.org/blog/2023/06/code-review-principles/ Mon, 05 Jun 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/06/code-review-principles/ <p>At Posit, we strive to write high quality code to ensure that you, our users, have the best experience possible. We feel that the code review process plays a critical role in delivering quality products, and in developing the skills of newer contributors, and we decided to make that process explicit through a <a href="https://code-review.tidyverse.org/" target="_blank" rel="noopener">tidyteam code review principles</a> guide.</p> <p>At a high level, this guide walks you through the perspectives of both the pull request author and the pull request reviewer, discussing various aspects of the process from both points of view (such as how to <a href="https://code-review.tidyverse.org/author/handling-comments.html" target="_blank" rel="noopener">handle reviewer comments</a> and how to write <a href="https://code-review.tidyverse.org/author/focused.html" target="_blank" rel="noopener">focused pull requests</a>). 
Throughout the guide, we repeatedly tie back to three different <a href="https://code-review.tidyverse.org/collaboration/" target="_blank" rel="noopener">patterns of collaboration</a>, which reflect that each code review is unique and comes with its own set of expectations between the author and the reviewer.</p> <p>We posted about this guide on <a href="https://twitter.com/dvaughan32/status/1645866331487756288?s=20" target="_blank" rel="noopener">Twitter</a> and <a href="https://fosstodon.org/@davis/110181751636631782" target="_blank" rel="noopener">Mastodon</a> a few weeks ago:</p> <blockquote class="twitter-tweet"> <p lang="en" dir="ltr"> In the tidyverse, we work with a lot of people - each other and <a href="https://twitter.com/hashtag/rstats?src=hash&amp;ref_src=twsrc%5Etfw">#rstats</a> community members.<br><br>We wanted to document how we handle code review, so we've drafted a guide detailing our review principles!<br><br>We hope you find it useful, and welcome your feedback!<a href="https://t.co/jGm0rSGg5M">https://t.co/jGm0rSGg5M</a> </p> --- Davis Vaughan (@dvaughan32) <a href="https://twitter.com/dvaughan32/status/1645866331487756288?ref_src=twsrc%5Etfw">April 11, 2023</a> </blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> <p>And we were happy to see that <a href="https://fosstodon.org/@jromanowska/110182021271601892" target="_blank" rel="noopener">many of you</a> are already finding it useful!</p> <iframe src="https://fosstodon.org/@jromanowska/110182021271601892/embed" class="mastodon-embed" style="max-width: 100%; border: 0" width="400" allowfullscreen="allowfullscreen"> </iframe> <script src="https://fosstodon.org/embed.js" async="async"></script> <p>In particular, I&rsquo;d like to shout out <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">Hiroaki Yutani</a> who created a <a href="https://www.youtube.com/watch?v=gSv6h2heHQE" target="_blank" rel="noopener">two part video 
series</a> reading through the principles in Japanese!</p> <p>Internally, we&rsquo;ve also been referencing this guide when reviewing pull requests from each other and from the community. For example, Jenny Bryan linked out to the section on <a href="https://code-review.tidyverse.org/author/submitting.html#sec-descriptions" target="_blank" rel="noopener">creating a good pull request description</a> when reviewing a <a href="https://github.com/r-dbi/bigrquery/pull/512#issuecomment-1511687647" target="_blank" rel="noopener">bigrquery PR</a>, and I internally linked a colleague to the section on <a href="https://code-review.tidyverse.org/reviewer/comments.html#github-suggestions" target="_blank" rel="noopener">GitHub Suggestions</a>, which discusses how to batch multiple suggestions into a single commit.</p> <p>We adapted these principles from Google&rsquo;s own <a href="https://google.github.io/eng-practices/review/" target="_blank" rel="noopener">guide</a>, and we encourage you to do the same thing with ours. If you work in a research lab or are on a software team at your company, then code review should be as important to you as it is to us! 
Feel free to modify these principles to suit your own needs, and if you do use them, we&rsquo;d love to hear about it.</p> `purrr::walk()` this way https://www.tidyverse.org/blog/2023/05/purrr-walk-this-way/ Fri, 26 May 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/05/purrr-walk-this-way/ <h2 id="meet-the-map-family">Meet the <code>map()</code> family <a href="#meet-the-map-family"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>purrr&rsquo;s <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map()</code></a> family of functions are tools for <strong>iteration</strong>, performing the same action on multiple inputs. 
If you&rsquo;re new to purrr, the <a href="https://r4ds.had.co.nz/iteration.html#iteration" target="_blank" rel="noopener">Iteration chapter</a> of R for Data Science is a good place to get started.</p> <p>One of the benefits of using <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map()</code></a> is that the function has variants (e.g. <a href="https://purrr.tidyverse.org/reference/map2.html" target="_blank" rel="noopener"><code>map2()</code></a>, <a href="https://purrr.tidyverse.org/reference/pmap.html" target="_blank" rel="noopener"><code>pmap()</code></a>, etc.) all of which work the same way. To borrow from Jennifer Thompson&rsquo;s excellent <a href="https://github.com/jenniferthompson/RLadiesIntroToPurrr" target="_blank" rel="noopener">Intro to purrr</a>, the arguments can be broken into two groups: what we&rsquo;re iterating over, and what we&rsquo;re doing each time. The adapted figure below shows what this looks like for <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map()</code></a>, <a href="https://purrr.tidyverse.org/reference/map2.html" target="_blank" rel="noopener"><code>map2()</code></a>, and <a href="https://purrr.tidyverse.org/reference/pmap.html" target="_blank" rel="noopener"><code>pmap()</code></a>.</p> <div class="highlight"> <div class="figure" style="text-align: center"> <p><img src="purrr-map-args.png" alt="Highlighted titles read: what we're iterating over, and what we're doing each time. For map(.x = , .f = ) .x is what we're iterating over and .f is what we're doing each time. For map2(.x = , .y = , .f = ) .x and .y are what we're iterating over and .f is what we're doing each time. For pmap(.l = list(), .f = ) .l is what we're iterating over and .f is what we're doing each time." 
width="700px" /></p> <p class="caption"> Grouped map function arguments, adapted from Intro to purrr by Jennifer Thompson </p> </div> </div> <p>In addition to handling different input arguments, the map family of functions has variants that create different outputs. The following table from the <a href="https://adv-r.hadley.nz/functionals.html#map-variants" target="_blank" rel="noopener">Map-variants section of Advanced R</a> shows how the orthogonal inputs and outputs can be used to organise the variants into a matrix:</p> <table> <thead> <tr> <th></th> <th>List</th> <th>Atomic</th> <th>Same type</th> <th>Nothing</th> </tr> </thead> <tbody> <tr> <td>One argument</td> <td> <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map()</code></a></td> <td> <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_lgl()</code></a>, &hellip;</td> <td> <a href="https://purrr.tidyverse.org/reference/modify.html" target="_blank" rel="noopener"><code>modify()</code></a></td> <td> <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>walk()</code></a></td> </tr> <tr> <td>Two arguments</td> <td> <a href="https://purrr.tidyverse.org/reference/map2.html" target="_blank" rel="noopener"><code>map2()</code></a></td> <td> <a href="https://purrr.tidyverse.org/reference/map2.html" target="_blank" rel="noopener"><code>map2_lgl()</code></a>, &hellip;</td> <td> <a href="https://purrr.tidyverse.org/reference/modify.html" target="_blank" rel="noopener"><code>modify2()</code></a></td> <td> <a href="https://purrr.tidyverse.org/reference/map2.html" target="_blank" rel="noopener"><code>walk2()</code></a></td> </tr> <tr> <td>One argument + index</td> <td> <a href="https://purrr.tidyverse.org/reference/imap.html" target="_blank" rel="noopener"><code>imap()</code></a></td> <td> <a href="https://purrr.tidyverse.org/reference/imap.html" target="_blank" 
rel="noopener"><code>imap_lgl()</code></a>, &hellip;</td> <td> <a href="https://purrr.tidyverse.org/reference/modify.html" target="_blank" rel="noopener"><code>imodify()</code></a></td> <td> <a href="https://purrr.tidyverse.org/reference/imap.html" target="_blank" rel="noopener"><code>iwalk()</code></a></td> </tr> <tr> <td>N arguments</td> <td> <a href="https://purrr.tidyverse.org/reference/pmap.html" target="_blank" rel="noopener"><code>pmap()</code></a></td> <td> <a href="https://purrr.tidyverse.org/reference/pmap.html" target="_blank" rel="noopener"><code>pmap_lgl()</code></a>, &hellip;</td> <td>&mdash;</td> <td> <a href="https://purrr.tidyverse.org/reference/pmap.html" target="_blank" rel="noopener"><code>pwalk()</code></a></td> </tr> </tbody> </table> <h2 id="whats-up-with-walk">What&rsquo;s up with <code>walk()</code>? <a href="#whats-up-with-walk"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Based on the table above, you might think that <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>walk()</code></a> isn&rsquo;t very useful. Indeed, <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>walk()</code></a>, <a href="https://purrr.tidyverse.org/reference/map2.html" target="_blank" rel="noopener"><code>walk2()</code></a>, and <a href="https://purrr.tidyverse.org/reference/pmap.html" target="_blank" rel="noopener"><code>pwalk()</code></a> all invisibly return <code>.x</code>. 
However, they come in handy when you want to call a function for its <em><strong>side effects</strong></em> rather than its return value.</p> <p>Here, we&rsquo;ll go through two common use cases: saving multiple CSVs, and multiple plots. We&rsquo;ll also make use of the <a href="https://fs.r-lib.org/" target="_blank" rel="noopener">fs</a> package, a cross-platform interface to file system operations, to inspect our outputs.</p> <p>If you want to try this out but don&rsquo;t want to save files locally, there&rsquo;s a <a href="https://posit.cloud/content/5983147" target="_blank" rel="noopener">companion project on <strong>Posit Cloud</strong></a> where you can follow along.</p> <div class="highlight"> <a class="test-drive-link" href="https://posit.cloud/content/5983147" target="_blank"> <button class="test-drive-btn"><i class="fa fa-cloud" aria-hidden="true"></i> Test Drive on Posit Cloud</button> </a> </div> <h2 id="writing-and-deleting-multiple-csvs">Writing (and deleting) multiple CSVs <a href="#writing-and-deleting-multiple-csvs"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>To get started, we&rsquo;ll need some data. Let&rsquo;s use the <a href="https://googlesheets4.tidyverse.org/reference/gs4_examples.html" target="_blank" rel="noopener">gapminder</a> example Sheet built into <a href="https://googlesheets4.tidyverse.org/" target="_blank" rel="noopener">googlesheets4</a>. 
Because there are multiple worksheets (one for each continent), we&rsquo;ll use <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map()</code></a> to apply <a href="https://googlesheets4.tidyverse.org/reference/range_read.html" target="_blank" rel="noopener"><code>read_sheet()</code></a><sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> to each one, and get back a list of data frames.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyverse.tidyverse.org'>tidyverse</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://googlesheets4.tidyverse.org'>googlesheets4</a></span><span class='o'>)</span></span></code></pre> </div> <div class="highlight"> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>ss</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://googlesheets4.tidyverse.org/reference/gs4_examples.html'>gs4_example</a></span><span class='o'>(</span><span class='s'>"gapminder"</span><span class='o'>)</span> <span class='c'># get sheet id</span></span> <span><span class='nv'>sheets</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://googlesheets4.tidyverse.org/reference/sheet_properties.html'>sheet_names</a></span><span class='o'>(</span><span class='nv'>ss</span><span class='o'>)</span> <span class='c'># get the names of individual sheets</span></span> <span><span class='nv'>gap_dfs</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map</a></span><span class='o'>(</span><span class='nv'>sheets</span>, .f <span 
class='o'>=</span> \<span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nf'><a href='https://googlesheets4.tidyverse.org/reference/range_read.html'>read_sheet</a></span><span class='o'>(</span><span class='nv'>ss</span>, sheet <span class='o'>=</span> <span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Reading from <span style='color: #00BBBB;'>gapminder</span>.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Range '<span style='color: #BBBB00;'>'Africa'</span>'.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Reading from <span style='color: #00BBBB;'>gapminder</span>.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Range '<span style='color: #BBBB00;'>'Americas'</span>'.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Reading from <span style='color: #00BBBB;'>gapminder</span>.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Range '<span style='color: #BBBB00;'>'Asia'</span>'.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Reading from <span style='color: #00BBBB;'>gapminder</span>.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Range '<span style='color: #BBBB00;'>'Europe'</span>'.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Reading from <span style='color: #00BBBB;'>gapminder</span>.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> Range '<span style='color: #BBBB00;'>'Oceania'</span>'.</span></span> <span></span></code></pre> </div> <p>Note that the backslash syntax for anonymous functions (e.g. 
<code>\(x) x + 1</code>) was introduced in base R version 4.1.0 as a shorthand for <code>function(x) x + 1</code>. If you&rsquo;re using an earlier version of R, you can use purrr&rsquo;s shorthand: a formula (e.g. <code>~ .x + 1</code>).</p> <p>Typically, you&rsquo;d want to combine these data frames into one to make it easier to work with your data. To do so, we&rsquo;ll use <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_rbind()</code></a> on <code>gap_dfs</code>. I&rsquo;ve kept the intermediate object, since we&rsquo;ll use it in a moment with <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>walk()</code></a>, but I could just as easily have piped the output directly. The combination of <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>purrr::map()</code></a> and <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_rbind()</code></a> is a handy one that you can learn more about in <a href="https://r4ds.hadley.nz/iteration.html?#purrrmap-and-list_rbind" target="_blank" rel="noopener">R for Data Science</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>gap_combined</span> <span class='o'>&lt;-</span> <span class='nv'>gap_dfs</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_c.html'>list_rbind</a></span><span class='o'>(</span><span class='o'>)</span></span></code></pre> </div> <p>Now let&rsquo;s say that, for whatever reason, you&rsquo;d like to save the data from these sheets as individual CSVs. 
This is where <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>walk()</code></a> comes into play&mdash;writing out the file with <a href="https://readr.tidyverse.org/reference/write_delim.html" target="_blank" rel="noopener"><code>write_csv()</code></a> is a &ldquo;side effect.&rdquo; We&rsquo;ll use <a href="https://fs.r-lib.org/reference/create.html" target="_blank" rel="noopener"><code>fs::dir_create()</code></a> to create a data folder to put our files into<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>, and build a vector of paths/file names. Since we have two arguments, the list of data frames, and the paths, we&rsquo;ll use <a href="https://purrr.tidyverse.org/reference/map2.html" target="_blank" rel="noopener"><code>walk2()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>fs</span><span class='nf'>::</span><span class='nf'><a href='https://fs.r-lib.org/reference/create.html'>dir_create</a></span><span class='o'>(</span><span class='s'>"data"</span><span class='o'>)</span></span> <span><span class='nv'>paths</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://stringr.tidyverse.org/reference/str_glue.html'>str_glue</a></span><span class='o'>(</span><span class='s'>"data/gapminder_&#123;tolower(sheets)&#125;.csv"</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map2.html'>walk2</a></span><span class='o'>(</span></span> <span> <span class='nv'>gap_dfs</span>, </span> <span> <span class='nv'>paths</span>,</span> <span> \<span class='o'>(</span><span class='nv'>df</span>, <span class='nv'>name</span><span class='o'>)</span> <span class='nf'><a href='https://readr.tidyverse.org/reference/write_delim.html'>write_csv</a></span><span class='o'>(</span><span class='nv'>df</span>, <span class='nv'>name</span><span class='o'>)</span></span> 
<span> <span class='o'>)</span></span></code></pre> </div> <p>To see what we&rsquo;ve done, we can use <a href="https://fs.r-lib.org/reference/dir_tree.html" target="_blank" rel="noopener"><code>fs::dir_tree()</code></a> to see the contents of the directory as a tree, or <a href="https://fs.r-lib.org/reference/dir_ls.html" target="_blank" rel="noopener"><code>fs::dir_ls()</code></a> to return the paths as a vector. These functions also take <code>glob</code> and <code>regexp</code> arguments, allowing you to filter paths by file type with globbing patterns (e.g. <code>*.csv</code>) or using a regular expression passed on to <a href="https://rdrr.io/r/base/grep.html" target="_blank" rel="noopener"><code>grep()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>fs</span><span class='nf'>::</span><span class='nf'><a href='https://fs.r-lib.org/reference/dir_tree.html'>dir_tree</a></span><span class='o'>(</span><span class='s'>"data"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB; font-weight: bold;'>data</span></span></span> <span><span class='c'>#&gt; ├── gapminder_africa.csv</span></span> <span><span class='c'>#&gt; ├── gapminder_americas.csv</span></span> <span><span class='c'>#&gt; ├── gapminder_asia.csv</span></span> <span><span class='c'>#&gt; ├── gapminder_europe.csv</span></span> <span><span class='c'>#&gt; └── gapminder_oceania.csv</span></span> <span></span><span><span class='nf'>fs</span><span class='nf'>::</span><span class='nf'><a href='https://fs.r-lib.org/reference/dir_ls.html'>dir_ls</a></span><span class='o'>(</span><span class='s'>"data"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; data/gapminder_africa.csv data/gapminder_americas.csv </span></span> <span><span class='c'>#&gt; data/gapminder_asia.csv data/gapminder_europe.csv </span></span> <span><span class='c'>#&gt; data/gapminder_oceania.csv</span></span> 
<span></span></code></pre> </div> <p>If you&rsquo;re having regrets, or want to return your example project to its previous state, it&rsquo;s just as easy to <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>walk()</code></a> <a href="https://fs.r-lib.org/reference/delete.html" target="_blank" rel="noopener"><code>fs::file_delete()</code></a> along those same paths.<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup></p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>walk</a></span><span class='o'>(</span><span class='nv'>paths</span>, \<span class='o'>(</span><span class='nv'>paths</span><span class='o'>)</span> <span class='nf'>fs</span><span class='nf'>::</span><span class='nf'><a href='https://fs.r-lib.org/reference/delete.html'>file_delete</a></span><span class='o'>(</span><span class='nv'>paths</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <h2 id="saving-multiple-plots">Saving multiple plots <a href="#saving-multiple-plots"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Now, let&rsquo;s say you want to create and save a bunch of plots. 
We&rsquo;ll use a modified version of the <a href="https://r4ds.hadley.nz/functions.html#combining-with-other-tidyverse" target="_blank" rel="noopener"><code>conditional_bars()</code></a><sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup> function from the R for Data Science chapter on writing <a href="https://r4ds.hadley.nz/functions.html" target="_blank" rel="noopener">functions</a>, and the built-in <a href="https://ggplot2.tidyverse.org/reference/diamonds.html" target="_blank" rel="noopener">diamonds</a> dataset.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># modified conditional bars function from R4DS</span></span> <span><span class='nv'>conditional_bars</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>df</span>, <span class='nv'>condition</span>, <span class='nv'>var</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='o'>&#123;</span><span class='o'>&#123;</span> <span class='nv'>condition</span> <span class='o'>&#125;</span><span class='o'>&#125;</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='o'>&#123;</span><span class='o'>&#123;</span> <span class='nv'>var</span> <span class='o'>&#125;</span><span class='o'>&#125;</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='nf'>rlang</span><span class='nf'>::</span><span class='nf'><a href='https://rlang.r-lib.org/reference/englue.html'>englue</a></span><span class='o'>(</span><span class='s'>"Count of diamonds by &#123;&#123;var&#125;&#125; where &#123;&#123;condition&#125;&#125;"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span></code></pre> </div> <p>It&rsquo;s easy enough to run this for one condition, for example for the diamonds with <code>cut == &quot;Good&quot;</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>diamonds</span> <span class='o'>|&gt;</span> <span class='nf'>conditional_bars</span><span class='o'>(</span><span class='nv'>cut</span> <span class='o'>==</span> <span class='s'>"Good"</span>, <span class='nv'>clarity</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/goodclarity-1.png" alt="Bar chart showing count of diamonds by clarity in the diamonds dataset where the cut == Good." width="700px" style="display: block; margin: auto;" /></p> </div> <p>But what if we want to make and save a plot for each cut? 
Again, it&rsquo;s <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map()</code></a> and <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>walk()</code></a> to the rescue.</p> <p>Because we&rsquo;re using the same data (<code>diamonds</code>) and conditioning on the same variable (<code>cut</code>), we&rsquo;ll only need to <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map()</code></a> across the levels of <code>cut</code>, and can hard code the rest into the anonymous function.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># get the levels</span></span> <span><span class='nv'>cuts</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/levels.html'>levels</a></span><span class='o'>(</span><span class='nv'>diamonds</span><span class='o'>$</span><span class='nv'>cut</span><span class='o'>)</span></span> <span></span> <span><span class='c'># make the plots</span></span> <span><span class='nv'>plots</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map</a></span><span class='o'>(</span></span> <span> <span class='nv'>cuts</span>,</span> <span> \<span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nf'>conditional_bars</span><span class='o'>(</span></span> <span> df <span class='o'>=</span> <span class='nv'>diamonds</span>,</span> <span> <span class='nv'>cut</span> <span class='o'>==</span> <span class='o'>&#123;</span><span class='o'>&#123;</span> <span class='nv'>x</span> <span class='o'>&#125;</span><span class='o'>&#125;</span>,</span> <span> <span class='nv'>clarity</span></span> <span> <span class='o'>)</span></span> <span><span class='o'>)</span></span></code></pre> </div> <p>The plots are now saved in a list&mdash;a fine format for storing 
ggplots. As we did when saving our CSVs, we&rsquo;ll use fs to create a directory to store them in, and make a vector of paths for file names.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># make the folder to put them in (if it exists, &#123;fs&#125; does nothing)</span></span> <span><span class='nf'>fs</span><span class='nf'>::</span><span class='nf'><a href='https://fs.r-lib.org/reference/create.html'>dir_create</a></span><span class='o'>(</span><span class='s'>"plots"</span><span class='o'>)</span></span> <span><span class='c'># make the file names</span></span> <span><span class='nv'>plot_paths</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://stringr.tidyverse.org/reference/str_glue.html'>str_glue</a></span><span class='o'>(</span><span class='s'>"plots/&#123;tolower(cuts)&#125;_clarity.png"</span><span class='o'>)</span></span></code></pre> </div> <p>Now we can use the paths and plots with <a href="https://purrr.tidyverse.org/reference/map2.html" target="_blank" rel="noopener"><code>walk2()</code></a> to pass them as arguments to <a href="https://ggplot2.tidyverse.org/reference/ggsave.html" target="_blank" rel="noopener"><code>ggsave()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map2.html'>walk2</a></span><span class='o'>(</span></span> <span> <span class='nv'>plot_paths</span>,</span> <span> <span class='nv'>plots</span>,</span> <span> \<span class='o'>(</span><span class='nv'>path</span>, <span class='nv'>plot</span><span class='o'>)</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggsave.html'>ggsave</a></span><span class='o'>(</span><span class='nv'>path</span>, <span class='nv'>plot</span>, width <span class='o'>=</span> <span class='m'>6</span>, height <span class='o'>=</span> <span class='m'>6</span><span 
class='o'>)</span></span> <span><span class='o'>)</span></span></code></pre> </div> <p>Again, we can use fs to see what we&rsquo;ve done:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>fs</span><span class='nf'>::</span><span class='nf'><a href='https://fs.r-lib.org/reference/dir_tree.html'>dir_tree</a></span><span class='o'>(</span><span class='s'>"plots"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB; font-weight: bold;'>plots</span></span></span> <span><span class='c'>#&gt; ├── <span style='color: #BB00BB; font-weight: bold;'>fair_clarity.png</span></span></span> <span><span class='c'>#&gt; ├── <span style='color: #BB00BB; font-weight: bold;'>good_clarity.png</span></span></span> <span><span class='c'>#&gt; ├── <span style='color: #BB00BB; font-weight: bold;'>ideal_clarity.png</span></span></span> <span><span class='c'>#&gt; ├── <span style='color: #BB00BB; font-weight: bold;'>premium_clarity.png</span></span></span> <span><span class='c'>#&gt; └── <span style='color: #BB00BB; font-weight: bold;'>very good_clarity.png</span></span></span> <span></span></code></pre> </div> <p>And, clean up after ourselves if we didn&rsquo;t <em>really</em> want those plots after all.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>walk</a></span><span class='o'>(</span><span class='nv'>plot_paths</span>, \<span class='o'>(</span><span class='nv'>paths</span><span class='o'>)</span> <span class='nf'>fs</span><span class='nf'>::</span><span class='nf'><a href='https://fs.r-lib.org/reference/delete.html'>file_delete</a></span><span class='o'>(</span><span class='nv'>paths</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <h2 id="fin">Fin <a href="#fin"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" 
viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Hopefully this gave you a taste for some of what <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>walk()</code></a> can do. To learn more, see <a href="https://r4ds.hadley.nz/iteration.html#saving-multiple-outputs" target="_blank" rel="noopener">Saving multiple outputs</a> in the Iteration chapter of R for Data Science.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>See <a href="https://googlesheets4.tidyverse.org/articles/googlesheets4.html" target="_blank" rel="noopener">Getting started with googlesheets4</a> to learn more about the basics of reading and writing sheets. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>If the directory already exists, it will be left unchanged. <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>There&rsquo;s also a function in fs called <a href="https://fs.r-lib.org/reference/dir_ls.html" target="_blank" rel="noopener"><code>dir_walk()</code></a>, which you can feel free to explore on your own. 
<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:4" role="doc-endnote"> <p>I&rsquo;ve added a title that reflects the variable name and condition with <a href="https://rlang.r-lib.org/reference/englue.html" target="_blank" rel="noopener"><code>rlang::englue()</code></a>, which you can learn more about in the <a href="https://r4ds.hadley.nz/functions.html#labeling" target="_blank" rel="noopener">Labeling</a> section of the same R4DS chapter. <a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> desirability2 https://www.tidyverse.org/blog/2023/05/desirability2/ Wed, 17 May 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/05/desirability2/ <p>We&rsquo;re tickled pink to announce the release of <a href="http://desirability2.tidymodels.org" target="_blank" rel="noopener">desirability2</a> (version 0.0.1). 
You can install it from CRAN with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;desirability2&#34;</span><span class="p">)</span> </code></pre></div><p>This blog post will introduce you to the package and desirability functions.</p> <p>Let&rsquo;s load some packages!</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">desirability2</span><span class="p">)</span> <span class="nf">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span> <span class="nf">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span> </code></pre></div><p> <a href="https://scholar.google.com/scholar?hl=en&amp;as_sdt=0%2C7&amp;q=%22desirability&#43;functions%22" target="_blank" rel="noopener">Desirability functions</a> are tools that can be used to rank or optimize multiple characteristics at once. They are intuitive and easy to use. There are a few R packages that implement them, including <a href="http://cran.r-project.org/package=desirability" target="_blank" rel="noopener">desirability</a> and <a href="http://cran.r-project.org/package=desiR" target="_blank" rel="noopener">desiR</a>.</p> <p>We have a new one, <a href="http://cran.r-project.org/package=desirability2" target="_blank" rel="noopener">desirability2</a>, with an interface conducive to being used in-line via dplyr pipelines.</p> <p>Let&rsquo;s demonstrate that by looking at an application. Suppose we created a classification model and produced multiple metrics on how well it classifies new data. We measured the area under the ROC curve and the binomial log-loss statistic in this example. 
There are about 300 different model configurations that we investigated via tuning.</p> <p>The results from the tuning process were:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">classification_results</span> </code></pre></div><pre><code>## # A tibble: 298 × 5 ## mixture penalty mn_log_loss roc_auc num_features ## &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; ## 1 0 0.1 0.199 0.869 211 ## 2 0 0.0788 0.196 0.870 211 ## 3 0 0.0621 0.194 0.871 211 ## 4 0 0.0489 0.192 0.872 211 ## 5 0 0.0386 0.191 0.873 211 ## 6 0 0.0304 0.190 0.873 211 ## 7 0 0.0240 0.188 0.874 211 ## 8 0 0.0189 0.188 0.874 211 ## 9 0 0.0149 0.187 0.874 211 ## 10 0 0.0117 0.186 0.874 211 ## # ℹ 288 more rows </code></pre><p>If we were interested in the best area under the ROC curve:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">classification_results</span> <span class="o">|&gt;</span> <span class="nf">slice_max</span><span class="p">(</span><span class="n">roc_auc</span><span class="p">,</span> <span class="n">n</span> <span class="o">=</span> <span class="m">1</span><span class="p">)</span> </code></pre></div><pre><code>## # A tibble: 1 × 5 ## mixture penalty mn_log_loss roc_auc num_features ## &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; ## 1 0.222 0.00574 0.185 0.876 86 </code></pre><p>However, there are different optimal settings when the log-likelihood is considered:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">classification_results</span> <span class="o">|&gt;</span> <span class="nf">slice_min</span><span class="p">(</span><span class="n">mn_log_loss</span><span class="p">,</span> <span class="n">n</span> <span class="o">=</span> <span class="m">1</span><span class="p">)</span> </code></pre></div><pre><code>## # A tibble: 1 × 5 ## mixture penalty mn_log_loss roc_auc num_features ## &lt;dbl&gt; 
&lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; ## 1 1 0.000853 0.184 0.876 103 </code></pre><p>Are the two metrics related? Here&rsquo;s a plot of the data:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">classification_results</span> <span class="o">|&gt;</span> <span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">roc_auc</span><span class="p">,</span> <span class="n">mn_log_loss</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="n">num_features</span><span class="p">))</span> <span class="o">+</span> <span class="nf">geom_point</span><span class="p">(</span><span class="n">alpha</span> <span class="o">=</span> <span class="m">1</span><span class="o">/</span><span class="m">2</span><span class="p">)</span> </code></pre></div><p><img src="figure/unnamed-chunk-7-1.svg" alt="plot of chunk unnamed-chunk-7" width="60%" /></p> <p>We colored the points by the number of features used in the model. Fewer predictors are better; we&rsquo;d like to factor that into the tuning parameter selection.</p> <p>To optimize all three characteristics at once, desirability functions map their values to be between zero and one (with the latter being the most desirable). For the ROC scores, a value of 1.0 is best, and we may not consider a model with an AUC of less than 0.80. 
We can use desirability2&rsquo;s <a href="http://desirability2.tidymodels.org/reference/inline_desirability.html" target="_blank" rel="noopener"><code>d_max()</code></a> function to translate these values to desirability:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">classification_results</span> <span class="o">%&gt;%</span> <span class="nf">mutate</span><span class="p">(</span><span class="n">roc_d</span> <span class="o">=</span> <span class="nf">d_max</span><span class="p">(</span><span class="n">roc_auc</span><span class="p">,</span> <span class="n">high</span> <span class="o">=</span> <span class="m">1</span><span class="p">,</span> <span class="n">low</span> <span class="o">=</span> <span class="m">0.8</span><span class="p">))</span> <span class="o">%&gt;%</span> <span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">roc_auc</span><span class="p">,</span> <span class="n">roc_d</span><span class="p">))</span> <span class="o">+</span> <span class="nf">geom_line</span><span class="p">()</span> <span class="o">+</span> <span class="nf">geom_point</span><span class="p">()</span> <span class="o">+</span> <span class="nf">lims</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="m">0</span><span class="o">:</span><span class="m">1</span><span class="p">)</span> </code></pre></div><p><img src="figure/unnamed-chunk-8-1.svg" alt="plot of chunk unnamed-chunk-8" width="60%" /></p> <p>Note that all model configurations with ROC AUC scores below 0.80 have zero desirability.</p> <p>Since we want to reduce loss, we can use <code>d_min()</code> to show a curve where smaller is better. 
For this specification, we&rsquo;ll use the min and max values as defined by the data, by setting <code>use_data = TRUE</code>:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">classification_results</span> <span class="o">%&gt;%</span> <span class="nf">mutate</span><span class="p">(</span> <span class="n">roc_d</span> <span class="o">=</span> <span class="nf">d_max</span><span class="p">(</span><span class="n">roc_auc</span><span class="p">,</span> <span class="n">high</span> <span class="o">=</span> <span class="m">1</span><span class="p">,</span> <span class="n">low</span> <span class="o">=</span> <span class="m">0.8</span><span class="p">),</span> <span class="n">loss_d</span> <span class="o">=</span> <span class="nf">d_min</span><span class="p">(</span><span class="n">mn_log_loss</span><span class="p">,</span> <span class="n">use_data</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span> <span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">mn_log_loss</span><span class="p">,</span> <span class="n">loss_d</span><span class="p">))</span> <span class="o">+</span> <span class="nf">geom_line</span><span class="p">()</span> <span class="o">+</span> <span class="nf">geom_point</span><span class="p">()</span> <span class="o">+</span> <span class="nf">lims</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="m">0</span><span class="o">:</span><span class="m">1</span><span class="p">)</span> </code></pre></div><p><img src="figure/unnamed-chunk-9-1.svg" alt="plot of chunk unnamed-chunk-9" width="60%" /></p> <p>Finally, we can factor in the number of features. 
Arguably this is more important to use than the other two outcomes; we will make this curve nonlinear so that it becomes more challenging to be desirable as the number of features increases. For this, we&rsquo;ll use the <code>scale</code> option to <code>d_min()</code>, where larger values make the criteria more difficult to satisfy:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">classification_results</span> <span class="o">%&gt;%</span> <span class="nf">mutate</span><span class="p">(</span> <span class="n">roc_d</span> <span class="o">=</span> <span class="nf">d_max</span><span class="p">(</span><span class="n">roc_auc</span><span class="p">,</span> <span class="n">high</span> <span class="o">=</span> <span class="m">1</span><span class="p">,</span> <span class="n">low</span> <span class="o">=</span> <span class="m">0.8</span><span class="p">),</span> <span class="n">loss_d</span> <span class="o">=</span> <span class="nf">d_min</span><span class="p">(</span><span class="n">mn_log_loss</span><span class="p">,</span> <span class="n">use_data</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">),</span> <span class="n">feat_d</span> <span class="o">=</span> <span class="nf">d_min</span><span class="p">(</span><span class="n">num_features</span><span class="p">,</span> <span class="n">low</span> <span class="o">=</span> <span class="m">0</span><span class="p">,</span> <span class="n">high</span> <span class="o">=</span> <span class="m">100</span><span class="p">,</span> <span class="n">scale</span> <span class="o">=</span> <span class="m">2</span><span class="p">)</span> <span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">num_features</span><span class="p">,</span> <span class="n">feat_d</span><span class="p">))</span> <span class="o">+</span> <span 
class="nf">geom_line</span><span class="p">()</span> <span class="o">+</span> <span class="nf">geom_point</span><span class="p">()</span> <span class="o">+</span> <span class="nf">lims</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="m">0</span><span class="o">:</span><span class="m">1</span><span class="p">)</span> </code></pre></div><p><img src="figure/unnamed-chunk-10-1.svg" alt="plot of chunk unnamed-chunk-10" width="60%" /></p> <p>Combining these components into a single criterion using the geometric mean is common. Using this statistic has the side effect that any criterion with zero desirability makes the overall desirability zero (since the geometric mean multiplies the values). There is a function called <a href="http://desirability2.tidymodels.org/reference/d_overall.html" target="_blank" rel="noopener"><code>d_overall()</code></a> that can be used with dplyr&rsquo;s <code>across()</code> function. Sorting by overall desirability gives us tuning parameter values (<code>mixture</code> and <code>penalty</code>) that are best for this combination of criteria.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">classification_results</span> <span class="o">%&gt;%</span> <span class="nf">mutate</span><span class="p">(</span> <span class="n">roc_d</span> <span class="o">=</span> <span class="nf">d_max</span><span class="p">(</span><span class="n">roc_auc</span><span class="p">,</span> <span class="n">high</span> <span class="o">=</span> <span class="m">1</span><span class="p">,</span> <span class="n">low</span> <span class="o">=</span> <span class="m">0.8</span><span class="p">),</span> <span class="n">loss_d</span> <span class="o">=</span> <span class="nf">d_min</span><span class="p">(</span><span class="n">mn_log_loss</span><span class="p">,</span> <span class="n">use_data</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">),</span> <span 
class="n">feat_d</span> <span class="o">=</span> <span class="nf">d_min</span><span class="p">(</span><span class="n">num_features</span><span class="p">,</span> <span class="n">low</span> <span class="o">=</span> <span class="m">0</span><span class="p">,</span> <span class="n">high</span> <span class="o">=</span> <span class="m">100</span><span class="p">,</span> <span class="n">scale</span> <span class="o">=</span> <span class="m">2</span><span class="p">),</span> <span class="n">overall</span> <span class="o">=</span> <span class="nf">d_overall</span><span class="p">(</span><span class="nf">across</span><span class="p">(</span><span class="nf">ends_with</span><span class="p">(</span><span class="s">&#34;_d&#34;</span><span class="p">)))</span> <span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">slice_max</span><span class="p">(</span><span class="n">overall</span><span class="p">,</span> <span class="n">n</span> <span class="o">=</span> <span class="m">5</span><span class="p">)</span> </code></pre></div><pre><code>## # A tibble: 5 × 9 ## mixture penalty mn_log_loss roc_auc num_features roc_d loss_d feat_d overall ## &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; ## 1 1 0.00924 0.200 0.859 15 0.295 0.815 0.722 0.558 ## 2 0.667 0.0117 0.199 0.862 18 0.311 0.827 0.672 0.557 ## 3 0.667 0.0149 0.201 0.858 14 0.291 0.802 0.740 0.557 ## 4 0.889 0.00924 0.199 0.861 18 0.305 0.825 0.672 0.553 ## 5 0.889 0.0117 0.201 0.857 14 0.285 0.801 0.740 0.553 </code></pre><p>That&rsquo;s it! 
That&rsquo;s the package.</p> Q1 2023 tidymodels digest https://www.tidyverse.org/blog/2023/04/tidymodels-2023-q1/ Fri, 28 Apr 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/04/tidymodels-2023-q1/ <p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <p>Since the beginning of 2021, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a> here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. 
You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a> to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like these posts from the past couple of months:</p> <ul> <li> <a href="https://www.tidyverse.org/blog/2023/04/tuning-delights/" target="_blank" rel="noopener">Tuning hyperparameters with tidymodels is a delight</a></li> <li> <a href="https://www.tidyverse.org/blog/2023/04/censored-0-2-0/" target="_blank" rel="noopener">censored 0.2.0</a></li> <li> <a href="https://www.simonpcouch.com/blog/speedups-2023/" target="_blank" rel="noopener">The tidymodels is getting a whole lot faster</a></li> </ul> <p>Since <a href="https://www.tidyverse.org/blog/2022/12/tidymodels-2022-q4/" target="_blank" rel="noopener">our last roundup post</a>, there have been CRAN releases of 24 tidymodels packages. Here are links to their NEWS files:</p> <div class="highlight"> <ul> <li>agua <a href="https://agua.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.1.2)</a></li> <li>baguette <a href="https://baguette.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>broom <a href="https://broom.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.4)</a></li> <li>butcher <a href="https://butcher.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.2)</a></li> <li>censored <a href="https://censored.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.2.0)</a></li> <li>dials <a href="https://dials.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.0)</a></li> <li>discrim <a href="https://discrim.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>embed <a href="https://embed.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a></li> <li>finetune <a 
href="https://finetune.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a></li> <li>hardhat <a href="https://hardhat.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.3.0)</a></li> <li>modeldata <a href="https://modeldata.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a></li> <li>parsnip <a href="https://parsnip.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a></li> <li>recipes <a href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.6)</a></li> <li>rules <a href="https://rules.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.2)</a></li> <li>spatialsample <a href="https://spatialsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.0)</a></li> <li>stacks <a href="https://stacks.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.2)</a></li> <li>textrecipes <a href="https://textrecipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.3)</a></li> <li>themis <a href="https://themis.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>tidyclust <a href="https://tidyclust.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.1.2)</a></li> <li>tidypredict <a href="https://tidypredict.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.5)</a></li> <li>tune <a href="https://tune.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.1)</a></li> <li>workflows <a href="https://workflows.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.3)</a></li> <li>workflowsets <a href="https://workflowsets.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>yardstick <a href="https://yardstick.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.2.0)</a></li> </ul> </div> <p>We&rsquo;ll highlight a few especially notable changes below: more informative errors 
and faster code. First, we load the collection of packages:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://embed.tidymodels.org'>embed</a></span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='s'>"ames"</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span></span></code></pre> </div> <h2 id="more-informative-errors">More informative errors <a href="#more-informative-errors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In the last few months, we have focused on refining error messages so that it is easier for users to pinpoint what went wrong and where. 
Since the modeling pipeline can be quite complicated, getting uninformative errors is a no-go.</p> <p>Across the tidymodels, error messages will now indicate the user-facing function that caused the error rather than the internal function that it came from.</p> <p>From dials, an error that looked like</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">degree</span><span class="p">(</span><span class="n">range</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">1L</span><span class="p">,</span> <span class="m">5L</span><span class="p">))</span> <span class="c1">#&gt; Error in `new_quant_param()`:</span> <span class="c1">#&gt; ! Since `type = &#39;double&#39;`, please use that data type for the range.</span> </code></pre></div><p>Now says that the error came from <code>degree()</code> rather than <code>new_quant_param()</code></p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>degree</span><span class='o'>(</span>range <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1L</span>, <span class='m'>5L</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `degree()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Since `type = 'double'`, please use that data type for the range.</span></span> <span></span></code></pre> </div> <p>The same thing can be seen with the yardstick metrics</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">mtcars</span> <span class="o">|&gt;</span> <span class="nf">accuracy</span><span class="p">(</span><span class="n">vs</span><span class="p">,</span> <span class="n">am</span><span 
class="p">)</span> <span class="c1">#&gt; Error in `dplyr::summarise()`:</span> <span class="c1">#&gt; ℹ In argument: `.estimate = metric_fn(truth = vs, estimate = am, na_rm =</span> <span class="c1">#&gt; na_rm)`.</span> <span class="c1">#&gt; Caused by error in `validate_class()`:</span> <span class="c1">#&gt; ! `truth` should be a factor but a numeric was supplied.</span> </code></pre></div><p>which now errors much more informatively</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>accuracy</span><span class='o'>(</span><span class='nv'>vs</span>, <span class='nv'>am</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `accuracy()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `truth` should be a factor, not a `numeric`.</span></span> <span></span></code></pre> </div> <p>Lastly, one of the biggest improvements came in recipes, which now shows which step caused the error instead of saying it happened in <a href="https://recipes.tidymodels.org/reference/prep.html" target="_blank" rel="noopener"><code>prep()</code></a> or <a href="https://recipes.tidymodels.org/reference/bake.html" target="_blank" rel="noopener"><code>bake()</code></a>. 
This is a huge improvement, since preprocessing pipelines often string together many steps.</p> <p>Before:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">recipe</span><span class="p">(</span><span class="o">~</span><span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">ames</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">step_novel</span><span class="p">(</span><span class="n">Neighborhood</span><span class="p">,</span> <span class="n">new_level</span> <span class="o">=</span> <span class="s">&#34;Gilbert&#34;</span><span class="p">)</span> <span class="o">|&gt;</span> <span class="nf">prep</span><span class="p">()</span> <span class="c1">#&gt; Error in `prep()`:</span> <span class="c1">#&gt; ! Columns already contain the new level: Neighborhood</span> </code></pre></div><p>Now:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_novel.html'>step_novel</a></span><span class='o'>(</span><span class='nv'>Neighborhood</span>, new_level <span class='o'>=</span> <span class='s'>"Gilbert"</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `step_novel()`:</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: 
bold;'>Caused by error in `prep()` at recipes/R/recipe.R:437:8:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Columns already contain the new level: Neighborhood</span></span> <span></span></code></pre> </div> <p>These changes make a big difference, especially when calls to recipes functions are deeply nested inside the call stack, such as in <code>fit_resamples()</code> or <code>tune_grid()</code>.</p> <h2 id="things-are-getting-faster">Things are getting faster <a href="#things-are-getting-faster"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As we have written about in <a href="https://www.simonpcouch.com/blog/speedups-2023/" target="_blank" rel="noopener">The tidymodels is getting a whole lot faster</a> and <a href="https://www.tidyverse.org/blog/2023/04/performant-packages/" target="_blank" rel="noopener">Writing performant code with tidy tools</a>, we have been working on tightening up the performance of the tidymodels code. These changes are mostly related to the infrastructure code, meaning that the speedup will bring you closer to the underlying implementations.</p> <p>A different kind of speedup comes from the new <a href="https://embed.tidymodels.org/reference/step_pca_truncated.html" target="_blank" rel="noopener"><code>step_pca_truncated()</code></a> step in the embed package.</p> <p> <a href="https://en.wikipedia.org/wiki/Principal_component_analysis" target="_blank" rel="noopener">Principal Component Analysis</a> is a really powerful and fast method for dimensionality reduction of large data sets. 
However, for data with many columns, it can be computationally expensive to calculate all the principal components. <a href="https://embed.tidymodels.org/reference/step_pca_truncated.html" target="_blank" rel="noopener"><code>step_pca_truncated()</code></a> works in much the same way as <a href="https://recipes.tidymodels.org/reference/step_pca.html" target="_blank" rel="noopener"><code>step_pca()</code></a>, but it calculates only the number of components it needs:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>pca_normal</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_dummy.html'>step_dummy</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_nominal_predictors</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_pca.html'>step_pca</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_numeric_predictors</a></span><span class='o'>(</span><span class='o'>)</span>, num_comp <span class='o'>=</span> <span class='m'>3</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>pca_truncated</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>.</span>, data 
<span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/step_dummy.html'>step_dummy</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_nominal_predictors</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://embed.tidymodels.org/reference/step_pca_truncated.html'>step_pca_truncated</a></span><span class='o'>(</span><span class='nf'><a href='https://recipes.tidymodels.org/reference/has_role.html'>all_numeric_predictors</a></span><span class='o'>(</span><span class='o'>)</span>, num_comp <span class='o'>=</span> <span class='m'>3</span><span class='o'>)</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>tictoc</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/tictoc/man/tic.html'>tic</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='nv'>pca_normal</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span><span class='nv'>ames</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2,930 × 4</span></span></span> <span><span class='c'>#&gt; Sale_Price PC1 PC2 PC3</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; 
font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='text-decoration: underline;'>215</span>000 -<span style='color: #BB0000; text-decoration: underline;'>31</span><span style='color: #BB0000;'>793.</span> <span style='text-decoration: underline;'>4</span>151. -<span style='color: #BB0000;'>197.</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='text-decoration: underline;'>105</span>000 -<span style='color: #BB0000; text-decoration: underline;'>12</span><span style='color: #BB0000;'>198.</span> -<span style='color: #BB0000;'>611.</span> -<span style='color: #BB0000;'>524.</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='text-decoration: underline;'>172</span>000 -<span style='color: #BB0000; text-decoration: underline;'>14</span><span style='color: #BB0000;'>911.</span> -<span style='color: #BB0000;'>265.</span> <span style='text-decoration: underline;'>7</span>568.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='text-decoration: underline;'>244</span>000 -<span style='color: #BB0000; text-decoration: underline;'>12</span><span style='color: #BB0000;'>072.</span> -<span style='color: #BB0000; text-decoration: underline;'>1</span><span style='color: #BB0000;'>813.</span> 918.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='text-decoration: underline;'>189</span>900 -<span style='color: #BB0000; text-decoration: underline;'>14</span><span style='color: #BB0000;'>418.</span> -<span style='color: #BB0000;'>345.</span> -<span style='color: #BB0000;'>302.</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='text-decoration: underline;'>195</span>500 -<span style='color: #BB0000; text-decoration: underline;'>10</span><span style='color: 
#BB0000;'>704.</span> -<span style='color: #BB0000; text-decoration: underline;'>1</span><span style='color: #BB0000;'>367.</span> -<span style='color: #BB0000;'>204.</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>213</span>500 -<span style='color: #BB0000; text-decoration: underline;'>5</span><span style='color: #BB0000;'>858.</span> -<span style='color: #BB0000; text-decoration: underline;'>2</span><span style='color: #BB0000;'>805.</span> 114.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='text-decoration: underline;'>191</span>500 -<span style='color: #BB0000; text-decoration: underline;'>5</span><span style='color: #BB0000;'>932.</span> -<span style='color: #BB0000; text-decoration: underline;'>2</span><span style='color: #BB0000;'>762.</span> 131.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='text-decoration: underline;'>236</span>500 -<span style='color: #BB0000; text-decoration: underline;'>6</span><span style='color: #BB0000;'>368.</span> -<span style='color: #BB0000; text-decoration: underline;'>2</span><span style='color: #BB0000;'>862.</span> 325.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>189</span>000 -<span style='color: #BB0000; text-decoration: underline;'>8</span><span style='color: #BB0000;'>368.</span> -<span style='color: #BB0000; text-decoration: underline;'>2</span><span style='color: #BB0000;'>219.</span> 126.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 2,920 more rows</span></span></span> <span></span><span><span class='nf'>tictoc</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/tictoc/man/tic.html'>toc</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; 0.782 sec 
elapsed</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>tictoc</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/tictoc/man/tic.html'>tic</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='nv'>pca_truncated</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span><span class='nv'>ames</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2,930 × 4</span></span></span> <span><span class='c'>#&gt; Sale_Price PC1 PC2 PC3</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='text-decoration: underline;'>215</span>000 -<span style='color: #BB0000; text-decoration: underline;'>31</span><span style='color: #BB0000;'>793.</span> <span style='text-decoration: underline;'>4</span>151. 
-<span style='color: #BB0000;'>197.</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='text-decoration: underline;'>105</span>000 -<span style='color: #BB0000; text-decoration: underline;'>12</span><span style='color: #BB0000;'>198.</span> -<span style='color: #BB0000;'>611.</span> -<span style='color: #BB0000;'>524.</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='text-decoration: underline;'>172</span>000 -<span style='color: #BB0000; text-decoration: underline;'>14</span><span style='color: #BB0000;'>911.</span> -<span style='color: #BB0000;'>265.</span> <span style='text-decoration: underline;'>7</span>568.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='text-decoration: underline;'>244</span>000 -<span style='color: #BB0000; text-decoration: underline;'>12</span><span style='color: #BB0000;'>072.</span> -<span style='color: #BB0000; text-decoration: underline;'>1</span><span style='color: #BB0000;'>813.</span> 918.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='text-decoration: underline;'>189</span>900 -<span style='color: #BB0000; text-decoration: underline;'>14</span><span style='color: #BB0000;'>418.</span> -<span style='color: #BB0000;'>345.</span> -<span style='color: #BB0000;'>302.</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='text-decoration: underline;'>195</span>500 -<span style='color: #BB0000; text-decoration: underline;'>10</span><span style='color: #BB0000;'>704.</span> -<span style='color: #BB0000; text-decoration: underline;'>1</span><span style='color: #BB0000;'>367.</span> -<span style='color: #BB0000;'>204.</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='text-decoration: underline;'>213</span>500 -<span style='color: #BB0000; 
text-decoration: underline;'>5</span><span style='color: #BB0000;'>858.</span> -<span style='color: #BB0000; text-decoration: underline;'>2</span><span style='color: #BB0000;'>805.</span> 114.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='text-decoration: underline;'>191</span>500 -<span style='color: #BB0000; text-decoration: underline;'>5</span><span style='color: #BB0000;'>932.</span> -<span style='color: #BB0000; text-decoration: underline;'>2</span><span style='color: #BB0000;'>762.</span> 131.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='text-decoration: underline;'>236</span>500 -<span style='color: #BB0000; text-decoration: underline;'>6</span><span style='color: #BB0000;'>368.</span> -<span style='color: #BB0000; text-decoration: underline;'>2</span><span style='color: #BB0000;'>862.</span> 325.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='text-decoration: underline;'>189</span>000 -<span style='color: #BB0000; text-decoration: underline;'>8</span><span style='color: #BB0000;'>368.</span> -<span style='color: #BB0000; text-decoration: underline;'>2</span><span style='color: #BB0000;'>219.</span> 126.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 2,920 more rows</span></span></span> <span></span><span><span class='nf'>tictoc</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/tictoc/man/tic.html'>toc</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; 0.162 sec elapsed</span></span> <span></span></code></pre> </div> <p>The speedup will be orders of magnitude larger for very wide data.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" 
fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to thank those in the community that contributed to tidymodels in the last quarter:</p> <div class="highlight"> <ul> <li>agua: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>baguette: <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>broom: <a href="https://github.com/benwhalley" target="_blank" rel="noopener">@benwhalley</a>, <a href="https://github.com/dgrtwo" target="_blank" rel="noopener">@dgrtwo</a>, <a href="https://github.com/egosv" target="_blank" rel="noopener">@egosv</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/JorisChau" target="_blank" rel="noopener">@JorisChau</a>, <a href="https://github.com/mccarthy-m-g" target="_blank" rel="noopener">@mccarthy-m-g</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/paige-cho" target="_blank" rel="noopener">@paige-cho</a>, <a href="https://github.com/PoGibas" target="_blank" rel="noopener">@PoGibas</a>, <a href="https://github.com/rsbivand" target="_blank" rel="noopener">@rsbivand</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/ste-tuf" target="_blank" rel="noopener">@ste-tuf</a>, and <a href="https://github.com/victor-vscn" target="_blank" 
rel="noopener">@victor-vscn</a>.</li> <li>butcher: <a href="https://github.com/ashbythorpe" target="_blank" rel="noopener">@ashbythorpe</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/rdavis120" target="_blank" rel="noopener">@rdavis120</a>, <a href="https://github.com/rkb965" target="_blank" rel="noopener">@rkb965</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>censored: <a href="https://github.com/brunocarlin" target="_blank" rel="noopener">@brunocarlin</a>, and <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>.</li> <li>dials: <a href="https://github.com/amin0511ss" target="_blank" rel="noopener">@amin0511ss</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>discrim: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, and <a href="https://github.com/tomwagstaff-opml" target="_blank" rel="noopener">@tomwagstaff-opml</a>.</li> <li>embed: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/jackobenco016" target="_blank" rel="noopener">@jackobenco016</a>, and <a href="https://github.com/skasowitz" target="_blank" rel="noopener">@skasowitz</a>.</li> <li>finetune: <a href="https://github.com/Freestyleyang" target="_blank" rel="noopener">@Freestyleyang</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" 
rel="noopener">@topepo</a>.</li> <li>hardhat: <a href="https://github.com/cregouby" target="_blank" rel="noopener">@cregouby</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/frank113" target="_blank" rel="noopener">@frank113</a>, and <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>.</li> <li>modeldata: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>parsnip: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/emmafeuer" target="_blank" rel="noopener">@emmafeuer</a>, <a href="https://github.com/exsell-jc" target="_blank" rel="noopener">@exsell-jc</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/mariamaseng" target="_blank" rel="noopener">@mariamaseng</a>, <a href="https://github.com/SHo-JANG" target="_blank" rel="noopener">@SHo-JANG</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/Tripartio" target="_blank" rel="noopener">@Tripartio</a>.</li> <li>recipes: <a href="https://github.com/AshesITR" target="_blank" rel="noopener">@AshesITR</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jjcurtin" target="_blank" rel="noopener">@jjcurtin</a>, <a href="https://github.com/lang-benjamin" target="_blank" rel="noopener">@lang-benjamin</a>, <a href="https://github.com/lbui30" 
target="_blank" rel="noopener">@lbui30</a>, <a href="https://github.com/PeterKoffeldt" target="_blank" rel="noopener">@PeterKoffeldt</a>, <a href="https://github.com/rdavis120" target="_blank" rel="noopener">@rdavis120</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/StevenWallaert" target="_blank" rel="noopener">@StevenWallaert</a>, <a href="https://github.com/tellyshia" target="_blank" rel="noopener">@tellyshia</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/ttrodrigz" target="_blank" rel="noopener">@ttrodrigz</a>, and <a href="https://github.com/zecojls" target="_blank" rel="noopener">@zecojls</a>.</li> <li>rules: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>spatialsample: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, and <a href="https://github.com/RaymondBalise" target="_blank" rel="noopener">@RaymondBalise</a>.</li> <li>stacks: <a href="https://github.com/amin0511ss" target="_blank" rel="noopener">@amin0511ss</a>, <a href="https://github.com/gundalav" target="_blank" rel="noopener">@gundalav</a>, <a href="https://github.com/jrosell" target="_blank" rel="noopener">@jrosell</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/pbulsink" target="_blank" rel="noopener">@pbulsink</a>, <a href="https://github.com/rdavis120" target="_blank" rel="noopener">@rdavis120</a>, and <a href="https://github.com/simonpcouch" target="_blank" 
rel="noopener">@simonpcouch</a>.</li> <li>textrecipes: <a href="https://github.com/apsteinmetz" target="_blank" rel="noopener">@apsteinmetz</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/gary-mu" target="_blank" rel="noopener">@gary-mu</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>.</li> <li>themis: <a href="https://github.com/carlganz" target="_blank" rel="noopener">@carlganz</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, <a href="https://github.com/rmurphy49" target="_blank" rel="noopener">@rmurphy49</a>, and <a href="https://github.com/rowanjh" target="_blank" rel="noopener">@rowanjh</a>.</li> <li>tidyclust: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/hsbadr" target="_blank" rel="noopener">@hsbadr</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>tidypredict: <a href="https://github.com/edgararuiz" target="_blank" rel="noopener">@edgararuiz</a>, and <a href="https://github.com/sdcharle" target="_blank" rel="noopener">@sdcharle</a>.</li> <li>tune: <a href="https://github.com/BenoitLondon" target="_blank" rel="noopener">@BenoitLondon</a>, <a href="https://github.com/cphaarmeyer" target="_blank" rel="noopener">@cphaarmeyer</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jthomasmock" 
target="_blank" rel="noopener">@jthomasmock</a>, <a href="https://github.com/mrjujas" target="_blank" rel="noopener">@mrjujas</a>, <a href="https://github.com/MxNl" target="_blank" rel="noopener">@MxNl</a>, <a href="https://github.com/nabsiddiqui" target="_blank" rel="noopener">@nabsiddiqui</a>, <a href="https://github.com/rdavis120" target="_blank" rel="noopener">@rdavis120</a>, <a href="https://github.com/SHo-JANG" target="_blank" rel="noopener">@SHo-JANG</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/walrossker" target="_blank" rel="noopener">@walrossker</a>, and <a href="https://github.com/yusuftengriverdi" target="_blank" rel="noopener">@yusuftengriverdi</a>.</li> <li>workflows: <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>workflowsets: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/gsimchoni" target="_blank" rel="noopener">@gsimchoni</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>yardstick: <a href="https://github.com/77makr" target="_blank" rel="noopener">@77makr</a>, <a href="https://github.com/burch-cm" target="_blank" rel="noopener">@burch-cm</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/idavydov" target="_blank" rel="noopener">@idavydov</a>, <a href="https://github.com/kadyb" target="_blank" rel="noopener">@kadyb</a>, <a href="https://github.com/mawardivaz" target="_blank" rel="noopener">@mawardivaz</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/moloscripts" target="_blank" rel="noopener">@moloscripts</a>, <a href="https://github.com/nyambea" 
target="_blank" rel="noopener">@nyambea</a>, <a href="https://github.com/SHo-JANG" target="_blank" rel="noopener">@SHo-JANG</a>, <a href="https://github.com/simdadim" target="_blank" rel="noopener">@simdadim</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> </ul> </div> <p>We&rsquo;re grateful for all of the tidymodels community, from observers to users to contributors. Happy modeling!</p> Differences between the base R and magrittr pipes https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/ Fri, 21 Apr 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/
<p><strong>Note:</strong> The following has been adapted from a section of the forthcoming second edition of <a href="https://r4ds.hadley.nz/" target="_blank" rel="noopener">R for Data Science</a> that had to be removed due to length limitations.</p> <h2 id="pipes">Pipes <a href="#pipes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>R 4.1.0 introduced a native pipe operator, <code>|&gt;</code>. As described in the <a href="https://cran.r-project.org/doc/manuals/r-devel/NEWS.html" target="_blank" rel="noopener">R News</a>:</p> <blockquote> <p>R now provides a simple native forward pipe syntax <code>|&gt;</code>. The simple form of the forward pipe inserts the left-hand side as the first argument in the right-hand side call. The pipe implementation as a syntax transformation was motivated by suggestions from Jim Hester and Lionel Henry.</p> </blockquote> <p>The behaviour of the native pipe is by and large the same as that of the <a href="https://magrittr.tidyverse.org/reference/pipe.html" target="_blank" rel="noopener"><code>%&gt;%</code></a> pipe provided by the <strong>magrittr</strong> package.
Both operators (<code>|&gt;</code> and <code>%&gt;%</code>) let you &ldquo;pipe&rdquo; an object forward to a function or call expression, thereby allowing you to express a sequence of operations that transform an object.</p> <p>To learn more about the basic utility of pipes, see <a href="https://r4ds.hadley.nz/data-transform.html#the-pipe" target="_blank" rel="noopener">The pipe</a> section of R for Data Science.</p> <p>Luckily there&rsquo;s no need to commit entirely to one pipe or the other &mdash; you can use the base pipe for the majority of cases where it&rsquo;s sufficient and use the magrittr pipe when you really need its special features.</p> <h2 id="-vs"><code>|&gt;</code> vs. <code>%&gt;%</code> <a href="#-vs"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>While <code>|&gt;</code> and <code>%&gt;%</code> behave identically for simple cases, there are a few crucial differences. These are most likely to affect you if you&rsquo;re a long-term user of <code>%&gt;%</code> who has taken advantage of some of the more advanced features. But they&rsquo;re still good to know about even if you&rsquo;ve never used <code>%&gt;%</code> because you&rsquo;re likely to encounter some of them when reading wild-caught code.</p> <ul> <li> <p>By default, the pipe passes the object on its left-hand side to the first argument of the function on the right-hand side. <code>%&gt;%</code> allows you to change the placement with a <code>.</code> placeholder. For example, <code>x %&gt;% f(1)</code> is equivalent to <code>f(x, 1)</code> but <code>x %&gt;% f(1, .)</code> is equivalent to <code>f(1, x)</code>. 
R 4.2.0 added a <code>_</code> placeholder to the base pipe, with one additional restriction: the argument has to be named. For example, <code>x |&gt; f(1, y = _)</code> is equivalent to <code>f(1, y = x)</code>.</p> </li> <li> <p>The <code>|&gt;</code> placeholder is deliberately simple and can&rsquo;t replicate many features of the <code>%&gt;%</code> placeholder: you can&rsquo;t pass it to multiple arguments, and it doesn&rsquo;t have any special behavior when the placeholder is used inside another function. For example, <code>df %&gt;% split(.$var)</code> is equivalent to <code>split(df, df$var)</code>, and <code>df %&gt;% {split(.$x, .$y)}</code> is equivalent to <code>split(df$x, df$y)</code>.</p> <p>With <code>%&gt;%</code>, you can use <code>.</code> on the left-hand side of operators like <code>$</code>, <code>[[</code>, and <code>[</code>, so you can extract a single column from a data frame with, e.g., <code>mtcars %&gt;% .$cyl</code>. The base pipe gained support for this feature in R 4.3.0, where <code>_</code> can head an extraction such as <code>mtcars |&gt; _$cyl</code>.
For the special case of extracting a column out of a data frame, you can also use <code>dplyr::pull()</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'>pull</span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span></span></code></pre> </div> </li> <li> <p><code>%&gt;%</code> allows you to drop the parentheses when calling a function with no other arguments; <code>|&gt;</code> always requires the parentheses.</p> </li> <li> <p><code>%&gt;%</code> allows you to start a pipe with <code>.</code> to create a function rather than immediately executing the pipe; this is not supported by the base pipe.</p> </li> </ul> <h2 id="using-the-native-pipe-in-packages">Using the native pipe in packages <a href="#using-the-native-pipe-in-packages"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Because the native pipe wasn&rsquo;t introduced until 4.1.0, code using <code>|&gt;</code> in function reference examples or vignettes will not work on older versions of R, as it is not valid syntax. This is a problem for the tidyverse because our <a href="https://www.tidyverse.org/blog/2019/04/r-version-support/" target="_blank" rel="noopener">versioning policies</a> mean that our packages need to work on R 3.5.0 and later.</p> <p>Does this mean that you need to increase the minimum R version your package depends on in order to use <code>|&gt;</code>? 
Not necessarily: there are two techniques we can use to keep vignettes and examples working.</p> <p>For example, the base pipe is used in purrr 1.0.0. As can be seen in the <a href="https://github.com/tidyverse/purrr/commit/df4630c6e8cd5028386ee96b9036f1755f26adc4" target="_blank" rel="noopener">source for the &ldquo;purrr &lt;-&gt; base R&rdquo; vignette</a>, certain code chunks are evaluated conditionally based on the version of R being used. The setup chunk for the vignette includes: <code>modern_r &lt;- getRversion() &gt;= &quot;4.1.0&quot;</code>. The result is then used in the <code>eval</code> chunk option to determine whether a code chunk that relies on &ldquo;modern R&rdquo; syntax should be run.</p> <p>The other place we use the base pipe is in examples. To disable these on older versions of R, we use a bit of a hack that requires three files: <a href="https://github.com/tidyverse/purrr/blob/main/configure" target="_blank" rel="noopener"><code>configure</code></a>, <a href="https://github.com/tidyverse/purrr/blob/main/cleanup" target="_blank" rel="noopener"><code>cleanup</code></a>, and <a href="https://github.com/tidyverse/purrr/blob/main/tools/examples.R" target="_blank" rel="noopener"><code>tools/examples.R</code></a>. The basic idea is that, on versions of R before 4.1.0, we re-define the <code>\examples{}</code> tag to display an informative message but not run the code; this ensures that <code>R CMD check</code> continues to work even on older versions of R.</p> Tuning hyperparameters with tidymodels is a delight https://www.tidyverse.org/blog/2023/04/tuning-delights/ Thu, 20 Apr 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/04/tuning-delights/ <p>The tidymodels team recently released new versions of the tune, finetune, and workflowsets packages, and we&rsquo;re super stoked about it!
Each of these three packages facilitates tuning hyperparameters in tidymodels, and their new releases work to make the experience of hyperparameter tuning more joyful.</p> <p>You can install these releases from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"tune"</span>, <span class='s'>"workflowsets"</span>, <span class='s'>"finetune"</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will highlight some of the changes in these packages that we&rsquo;re most excited about.</p> <p>You can see the full lists of changes in the release notes for each package:</p> <ul> <li> <a href="https://github.com/tidymodels/tune/releases/tag/v1.1.0" target="_blank" rel="noopener">tune v1.1.0</a></li> <li> <a href="https://github.com/tidymodels/workflowsets/releases/tag/v1.0.1" target="_blank" rel="noopener">workflowsets v1.0.1</a></li> <li> <a href="https://github.com/tidymodels/finetune/releases/tag/v1.1.0" target="_blank" rel="noopener">finetune v1.1.0</a></li> </ul> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/finetune'>finetune</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="a-shorthand-for-fitting-the-optimal-model">A shorthand for fitting the optimal model <a
href="#a-shorthand-for-fitting-the-optimal-model"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In tidymodels, the result of tuning a set of hyperparameters is a data structure describing the candidate models, their predictions, and the performance metrics associated with those predictions. For example, we can tune the number of <code>neighbors</code> in a <code>nearest_neighbor()</code> model over a regular grid:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># tune the `neighbors` hyperparameter</span></span> <span><span class='nv'>knn_model_spec</span> <span class='o'>&lt;-</span> <span class='nf'>nearest_neighbor</span><span class='o'>(</span><span class='s'>"regression"</span>, neighbors <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>tuning_res</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'><a href='https://tune.tidymodels.org/reference/tune_grid.html'>tune_grid</a></span><span class='o'>(</span></span> <span> <span class='nv'>knn_model_spec</span>,</span> <span> <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>,</span> <span> <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='m'>5</span><span class='o'>)</span>,</span> <span> control <span class='o'>=</span> <span class='nf'><a
href='https://tune.tidymodels.org/reference/control_grid.html'>control_grid</a></span><span class='o'>(</span>save_workflow <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span></span> <span><span class='c'># check out the resulting object</span></span> <span><span class='nv'>tuning_res</span></span> <span><span class='c'>#&gt; # Tuning results</span></span> <span><span class='c'>#&gt; # Bootstrap sampling </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 4</span></span></span> <span><span class='c'>#&gt; splits id .metrics .notes </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;split [32/11]&gt;</span> Bootstrap1 <span style='color: #555555;'>&lt;tibble [20 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #555555;'>&lt;split [32/12]&gt;</span> Bootstrap2 <span style='color: #555555;'>&lt;tibble [20 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='color: #555555;'>&lt;split [32/11]&gt;</span> Bootstrap3 <span style='color: #555555;'>&lt;tibble [20 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> <span style='color: #555555;'>&lt;split [32/10]&gt;</span> Bootstrap4 <span style='color: #555555;'>&lt;tibble [20 × 
5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> <span style='color: #555555;'>&lt;split [32/12]&gt;</span> Bootstrap5 <span style='color: #555555;'>&lt;tibble [20 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span></span><span></span> <span><span class='c'># examine proposed hyperparameters and associated metrics</span></span> <span><span class='nf'><a href='https://tune.tidymodels.org/reference/collect_predictions.html'>collect_metrics</a></span><span class='o'>(</span><span class='nv'>tuning_res</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 20 × 7</span></span></span> <span><span class='c'>#&gt; neighbors .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 2 rmse standard 3.19 5 0.208 Preprocessor1_Model01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 2 rsq standard 0.664 5 0.086<span style='text-decoration: underline;'>1</span> Preprocessor1_Model01</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 3 rmse standard 3.13 5 0.266 Preprocessor1_Model02</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 3 rsq standard 0.678 5 0.086<span style='text-decoration: 
underline;'>8</span> Preprocessor1_Model02</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 4 rmse standard 3.11 5 0.292 Preprocessor1_Model03</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 4 rsq standard 0.684 5 0.085<span style='text-decoration: underline;'>1</span> Preprocessor1_Model03</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 5 rmse standard 3.10 5 0.287 Preprocessor1_Model04</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 5 rsq standard 0.686 5 0.083<span style='text-decoration: underline;'>9</span> Preprocessor1_Model04</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 8 rmse standard 3.08 5 0.263 Preprocessor1_Model05</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 8 rsq standard 0.689 5 0.084<span style='text-decoration: underline;'>3</span> Preprocessor1_Model05</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> 9 rmse standard 3.07 5 0.256 Preprocessor1_Model06</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> 9 rsq standard 0.691 5 0.084<span style='text-decoration: underline;'>5</span> Preprocessor1_Model06</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> 10 rmse standard 3.06 5 0.247 Preprocessor1_Model07</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>14</span> 10 rsq standard 0.693 5 0.083<span style='text-decoration: underline;'>7</span> Preprocessor1_Model07</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>15</span> 11 rmse standard 3.05 5 0.241 Preprocessor1_Model08</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>16</span> 11 rsq standard 0.696 5 0.083<span style='text-decoration: underline;'>3</span> Preprocessor1_Model08</span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'>17</span> 13 rmse standard 3.03 5 0.236 Preprocessor1_Model09</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>18</span> 13 rsq standard 0.701 5 0.082<span style='text-decoration: underline;'>0</span> Preprocessor1_Model09</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>19</span> 14 rmse standard 3.02 5 0.235 Preprocessor1_Model10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>20</span> 14 rsq standard 0.704 5 0.080<span style='text-decoration: underline;'>8</span> Preprocessor1_Model10</span></span> <span></span></code></pre> </div> <p>Given these tuning results, the next steps are to choose the &ldquo;best&rdquo; hyperparameters, assign those hyperparameters to the model, and fit the finalized model on the training set. Previously in tidymodels, this has felt like:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># choose a method to define "best" and extract the resulting parameters</span></span> <span><span class='nv'>best_param</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tune.tidymodels.org/reference/show_best.html'>select_best</a></span><span class='o'>(</span><span class='nv'>tuning_res</span>, <span class='s'>"rmse"</span><span class='o'>)</span> </span> <span></span> <span><span class='c'># assign those parameters to model</span></span> <span><span class='nv'>knn_model_final</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tune.tidymodels.org/reference/finalize_model.html'>finalize_model</a></span><span class='o'>(</span><span class='nv'>knn_model_spec</span>, <span class='nv'>best_param</span><span class='o'>)</span></span> <span></span> <span><span class='c'># fit the finalized model to the training set</span></span> <span><span class='nv'>knn_fit</span> <span class='o'>&lt;-</span> <span class='nf'>fit</span><span class='o'>(</span><span 
class='nv'>knn_model_final</span>, <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>mtcars</span><span class='o'>)</span></span></code></pre> </div> <p>Voilà! <code>knn_fit</code> is a properly resampled model that is ready to <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> on new data:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>knn_fit</span>, <span class='nv'>mtcars</span><span class='o'>[</span><span class='m'>1</span>, <span class='o'>]</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 1</span></span></span> <span><span class='c'>#&gt; .pred</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 22.0</span></span> <span></span></code></pre> </div> <p>The newest release of tune introduced a shorthand interface for going from tuning results to final fit called <a href="https://tune.tidymodels.org/reference/fit_best.html" target="_blank" rel="noopener"><code>fit_best()</code></a>. 
The function wraps each of those three functions with sensible defaults to abbreviate the process described above.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>knn_fit_2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tune.tidymodels.org/reference/fit_best.html'>fit_best</a></span><span class='o'>(</span><span class='nv'>tuning_res</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>knn_fit_2</span>, <span class='nv'>mtcars</span><span class='o'>[</span><span class='m'>1</span>, <span class='o'>]</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 1</span></span></span> <span><span class='c'>#&gt; .pred</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 22.0</span></span> <span></span></code></pre> </div> <p>This function is closely related to the <a href="https://tune.tidymodels.org/reference/last_fit.html" target="_blank" rel="noopener"><code>last_fit()</code></a> function. They both give you access to a workflow fitted on the training data but are situated somewhat differently in the modeling workflow. <a href="https://tune.tidymodels.org/reference/fit_best.html" target="_blank" rel="noopener"><code>fit_best()</code></a> picks up after a tuning function like <a href="https://tune.tidymodels.org/reference/tune_grid.html" target="_blank" rel="noopener"><code>tune_grid()</code></a> to take you from tuning results to fitted workflow, ready for you to predict and assess further. 
<a href="https://tune.tidymodels.org/reference/last_fit.html" target="_blank" rel="noopener"><code>last_fit()</code></a> assumes you have made your choice of hyperparameters and finalized your workflow to then take you from finalized workflow to fitted workflow and further to performance assessment on the test data. While <a href="https://tune.tidymodels.org/reference/fit_best.html" target="_blank" rel="noopener"><code>fit_best()</code></a> gives a fitted workflow, <a href="https://tune.tidymodels.org/reference/last_fit.html" target="_blank" rel="noopener"><code>last_fit()</code></a> gives you the performance results. If you want the fitted workflow, you can extract it from the result of <a href="https://tune.tidymodels.org/reference/last_fit.html" target="_blank" rel="noopener"><code>last_fit()</code></a> via <a href="https://hardhat.tidymodels.org/reference/hardhat-extract.html" target="_blank" rel="noopener"><code>extract_workflow()</code></a>.</p> <p>The newest release of the workflowsets package also includes a <a href="https://tune.tidymodels.org/reference/fit_best.html" target="_blank" rel="noopener"><code>fit_best()</code></a> method for workflow set objects. Given a set of tuning results, that method will sift through all of the possible models to find and fit the optimal model configuration.</p> <h2 id="interactive-issue-logging">Interactive issue logging <a href="#interactive-issue-logging"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Imagine, in the previous example, we made some subtle error in specifying the tuning process. 
For example, passing a function to <code>extract</code> elements of the proposed workflows that injects some warnings and errors into the tuning process:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>raise_concerns</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='kr'><a href='https://rdrr.io/r/base/warning.html'>warning</a></span><span class='o'>(</span><span class='s'>"Ummm, wait. :o"</span><span class='o'>)</span></span> <span> <span class='kr'><a href='https://rdrr.io/r/base/stop.html'>stop</a></span><span class='o'>(</span><span class='s'>"Eep! Nooo!"</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span> <span></span> <span><span class='nv'>tuning_res</span> <span class='o'>&lt;-</span></span> <span> <span class='nf'><a href='https://tune.tidymodels.org/reference/tune_grid.html'>tune_grid</a></span><span class='o'>(</span></span> <span> <span class='nv'>knn_model_spec</span>,</span> <span> <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>,</span> <span> <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='m'>5</span><span class='o'>)</span>,</span> <span> control <span class='o'>=</span> <span class='nf'><a href='https://tune.tidymodels.org/reference/control_grid.html'>control_grid</a></span><span class='o'>(</span>extract <span class='o'>=</span> <span class='nv'>raise_concerns</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span></code></pre> </div> <p>Warnings and errors can come up in all sorts of places while tuning hyperparameters. 
Often, with obvious issues, we can raise errors early on and halt the tuning process, but with more subtle concerns, we don&rsquo;t want to be too restrictive; it&rsquo;s sometimes better to defer to the underlying modeling packages to decide what&rsquo;s a dire issue versus something that can be worked around.</p> <p>In the past, we&rsquo;ve raised warnings and issues as they occur, printing context on the issue to the console before logging the issue in the tuning result. In the above example, this would look like:</p> <pre><code>! Bootstrap1: preprocessor 1/1, model 1/1 (extracts): Ummm, wait. :o x Bootstrap1: preprocessor 1/1, model 1/1 (extracts): Error in extractor(object): Eep! Nooo! ! Bootstrap2: preprocessor 1/1, model 1/1 (extracts): Ummm, wait. :o x Bootstrap2: preprocessor 1/1, model 1/1 (extracts): Error in extractor(object): Eep! Nooo! ! Bootstrap3: preprocessor 1/1, model 1/1 (extracts): Ummm, wait. :o x Bootstrap3: preprocessor 1/1, model 1/1 (extracts): Error in extractor(object): Eep! Nooo! ! Bootstrap4: preprocessor 1/1, model 1/1 (extracts): Ummm, wait. :o x Bootstrap4: preprocessor 1/1, model 1/1 (extracts): Error in extractor(object): Eep! Nooo! ! Bootstrap5: preprocessor 1/1, model 1/1 (extracts): Ummm, wait. :o x Bootstrap5: preprocessor 1/1, model 1/1 (extracts): Error in extractor(object): Eep! Nooo! </code></pre> <p>The above messages are super descriptive about where issues occur: they note in which resample, from which proposed modeling workflow, and in which part of the fitting process each issue occurred. At the same time, they are quite repetitive; if there&rsquo;s an issue during hyperparameter tuning, it probably occurs in every resample, always in the same place. 
If, instead, we were evaluating this model against 1,000 resamples, or there were more than just two issues, this output could get very overwhelming very quickly.</p> <p>The new releases of our tuning packages include tools to determine which tuning issues are unique, and for each unique issue, only print out the message once while maintaining a dynamic count of how many times the issue occurred. With the new tune release, the same output would look like:</p> <div class="highlight"> <pre class='chroma'><span><span class='c'>#&gt; → <span style='color: #BBBB00; font-weight: bold;'>A</span> | <span style='color: #BBBB00;'>warning</span>: Ummm, wait. :o</span></span> <span></span><span><span class='c'>#&gt; → <span style='color: #BB0000; font-weight: bold;'>B</span> | <span style='color: #BB0000;'>error</span>: Eep! Nooo!</span></span> <span><span class='c'>#&gt; There were issues with some computations <span style='color: #BBBB00; font-weight: bold;'>A</span>: x5 <span style='color: #BB0000; font-weight: bold;'>B</span>: x5</span></span> <span></span></code></pre> </div> <p>This interface is hopefully less overwhelming for users. When the messages attached to these issues aren&rsquo;t enough to debug the issue, the complete set of information about the issues lives inside of the tuning result object, and can be retrieved with <code>collect_notes(tuning_res)</code>. 
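</p> <p>As a minimal sketch of that retrieval, assuming <code>tuning_res</code> is the result of the failing <code>tune_grid()</code> call above and that the notes come back as a tibble with <code>type</code> and <code>note</code> columns (an assumption worth checking against your installed version of tune), the logged issues can be pulled out and tallied like so:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>library(tune)

# `tuning_res` is assumed to be the tune_grid() result from above,
# i.e. the run where the extract function raised a warning and an error
issue_notes &lt;- collect_notes(tuning_res)

# tally how often each distinct issue occurred across resamples
# (the `type` and `note` column names are an assumption; check names(issue_notes))
dplyr::count(issue_notes, type, note)
</code></pre> </div> <p>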
To turn off the interactive logging, set the <code>verbose</code> control option to <code>TRUE</code>.</p> <h2 id="speedups">Speedups <a href="#speedups"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Each of these three releases, as well as releases of core tidymodels packages they depend on like parsnip, recipes, and hardhat, include a plethora of changes meant to optimize computational performance. Especially for modeling practitioners who work with many resamples and/or small data sets, our modeling workflows will feel a whole lot snappier:</p> <p><img src="https://simonpcouch.com/blog/speedups-2023/index_files/figure-html/unnamed-chunk-10-1.png" alt="A ggplot2 line graph plotting relative change in time to evaluate model fits with the tidymodels packages. Fits on datasets with 100 training rows are 2 to 3 times faster, while fits on datasets with 100,000 or more rows take about the same amount of time as they used to."></p> <p>With 100-row training data sets, the time to resample models with tune and friends has been at least halved. These releases are the first iteration of a set of changes to reduce the evaluation time of tidymodels code, and users can expect further optimizations in coming releases! 
See <a href="https://www.simonpcouch.com/blog/speedups-2023/" target="_blank" rel="noopener">this post on my blog</a> for more information about those speedups.</p> <h2 id="bonus-points">Bonus points <a href="#bonus-points"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Although they&rsquo;re smaller in scope, we wanted to highlight two additional developments in tuning hyperparameters with tidymodels.</p> <h3 id="workflow-set-support-for-tidyclust">Workflow set support for tidyclust <a href="#workflow-set-support-for-tidyclust"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The recent tidymodels package <a href="https://github.com/tidymodels/tidyclust">tidyclust</a> introduced support for fitting and tuning clustering models in tidymodels. That package&rsquo;s function <a href="https://tidyclust.tidymodels.org/reference/tune_cluster.html" target="_blank" rel="noopener"><code>tune_cluster()</code></a> is now an option for tuning in <a href="https://workflowsets.tidymodels.org/reference/workflow_map.html" target="_blank" rel="noopener"><code>workflow_map()</code></a>, meaning that users can fit sets of clustering models and preprocessors using workflow sets. 
These changes further integrate the tidyclust package into the tidymodels framework.</p> <h3 id="refined-retrieval-of-intermediate-results">Refined retrieval of intermediate results <a href="#refined-retrieval-of-intermediate-results"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The <code>.Last.tune.result</code> helper stores the most recent tuning result in the object <code>.Last.tune.result</code> as a fail-safe in cases of interrupted tuning, uncaught tuning errors, and simply forgetting to assign tuning results to an object.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># be a silly goose and forget to assign results</span></span> <span><span class='nf'><a href='https://tune.tidymodels.org/reference/tune_grid.html'>tune_grid</a></span><span class='o'>(</span></span> <span> <span class='nv'>knn_model_spec</span>,</span> <span> <span class='nv'>mpg</span> <span class='o'>~</span> <span class='nv'>.</span>,</span> <span> <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='m'>5</span><span class='o'>)</span>,</span> <span> control <span class='o'>=</span> <span class='nf'><a href='https://tune.tidymodels.org/reference/control_grid.html'>control_grid</a></span><span class='o'>(</span>save_workflow <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; # Tuning results</span></span> <span><span class='c'>#&gt; # Bootstrap sampling </span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'># A tibble: 5 × 4</span></span></span> <span><span class='c'>#&gt; splits id .metrics .notes </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;split [32/11]&gt;</span> Bootstrap1 <span style='color: #555555;'>&lt;tibble [18 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #555555;'>&lt;split [32/14]&gt;</span> Bootstrap2 <span style='color: #555555;'>&lt;tibble [18 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='color: #555555;'>&lt;split [32/13]&gt;</span> Bootstrap3 <span style='color: #555555;'>&lt;tibble [18 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> <span style='color: #555555;'>&lt;split [32/12]&gt;</span> Bootstrap4 <span style='color: #555555;'>&lt;tibble [18 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> <span style='color: #555555;'>&lt;split [32/11]&gt;</span> Bootstrap5 <span style='color: #555555;'>&lt;tibble [18 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span></span><span></span> <span><span class='c'># all is not lost!</span></span> <span><span class='nv'>.Last.tune.result</span></span> <span><span class='c'>#&gt; # Tuning 
results</span></span> <span><span class='c'>#&gt; # Bootstrap sampling </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 4</span></span></span> <span><span class='c'>#&gt; splits id .metrics .notes </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;split [32/11]&gt;</span> Bootstrap1 <span style='color: #555555;'>&lt;tibble [18 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #555555;'>&lt;split [32/14]&gt;</span> Bootstrap2 <span style='color: #555555;'>&lt;tibble [18 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='color: #555555;'>&lt;split [32/13]&gt;</span> Bootstrap3 <span style='color: #555555;'>&lt;tibble [18 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> <span style='color: #555555;'>&lt;split [32/12]&gt;</span> Bootstrap4 <span style='color: #555555;'>&lt;tibble [18 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> <span style='color: #555555;'>&lt;split [32/11]&gt;</span> Bootstrap5 <span style='color: #555555;'>&lt;tibble [18 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span></span></span> <span></span><span></span> <span><span class='c'># assign to 
object after the fact</span></span> <span><span class='nv'>res</span> <span class='o'>&lt;-</span> <span class='nv'>.Last.tune.result</span></span></code></pre> </div> <p>These three releases introduce support for the <code>.Last.tune.result</code> object in more settings and refine support in existing implementations.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thanks to <a href="https://github.com/walrossker" target="_blank" rel="noopener">@walrossker</a>, <a href="https://github.com/Freestyleyang" target="_blank" rel="noopener">@Freestyleyang</a>, and <a href="https://github.com/Jeffrothschild" target="_blank" rel="noopener">@Jeffrothschild</a> for their contributions to these packages since their last releases.</p> <p>Happy modeling, y&rsquo;all!</p> censored 0.2.0 https://www.tidyverse.org/blog/2023/04/censored-0-2-0/ Wed, 19 Apr 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/04/censored-0-2-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. 
the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re thrilled to announce the release of <a href="https://censored.tidymodels.org/" target="_blank" rel="noopener">censored</a> 0.2.0. censored is a parsnip extension package for survival models.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"censored"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will introduce you to a new argument name, <code>eval_time</code>, and two new engines for fitting random forests and parametric survival models.</p> <p>You can see a full list of changes in the <a href="https://github.com/tidymodels/censored/releases/tag/v0.2.0" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="introducing-eval_time">Introducing <code>eval_time</code> <a href="#introducing-eval_time"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As we continue to add support for survival analysis across tidymodels, we have seen a need to be more explicit about which time we mean when we say &ldquo;time&rdquo;: event time, observed time, censoring time, or the time at which to predict survival probability? The last one is a particular mouthful. 
We now refer to this time as &ldquo;evaluation time.&rdquo; In preparation for dynamic survival performance metrics which can be calculated at different evaluation time points, the argument to set these evaluation time points for <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> is now called <code>eval_time</code> instead of just <code>time</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>cox</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/proportional_hazards.html'>proportional_hazards</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"survival"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"censored regression"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/pkg/survival/man/Surv.html'>Surv</a></span><span class='o'>(</span><span class='nv'>time</span>, <span class='nv'>status</span><span class='o'>)</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>lung</span><span class='o'>)</span></span> <span><span class='nv'>pred</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>cox</span>, <span class='nv'>lung</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span 
class='m'>3</span>, <span class='o'>]</span>, type <span class='o'>=</span> <span class='s'>"survival"</span>, eval_time <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>100</span>, <span class='m'>500</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>pred</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 1</span></span></span> <span><span class='c'>#&gt; .pred </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='color: #555555;'>&lt;tibble [2 × 2]&gt;</span></span></span> <span></span></code></pre> </div> <p>The predictions follow the tidymodels principle of one row per observation, and the nested tibble contains the predicted survival probability, <code>.pred_survival</code>, as well as the corresponding evaluation time. 
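</p> <p>If a flat layout is easier to work with, the nested column can be expanded. This is only a sketch, assuming <code>pred</code> is the prediction tibble from above and that the tidyr and dplyr packages are installed; the <code>.row</code> helper column and the <code>pred_long</code> name are hypothetical, added here to keep track of which observation each prediction belongs to:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>library(tidyr)

# `pred` is assumed to be the tibble returned by predict() above
pred_long &lt;- pred |&gt;
  dplyr::mutate(.row = dplyr::row_number()) |&gt;  # hypothetical observation index
  unnest(.pred)                                   # one row per observation and evaluation time

pred_long
</code></pre> </div> <p>The nested form keeps one row per observation, while the unnested form trades that for one row per observation and evaluation time, which is often more convenient for plotting.</p> <p>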
The column for the evaluation time is now called <code>.eval_time</code> instead of <code>.time</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>pred</span><span class='o'>$</span><span class='nv'>.pred</span><span class='o'>[[</span><span class='m'>2</span><span class='o'>]</span><span class='o'>]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; .eval_time .pred_survival</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 100 0.910</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 500 0.422</span></span> <span></span></code></pre> </div> <h2 id="new-engines">New engines <a href="#new-engines"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>censored contains engines for parametric, semi-parametric, and tree-based models. 
This release adds two new engines:</p> <ul> <li>the <code>&quot;aorsf&quot;</code> engine for random forests via <a href="https://parsnip.tidymodels.org/reference/rand_forest.html" target="_blank" rel="noopener"><code>rand_forest()</code></a></li> <li>the <code>&quot;flexsurvspline&quot;</code> engine for parametric models via <a href="https://parsnip.tidymodels.org/reference/survival_reg.html" target="_blank" rel="noopener"><code>survival_reg()</code></a></li> </ul> <h3 id="new-aorsf-engine-for-rand_forest">New <code>&quot;aorsf&quot;</code> engine for <code>rand_forest()</code> <a href="#new-aorsf-engine-for-rand_forest"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>This engine has been contributed by <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">Byron Jaeger</a> and enables users to fit oblique random survival forests with the aorsf package. What&rsquo;s with the <em>oblique</em> you ask?</p> <p>Oblique describes how the decision trees that form the random forest make their splits at each node. If the split is based on a single predictor, the resulting tree is called <em>axis-based</em> because the split is perpendicular to the axis of the predictor. If the split is based on a linear combination of predictors, there is a lot more flexibility in how the data is split: the split does not need to be perpendicular to any of the predictor axes. 
Such trees are called <em>oblique</em>.</p> <p>The documentation for the <a href="https://docs.ropensci.org/aorsf" target="_blank" rel="noopener">aorsf</a> package includes a nice illustration of this with the splits for an axis-based tree on the left and an oblique tree on the right:</p> <p><img src="https://docs.ropensci.org/aorsf/reference/figures/tree_axis_v_oblique.png" alt="Two scatter plots of data with two predictors, X1 and X2, and two classes, coded as pink dots and orange squares. The lefthand plot shows the splits of an axis-based decision tree which are at a right angle to the axis. The resulting partition generally separates the classes well but not perfectly. The righthand plot shows the splits of an oblique tree which achieves perfect separation on this example because it can cut across the predictor space diagonally."></p> <p>To fit such a model, set the engine for a random forest to <code>&quot;aorsf&quot;</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>lung</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/stats/na.fail.html'>na.omit</a></span><span class='o'>(</span><span class='nv'>lung</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>forest</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/rand_forest.html'>rand_forest</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"aorsf"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"censored regression"</span><span class='o'>)</span> <span class='o'>|&gt;</span> 
</span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/pkg/survival/man/Surv.html'>Surv</a></span><span class='o'>(</span><span class='nv'>time</span>, <span class='nv'>status</span><span class='o'>)</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>lung</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>pred</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>forest</span>, <span class='nv'>lung</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span>, <span class='o'>]</span>, type <span class='o'>=</span> <span class='s'>"survival"</span>, eval_time <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>100</span>, <span class='m'>500</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>pred</span><span class='o'>$</span><span class='nv'>.pred</span><span class='o'>[[</span><span class='m'>1</span><span class='o'>]</span><span class='o'>]</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; .eval_time .pred_survival</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 100 0.928</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 500 0.368</span></span> <span></span></code></pre> </div> <h3 id="new-flexsurvspline-engine-for-survival_reg">New <code>&quot;flexsurvspline&quot;</code> engine for 
<code>survival_reg()</code> <a href="#new-flexsurvspline-engine-for-survival_reg"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>This engine has been contributed by <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">Matt Warkentin</a> and enables users to fit a parametric survival model with splines via <a href="https://rdrr.io/pkg/flexsurv/man/flexsurvspline.html" target="_blank" rel="noopener"><code>flexsurv::flexsurvspline()</code></a>.</p> <p>This model uses natural cubic splines to model a transformation of the survival function, e.g., the log cumulative hazard. This gives a lot more flexibility to a parametric model allowing us, for example, to represent more irregular hazard curves. Let&rsquo;s illustrate that with a data set of survival times of breast cancer patients, based on the example from <a href="https://www.jstatsoft.org/article/view/v070i08" target="_blank" rel="noopener">Jackson (2016)</a>.</p> <p>The flexibility of the model is governed by <code>k</code>, the number of knots in the spline. 
We set <code>scale = &quot;odds&quot;</code> for a proportional odds model.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>bc</span>, package <span class='o'>=</span> <span class='s'>"flexsurv"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>fit_splines</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/survival_reg.html'>survival_reg</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"flexsurvspline"</span>, k <span class='o'>=</span> <span class='m'>5</span>, scale <span class='o'>=</span> <span class='s'>"odds"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/pkg/survival/man/Surv.html'>Surv</a></span><span class='o'>(</span><span class='nv'>recyrs</span>, <span class='nv'>censrec</span><span class='o'>)</span> <span class='o'>~</span> <span class='nv'>group</span>, data <span class='o'>=</span> <span class='nv'>bc</span><span class='o'>)</span></span></code></pre> </div> <p>For comparison, we also fit a parametric model without splines.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>fit_gengamma</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/survival_reg.html'>survival_reg</a></span><span class='o'>(</span>dist <span class='o'>=</span> <span class='s'>"gengamma"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> 
<span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"flexsurv"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/pkg/survival/man/Surv.html'>Surv</a></span><span class='o'>(</span><span class='nv'>recyrs</span>, <span class='nv'>censrec</span><span class='o'>)</span> <span class='o'>~</span> <span class='nv'>group</span>, data <span class='o'>=</span> <span class='nv'>bc</span><span class='o'>)</span></span></code></pre> </div> <p>We can predict the hazard for the three levels of the prognostic <code>group</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>bc_groups</span> <span class='o'>&lt;-</span> <span class='nf'>tibble</span><span class='o'>(</span>group <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Poor"</span>,<span class='s'>"Medium"</span>,<span class='s'>"Good"</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>pred_splines</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>fit_splines</span>, new_data <span class='o'>=</span> <span class='nv'>bc_groups</span>, type <span class='o'>=</span> <span class='s'>"hazard"</span>, </span> <span> eval_time <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='m'>0.1</span>, <span class='m'>8</span>, by <span class='o'>=</span> <span class='m'>0.1</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span 
class='nf'>mutate</span><span class='o'>(</span>model <span class='o'>=</span> <span class='s'>"splines"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'>bind_cols</span><span class='o'>(</span><span class='nv'>bc_groups</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>pred_gengamma</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>fit_gengamma</span>, new_data <span class='o'>=</span> <span class='nv'>bc_groups</span>, type <span class='o'>=</span> <span class='s'>"hazard"</span>, </span> <span> eval_time <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='m'>0.1</span>, <span class='m'>8</span>, by <span class='o'>=</span> <span class='m'>0.1</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'>mutate</span><span class='o'>(</span>model <span class='o'>=</span> <span class='s'>"gengamma"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'>bind_cols</span><span class='o'>(</span><span class='nv'>bc_groups</span><span class='o'>)</span></span></code></pre> </div> <p>Plotting the predictions of both models shows a lot more flexibility in the splines model.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>bind_rows</span><span class='o'>(</span><span class='nv'>pred_splines</span>, <span class='nv'>pred_gengamma</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'>mutate</span><span class='o'>(</span>group <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span 
class='nv'>group</span>, levels <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Poor"</span>,<span class='s'>"Medium"</span>,<span class='s'>"Good"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'>tidyr</span><span class='nf'>::</span><span class='nf'><a href='https://tidyr.tidyverse.org/reference/unnest.html'>unnest</a></span><span class='o'>(</span>cols <span class='o'>=</span> <span class='nv'>.pred</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'>ggplot</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'>geom_line</span><span class='o'>(</span><span class='nf'>aes</span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>.eval_time</span>, y <span class='o'>=</span> <span class='nv'>.pred_hazard</span>, group <span class='o'>=</span> <span class='nv'>group</span>, col <span class='o'>=</span> <span class='nv'>group</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'>facet_wrap</span><span class='o'>(</span><span class='o'>~</span> <span class='nv'>model</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-8-1.png" alt="Two panels side by side, showing the predicted hazard curves for the three prognostic groups from the parametric model on the left and the spline model on the right. The curves for the spline model show more wiggliness, having more flexibility to adapt to the data than the curves from the parametric model which have to follow a generalized gamma distribution." 
width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Special thanks to Matt Warkentin and Byron Jaeger for the new engines! A big thank you to all the people who have contributed to censored since the release of v0.1.0:</p> <p> <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">@bcjaeger</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/therneau" target="_blank" rel="noopener">@therneau</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> Writing performant code with tidy tools https://www.tidyverse.org/blog/2023/04/performant-packages/ Tue, 18 Apr 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/04/performant-packages/ <p>The tidyverse packages provide safe, powerful, and expressive interfaces to solve data science problems. Behind the scenes of the tidyverse is a set of lower-level tools that its developers use to build these interfaces. While these lower-level approaches are more performant than their tidy analogues, their interfaces are often less readable and safe. For most use cases in interactive data analysis, the advantages of tidyverse interfaces far outweigh the drawback in computational speed. 
When speed becomes an issue, though, transitioning tidy code to use these lower-level interfaces in their backend can offer substantial increases in computational performance.</p> <p>This post will outline alternatives to tools I love from packages like dplyr and tidyr that I use to speed up computational bottlenecks. These recommendations come from my experiences developing the <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> packages, a collection of packages for modeling and machine learning using tidyverse principles. As such, most of these suggestions are best suited to package code, as the noted trade-off is more likely to be worth it in those settings&mdash;however, there may also be cases in analytical code, especially in production and/or with very large data sets, where these tips will be helpful. I&rsquo;ve included a number of &ldquo;worked examples&rdquo; with each proposed alternative, showing how the tidymodels team has used these same tricks to <a href="https://www.simonpcouch.com/blog/speedups-2023/" target="_blank" rel="noopener">speed up our code</a> quite a bit. 
Before I do that, though, let&rsquo;s make friends with some new R packages.</p> <h2 id="tools-of-the-trade">Tools of the trade <a href="#tools-of-the-trade"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>First, loading the tidyverse:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyverse.tidyverse.org'>tidyverse</a></span><span class='o'>)</span></span></code></pre> </div> <p>The most important tools to help you understand what&rsquo;s slowing your code down have little to do with the tidyverse at all!</p> <h3 id="profvis">profvis <a href="#profvis"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The profvis package is an R package for collecting and visualizing profiling data.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rstudio.github.io/profvis/'>profvis</a></span><span class='o'>)</span></span></code></pre> </div> <p>Profiling is the process of 
determining how long different portions of a chunk of code take to run. For example, in this next function <code>slow_function()</code>, it&rsquo;s somewhat straightforward to tell how long different portions of the following code run for if you know what <a href="https://rdrr.io/pkg/profvis/man/pause.html" target="_blank" rel="noopener"><code>pause()</code></a> does. ( <a href="https://rdrr.io/pkg/profvis/man/pause.html" target="_blank" rel="noopener"><code>pause()</code></a> is a function from the profvis package that just chills out for the specified amount of time. For example, <code>pause(1)</code> will wait for 1 second before finishing running.)</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>step_1</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://rdrr.io/pkg/profvis/man/pause.html'>pause</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span> <span></span> <span><span class='nv'>step_2</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://rdrr.io/pkg/profvis/man/pause.html'>pause</a></span><span class='o'>(</span><span class='m'>2</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span> <span></span> <span><span class='nv'>slow_function</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nf'>step_1</span><span class='o'>(</span><span class='o'>)</span></span> <span> </span> <span> <span class='nf'>step_2</span><span class='o'>(</span><span class='o'>)</span></span> <span> </span> <span> <span 
class='kc'>TRUE</span></span> <span><span class='o'>&#125;</span></span></code></pre> </div> <p>Profiling tools would help us see that <code>step_1()</code> takes one second, while <code>step_2()</code> takes two. In practice, this is usually much harder to intuit visually. To profile code with profvis, use the <a href="https://rdrr.io/pkg/profvis/man/profvis.html" target="_blank" rel="noopener"><code>profvis()</code></a> function:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>result</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/pkg/profvis/man/profvis.html'>profvis</a></span><span class='o'>(</span><span class='nf'>slow_function</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> <p>Printing the <code>result</code>ing object out will visualize the time different calls within <code>slow_function()</code> took:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>result</span></span></code></pre> </div> <p><img src="slow-function-profvis.png" alt="A screenshot of profvis output. A stack of grey bars sit atop a timeline that ranges from zero to three seconds. The bottom rectangle of the stack is labeled &ldquo;slow_function&rdquo; and stretches across the whole timeline. Two rectangles labeled &ldquo;step_1&rdquo; and &ldquo;step_2&rdquo; lie on top of the bottom rectangle, where the first stretches one-third of the way across the timeline and the second covers the remaining two-thirds."></p> <p>This output shows that, inside of <code>slow_function()</code>, <code>step_1()</code> took about a third of the total time and <code>step_2()</code> took two-thirds. 
All of the time in both of those functions was due to calling <a href="https://rdrr.io/pkg/profvis/man/pause.html" target="_blank" rel="noopener"><code>pause()</code></a>.</p> <p>Profiling should be your first line of defense against slow-running code. Often, profiling will surface slowdowns in unexpected places, and solutions to address those slowdowns may have little to do with usage of tidy tools. To learn more about profiling, the <a href="https://adv-r.hadley.nz/perf-measure.html" target="_blank" rel="noopener">Measuring performance</a> chapter in Hadley Wickham&rsquo;s book <a href="https://adv-r.hadley.nz/index.html" target="_blank" rel="noopener">Advanced R</a> is a great place to start.</p> <h3 id="bench">bench <a href="#bench"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>profvis is a powerful tool to surface code slowdowns. Often, though, it may not be immediately clear how to <em>fix</em> that slowdown. 
The bench package allows users to quickly test out how long different approaches to solving a problem take.</p> <p>For example, say we want to take the sum of the numbers in a list, but we&rsquo;ve identified via profiling that this operation is slowing our code down:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>numbers</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>as.list</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>5</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>numbers</span></span> <span><span class='c'>#&gt; [[1]]</span></span> <span><span class='c'>#&gt; [1] 1</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; [[2]]</span></span> <span><span class='c'>#&gt; [1] 2</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; [[3]]</span></span> <span><span class='c'>#&gt; [1] 3</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; [[4]]</span></span> <span><span class='c'>#&gt; [1] 4</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; [[5]]</span></span> <span><span class='c'>#&gt; [1] 5</span></span> <span></span></code></pre> </div> <p>One approach could be using the <a href="https://rdrr.io/r/base/funprog.html" target="_blank" rel="noopener"><code>Reduce()</code></a> function:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/funprog.html'>Reduce</a></span><span class='o'>(</span><span class='nv'>sum</span>, <span class='nv'>numbers</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 15</span></span> <span></span></code></pre> </div> <p>Another could involve converting to a vector with <a href="https://rdrr.io/r/base/unlist.html" 
target="_blank" rel="noopener"><code>unlist()</code></a> and then using <a href="https://rdrr.io/r/base/sum.html" target="_blank" rel="noopener"><code>sum()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/unlist.html'>unlist</a></span><span class='o'>(</span><span class='nv'>numbers</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 15</span></span> <span></span></code></pre> </div> <p>You may have some other ideas of how to solve this problem! How do we figure out which one is fastest, though? The <a href="http://bench.r-lib.org/reference/mark.html" target="_blank" rel="noopener"><code>bench::mark()</code></a> function from bench takes in different proposals to solve the same problem and returns a tibble with information about how long they took (among other things).</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>res</span> <span class='o'>&lt;-</span></span> <span> <span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> approach_1 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/funprog.html'>Reduce</a></span><span class='o'>(</span><span class='nv'>sum</span>, <span class='nv'>numbers</span><span class='o'>)</span>,</span> <span> approach_2 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/unlist.html'>unlist</a></span><span class='o'>(</span><span class='nv'>numbers</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span></span> <span><span 
class='nv'>res</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> approach_1 2.25µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> approach_2 491.97ns</span></span> <span></span></code></pre> </div> <p>The other nice part about <a href="http://bench.r-lib.org/reference/mark.html" target="_blank" rel="noopener"><code>bench::mark()</code></a> is that it will check that each approach gives the same output, so that you don&rsquo;t mistakenly compare apples and oranges.</p> <p>There are two important lessons to take in from this output:</p> <ul> <li>The <code>sum(unlist())</code> approach was wicked fast compared to <a href="https://rdrr.io/r/base/funprog.html" target="_blank" rel="noopener"><code>Reduce()</code></a>.</li> <li>Both of these expressions were fast. Even the slower of the two took 2.25µs&mdash;to put that in perspective, that expression could complete 443454 iterations in a second! Keeping this bigger picture in mind is always important when benchmarking; if code runs fast enough to not be an issue in practical situations, then it need not be optimized in favor of less readable or safe code.</li> </ul> <p>The results of little experiments like this one can be surprising at first. 
Over time, though, you will develop intuition for the fastest way to solve problems you commonly solve, and will write fast code the first time around!</p> <p>In this case, using <a href="https://rdrr.io/r/base/funprog.html" target="_blank" rel="noopener"><code>Reduce()</code></a> means calling <a href="https://rdrr.io/r/base/sum.html" target="_blank" rel="noopener"><code>sum()</code></a> many times, approximately once for each element of the list, and while <a href="https://rdrr.io/r/base/sum.html" target="_blank" rel="noopener"><code>sum()</code></a> isn&rsquo;t particularly slow, calling an R function many times tends to have non-negligible overhead. With the <code>sum(unlist())</code> approach, there are only 2 R function calls&mdash;one for <a href="https://rdrr.io/r/base/unlist.html" target="_blank" rel="noopener"><code>unlist()</code></a> and one for <a href="https://rdrr.io/r/base/sum.html" target="_blank" rel="noopener"><code>sum()</code></a>&mdash;which both immediately drop into C code.</p> <h3 id="vctrs">vctrs <a href="#vctrs"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The problems I commonly solve&mdash;and possibly you as well, as a reader of this post&mdash;often involve lots of dplyr and tidyr. When profiling the tidymodels packages, I&rsquo;ve come across many places where calls to dplyr and tidyr took more time than I&rsquo;d like them to, but had a lot to learn about how to speed up those operations. 
<em>Enter the vctrs package!</em></p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://vctrs.r-lib.org/'>vctrs</a></span><span class='o'>)</span></span></code></pre> </div> <p>If you use dplyr and tidyr like I do, turns out you&rsquo;re also a vctrs user! dplyr and tidyr rely on vctrs to handle all sorts of elementary operations behind the scenes, and the package is a core part of a tidy developer&rsquo;s toolkit. Taken together with some functions from the tibble package, these tools provide a super efficient, albeit bare-bones, alternative interface to common data manipulation tasks like <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a>ing and <a href="https://dplyr.tidyverse.org/reference/select.html" target="_blank" rel="noopener"><code>select()</code></a>ing.</p> <h2 id="rewriting-tidy-code">Rewriting tidy code <a href="#rewriting-tidy-code"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>For every performance improvement I make by rewriting dplyr and tidyr code to instead use vctrs and tibble, I make probably two or three simpler optimizations. <a href="https://adv-r.hadley.nz/perf-improve.html" target="_blank" rel="noopener">Tool-agnostic practices</a> such as reducing duplicated computations, implementing early returns where possible, and using vectorized implementations will likely take you far when optimizing R code. 
Profiling is your ground truth! When profiling indicates that otherwise well-factored code is slowed by tidy interfaces, though, all is not lost.</p> <p>We&rsquo;ll demonstrate different ways to speed up tidy code using a version of the base R data frame <code>mtcars</code> converted to a tibble:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars_tbl</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/as_tibble.html'>as_tibble</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, rownames <span class='o'>=</span> <span class='s'>"make_model"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>mtcars_tbl</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 32 × 12</span></span></span> <span><span class='c'>#&gt; make_model mpg cyl disp hp drat wt qsec vs am gear carb</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Mazda RX4 21 6 160 110 3.9 2.62 16.5 0 1 4 4</span></span> <span><span class='c'>#&gt; 
<span style='color: #555555;'> 2</span> Mazda RX4 … 21 6 160 110 3.9 2.88 17.0 0 1 4 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Datsun 710 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Hornet 4 D… 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Hornet Spo… 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Duster 360 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Merc 240D 24.4 4 147. 62 3.69 3.19 20 1 0 4 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Merc 230 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Merc 280 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 22 more rows</span></span></span> <span></span></code></pre> </div> <h3 id="one-for-one-replacements">One-for-one replacements <a href="#one-for-one-replacements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Many of the core functions in dplyr have alternatives in vctrs and tibble that you can quickly transition to. 
There are a couple of considerations associated with each, though, and some of them make piping a bit more awkward; most of the time, when I switch these out, I remove the pipe <code>%&gt;%</code> as well.</p> <h4 id="filter"><code>filter()</code> <a href="#filter"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>The dplyr code:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars_tbl</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>hp</span> <span class='o'>&gt;</span> <span class='m'>100</span><span class='o'>)</span></span></code></pre> </div> <p>&hellip;can be replaced by:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_slice.html'>vec_slice</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>mtcars_tbl</span><span class='o'>$</span><span class='nv'>hp</span> <span class='o'>&gt;</span> <span class='m'>100</span><span class='o'>)</span></span></code></pre> </div> <p>Note that the second argument, which determines which rows to keep, requires you to pass the column itself, <code>mtcars_tbl$hp</code>, rather than the bare column name <code>hp</code>. 
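</p> <p>A minimal sketch of one consequence of passing the logical vector yourself, using a small made-up tibble (<code>cars_na</code> and <code>keep</code> are hypothetical names, not part of the examples in this post): <code>filter()</code> drops rows where the condition evaluates to <code>NA</code>, whereas, as far as I can tell, <code>vec_slice()</code> propagates a missing index as a row of missing values. Handling <code>NA</code>s explicitly keeps the two approaches in agreement:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>library(vctrs)
library(tibble)

# Hypothetical toy data (not from the post): one horsepower value is missing
cars_na &lt;- tibble(model = c("a", "b", "c"), hp = c(90, NA, 150))

# Build the logical index by hand, treating NA as "do not keep",
# which mirrors how filter() treats an NA condition
keep &lt;- !is.na(cars_na$hp) &amp; cars_na$hp &gt; 100
vec_slice(cars_na, keep)

# Integer locations work as well; which() also discards NAs
vec_slice(cars_na, which(cars_na$hp &gt; 100))
</code></pre> </div> <p>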
If you feel cozier with square brackets, you can also use <code>[.tbl_df</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars_tbl</span><span class='o'>[</span><span class='nv'>mtcars_tbl</span><span class='o'>$</span><span class='nv'>hp</span> <span class='o'>&gt;</span> <span class='m'>100</span>, <span class='o'>]</span></span></code></pre> </div> <p><code>[.tbl_df</code> is the <a href="https://tibble.tidyverse.org/reference/subsetting.html" target="_blank" rel="noopener">method for subsetting with a single square bracket when applied to tibbles</a>. Tibbles have their own methods for extracting and replacing subsets of data frames. They generally behave similarly to the analogous methods for <code>data.frame</code>s, but have small differences to improve consistency and safety.</p> <p>The benchmarks for these different approaches are:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>res</span> <span class='o'>&lt;-</span></span> <span> <span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> dplyr <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>hp</span> <span class='o'>&gt;</span> <span class='m'>100</span><span class='o'>)</span>,</span> <span> vctrs <span class='o'>=</span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_slice.html'>vec_slice</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>mtcars_tbl</span><span class='o'>$</span><span class='nv'>hp</span> <span class='o'>&gt;</span> <span class='m'>100</span><span class='o'>)</span>,</span> <span> `[.tbl_df` <span class='o'>=</span> <span 
class='nv'>mtcars_tbl</span><span class='o'>[</span><span class='nv'>mtcars_tbl</span><span class='o'>$</span><span class='nv'>hp</span> <span class='o'>&gt;</span> <span class='m'>100</span>, <span class='o'>]</span></span> <span> <span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>res</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> dplyr 289.93µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> vctrs 4.63µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> [.tbl_df 23.74µs</span></span> <span></span></code></pre> </div> <p>The bigger picture of benchmarking is worth reiterating here. While the <code>filter()</code> approach was by far the slowest expression of the three, it still only took 290µs, fast enough to complete 3449 iterations in a second. If I&rsquo;m interactively analyzing data, I won&rsquo;t even notice the difference in evaluation time between these expressions, let alone care about it; the benefits of expressiveness and safety that <code>filter()</code> provides far outweigh the drawback of this slowdown. 
If <code>filter()</code> is called 3449 times in the backend of a machine learning pipeline, though, these alternatives may be worth transitioning to.</p> <p>Some examples of changes like this made to tidymodels packages: <a href="https://github.com/tidymodels/parsnip/pull/935" target="_blank" rel="noopener">tidymodels/parsnip#935</a>, <a href="https://github.com/tidymodels/parsnip/pull/933" target="_blank" rel="noopener">tidymodels/parsnip#933</a>, <a href="https://github.com/tidymodels/parsnip/pull/901" target="_blank" rel="noopener">tidymodels/parsnip#901</a>.</p> <h4 id="mutate"><code>mutate()</code> <a href="#mutate"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>The dplyr code:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars_tbl</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, year <span class='o'>=</span> <span class='m'>1974L</span><span class='o'>)</span></span></code></pre> </div> <p>&hellip;can be replaced by:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars_tbl</span><span class='o'>$</span><span class='nv'>year</span> <span class='o'>&lt;-</span> <span class='m'>1974L</span></span></code></pre> </div> <p>&hellip;with benchmarks:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a 
href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> dplyr <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, year <span class='o'>=</span> <span class='m'>1974L</span><span class='o'>)</span>,</span> <span> `$&lt;-.tbl_df` <span class='o'>=</span> <span class='o'>&#123;</span><span class='nv'>mtcars_tbl</span><span class='o'>$</span><span class='nv'>year</span> <span class='o'>&lt;-</span> <span class='m'>1974L</span>; <span class='nv'>mtcars_tbl</span><span class='o'>&#125;</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> dplyr 302.5µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> $&lt;-.tbl_df 12.8µs</span></span> <span></span></code></pre> </div> <p>By default, both <code>mutate()</code> and <code>$&lt;-.tbl_df</code> append the new column at the right-most position. The <code>.before</code> and <code>.after</code> arguments to <code>mutate()</code> are a really nice interface to adjust that behavior, and I miss it often when using <code>$&lt;-.tbl_df</code>. 
In those cases, <code>select()</code> and its alternatives (see next section!) can be helpful.</p> <p>Some examples of changes like this made to tidymodels packages: <a href="https://github.com/tidymodels/parsnip/pull/933" target="_blank" rel="noopener">tidymodels/parsnip#933</a>, <a href="https://github.com/tidymodels/parsnip/pull/921" target="_blank" rel="noopener">tidymodels/parsnip#921</a>, and <a href="https://github.com/tidymodels/parsnip/pull/901" target="_blank" rel="noopener">tidymodels/parsnip#901</a>.</p> <h4 id="select"><code>select()</code> <a href="#select"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>The dplyr code:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>hp</span><span class='o'>)</span></span></code></pre> </div> <p>&hellip;can be replaced by:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars_tbl</span><span class='o'>[</span><span class='s'>"hp"</span><span class='o'>]</span></span></code></pre> </div> <p>&hellip;with benchmarks:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> dplyr <span class='o'>=</span> <span class='nf'><a 
href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>hp</span><span class='o'>)</span>,</span> <span> `[.tbl_df` <span class='o'>=</span> <span class='nv'>mtcars_tbl</span><span class='o'>[</span><span class='s'>"hp"</span><span class='o'>]</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> dplyr 527.01µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> [.tbl_df 8.08µs</span></span> <span></span></code></pre> </div> <p>Of course, the nice part about <code>select()</code>, and something we make use of in tidymodels quite a bit, is tidyselect. I&rsquo;ve often found, though, that we lean heavily on selecting via external vectors, i.e. character vectors, which can be passed to <code>[.tbl_df</code> directly. 
That is:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>cols</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"hp"</span>, <span class='s'>"wt"</span><span class='o'>)</span></span> <span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> dplyr <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nf'><a href='https://tidyselect.r-lib.org/reference/all_of.html'>all_of</a></span><span class='o'>(</span><span class='nv'>cols</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> `[.tbl_df` <span class='o'>=</span> <span class='nv'>mtcars_tbl</span><span class='o'>[</span><span class='nv'>cols</span><span class='o'>]</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> dplyr 548.74µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> [.tbl_df 8.53µs</span></span> 
<span></span></code></pre> </div> <p>Note that <code>[.tbl_df</code> always sets <code>drop = FALSE</code>.</p> <p><code>[.tbl_df</code> can also be used as an alternative interface to <code>select()</code> or <code>relocate()</code> with a <code>.before</code> or <code>.after</code> argument. For instance, to place that column <code>year</code> we made in the last section as the second column, we could write:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>left_cols</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"make_model"</span>, <span class='s'>"year"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>mtcars_tbl</span><span class='o'>[</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>left_cols</span>, </span> <span> <span class='nf'><a href='https://generics.r-lib.org/reference/setops.html'>setdiff</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/colnames.html'>colnames</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span><span class='o'>)</span>, <span class='nv'>left_cols</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='o'>]</span></span></code></pre> </div> <p>No, thanks, but it is a good bit faster than tidyselect-based alternatives:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> mutate <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, year <span 
class='o'>=</span> <span class='m'>1974L</span>, .after <span class='o'>=</span> <span class='nv'>make_model</span><span class='o'>)</span>,</span> <span> relocate <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/relocate.html'>relocate</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>year</span>, .after <span class='o'>=</span> <span class='nv'>make_model</span><span class='o'>)</span>,</span> <span> `[.tbl_df` <span class='o'>=</span> </span> <span> <span class='nv'>mtcars_tbl</span><span class='o'>[</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>left_cols</span>, </span> <span> <span class='nf'><a href='https://rdrr.io/r/base/colnames.html'>colnames</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span><span class='o'>[</span><span class='o'>!</span><span class='nf'><a href='https://rdrr.io/r/base/colnames.html'>colnames</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span><span class='o'>)</span> <span class='o'><a href='https://rdrr.io/r/base/match.html'>%in%</a></span> <span class='nv'>left_cols</span><span class='o'>]</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span> <span class='o'>]</span>,</span> <span> check <span class='o'>=</span> <span class='kc'>FALSE</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; 
font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> mutate 1.2ms</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> relocate 804.3µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> [.tbl_df 19.1µs</span></span> <span></span></code></pre> </div> <p>Some examples of changes like this made to tidymodels packages: <a href="https://github.com/tidymodels/parsnip/pull/935" target="_blank" rel="noopener">tidymodels/parsnip#935</a>, <a href="https://github.com/tidymodels/parsnip/pull/933" target="_blank" rel="noopener">tidymodels/parsnip#933</a>, <a href="https://github.com/tidymodels/parsnip/pull/921" target="_blank" rel="noopener">tidymodels/parsnip#921</a>, and <a href="https://github.com/tidymodels/tune/pull/635" target="_blank" rel="noopener">tidymodels/tune#635</a>.</p> <h4 id="pull"><code>pull()</code> <a href="#pull"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>The dplyr code:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/pull.html'>pull</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>hp</span><span class='o'>)</span></span></code></pre> </div> <p>&hellip;can be replaced by:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars_tbl</span><span class='o'>$</span><span 
class='nv'>hp</span></span></code></pre> </div> <p>&hellip;or:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars_tbl</span><span class='o'>[[</span><span class='s'>"hp"</span><span class='o'>]</span><span class='o'>]</span></span></code></pre> </div> <p>Note that, for tibbles, <code>$</code> will raise a warning if the subsetted column doesn&rsquo;t exist, while <code>[[</code> will silently return <code>NULL</code>.</p> <p>With benchmarks:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> dplyr <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/pull.html'>pull</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>hp</span><span class='o'>)</span>,</span> <span> `$.tbl_df` <span class='o'>=</span> <span class='nv'>mtcars_tbl</span><span class='o'>$</span><span class='nv'>hp</span>,</span> <span> `[[.tbl_df` <span class='o'>=</span> <span class='nv'>mtcars_tbl</span><span class='o'>[[</span><span class='s'>"hp"</span><span class='o'>]</span><span class='o'>]</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; 
font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> dplyr 101.19µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> $.tbl_df 615.02ns</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> [[.tbl_df 2.25µs</span></span> <span></span></code></pre> </div> <p>Some examples of changes like this made to tidymodels packages: <a href="https://github.com/tidymodels/parsnip/pull/935" target="_blank" rel="noopener">tidymodels/parsnip#935</a> and <a href="https://github.com/tidymodels/tune/pull/635" target="_blank" rel="noopener">tidymodels/tune#635</a>.</p> <h4 id="bind_"><code>bind_*()</code> <a href="#bind_"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p><code>bind_rows()</code> and <code>bind_cols()</code> can be replaced by <code>vec_rbind()</code> and <code>vec_cbind()</code>, respectively. 
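</p> <p>One practical difference when making this swap, sketched here with made-up inputs (<code>dfs</code> is a hypothetical name, not from this post): <code>bind_rows()</code> accepts a list of data frames directly, while <code>vec_rbind()</code> takes its inputs through <code>...</code>, so a list needs to be spliced in with <code>!!!</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>library(vctrs)
library(tibble)

# Hypothetical inputs (not from the post): a list of small tibbles
dfs &lt;- list(tibble(a = 1:2), tibble(a = 3:4))

# Splice the list elements into `...` so that each tibble
# is passed to vec_rbind() as a separate argument
vec_rbind(!!!dfs)
</code></pre> </div> <p>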
First, row-binding:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> dplyr <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/bind_rows.html'>bind_rows</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>mtcars_tbl</span><span class='o'>)</span>,</span> <span> vctrs <span class='o'>=</span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_bind.html'>vec_rbind</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>mtcars_tbl</span><span class='o'>)</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> dplyr 44µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> vctrs 14.3µs</span></span> <span></span></code></pre> </div> <p>As for column-binding:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>tbl</span> <span class='o'>&lt;-</span> <span class='nf'><a 
href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>year <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/rep.html'>rep</a></span><span class='o'>(</span><span class='m'>1974L</span>, <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> dplyr <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/bind_cols.html'>bind_cols</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>tbl</span><span class='o'>)</span>,</span> <span> vctrs <span class='o'>=</span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_bind.html'>vec_cbind</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>tbl</span><span class='o'>)</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> dplyr 60.7µs</span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'>2</span> vctrs 26.2µs</span></span> <span></span></code></pre> </div> <p>Some examples of changes like this made to tidymodels packages: <a href="https://github.com/tidymodels/tune/pull/636" target="_blank" rel="noopener">tidymodels/tune#636</a>.</p> <h4 id="grouping">Grouping <a href="#grouping"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>In general, the introduction of groups makes these substitutions much trickier. In those cases, it&rsquo;s likely best to weigh (via profiling) how significant the slowdown is and, if it&rsquo;s not too bad, opt not to make any changes. For code that relies on <code>group_by()</code> and sees heavy traffic, see <code>vctrs::list_unchop()</code>, <code>vctrs::vec_chop()</code>, and <code>vctrs::vec_rep_each()</code>.</p> <h3 id="tibbles">Tibbles <a href="#tibbles"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Tibbles are great, and I don&rsquo;t want to interface with any other data frame-y thing. 
Some notes:</p> <ul> <li> <p><code>as_tibble()</code> on a tibble is not &ldquo;free&rdquo;:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> on_tbl_df <span class='o'>=</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/as_tibble.html'>as_tibble</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span><span class='o'>)</span>,</span> <span> on_data.frame <span class='o'>=</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/as_tibble.html'>as_tibble</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, rownames <span class='o'>=</span> <span class='s'>"make_model"</span><span class='o'>)</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> on_tbl_df 51.2µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> on_data.frame 238.6µs</span></span> <span></span></code></pre> </div> <p>Note that the time to coerce data frames and tibbles doesn&rsquo;t depend on the size of the data being coerced, in most situations.</p> </li> <li> <p>Building a 
tibble from scratch using <code>tibble()</code> actually takes quite a while as well. <code>tibble()</code> handles vector recycling and name checking, builds columns sequentially, all that good stuff. If you need that, use <code>tibble()</code>, but if you&rsquo;re building a tibble from well-understood inputs, use <code>new_tibble()</code>, which minimizes validation checks. For a middle ground between <code>tibble()</code> and <code>new_tibble(list())</code> in terms of both performance and safety, use the <code>df_list()</code> function from the vctrs package in place of <code>list()</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> tibble <span class='o'>=</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>2</span>, b <span class='o'>=</span> <span class='m'>3</span><span class='o'>:</span><span class='m'>4</span><span class='o'>)</span>,</span> <span> new_tibble_df_list <span class='o'>=</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/new_tibble.html'>new_tibble</a></span><span class='o'>(</span><span class='nf'><a href='https://vctrs.r-lib.org/reference/df_list.html'>df_list</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>2</span>, b <span class='o'>=</span> <span class='m'>3</span><span class='o'>:</span><span class='m'>4</span><span class='o'>)</span>, nrow <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span>,</span> <span> new_tibble_list <span class='o'>=</span> <span class='nf'><a 
href='https://tibble.tidyverse.org/reference/new_tibble.html'>new_tibble</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>2</span>, b <span class='o'>=</span> <span class='m'>3</span><span class='o'>:</span><span class='m'>4</span><span class='o'>)</span>, nrow <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> tibble 165.97µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> new_tibble_df_list 16.69µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> new_tibble_list 4.96µs</span></span> <span></span></code></pre> </div> <p>Note that <code>new_tibble()</code> <em>will not check the lengths of its inputs.</em> Carry out simple recycling yourself, and be sure to use the <code>nrow</code> argument to get basic length checks.</p> </li> </ul> <p>Some examples of changes like this made to tidymodels packages: <a href="https://github.com/tidymodels/parsnip/pull/932" target="_blank" rel="noopener">tidymodels/parsnip#945</a>, <a 
href="https://github.com/tidymodels/parsnip/pull/934" target="_blank" rel="noopener">tidymodels/parsnip#934</a>, <a href="https://github.com/tidymodels/parsnip/pull/929" target="_blank" rel="noopener">tidymodels/parsnip#929</a>, <a href="https://github.com/tidymodels/parsnip/pull/923" target="_blank" rel="noopener">tidymodels/parsnip#923</a>, <a href="https://github.com/tidymodels/parsnip/pull/902" target="_blank" rel="noopener">tidymodels/parsnip#902</a>, <a href="https://github.com/tidymodels/dials/pull/277" target="_blank" rel="noopener">tidymodels/dials#277</a>, and <a href="https://github.com/tidymodels/tune/pull/637" target="_blank" rel="noopener">tidymodels/tune#637</a>.</p> <h3 id="becoming-join-critical">Becoming join-critical <a href="#becoming-join-critical"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Two truths:</p> <ul> <li> <p>dplyr joins are a remarkably safe and powerful way to synthesize data sources.</p> </li> <li> <p>One ought to ask themselves &ldquo;does this really need to be a join?&rdquo; when combining data sources in package code.</p> </li> </ul> <p>Some ways to build intuition about join efficiency:</p> <ul> <li> <p>If this join happens multiple times, is it possible to express it as one join and then subset it when needed? That is, if a join happens inside of a loop but the inputs to the join do not depend on the loop index, it&rsquo;s likely possible to pull that join outside of the loop and then <code>vec_slice()</code> its results inside of the loop.</p> </li> <li> <p>Am I using the complete join result or just a portion? 
If I end up only making use of column names, or values in one column (as with joins approximating <a href="https://adv-r.hadley.nz/subsetting.html?q=lookup#lookup-tables" target="_blank" rel="noopener">lookup tables</a>), or pairings between two columns, I may be able to instead use <code>$.tbl_df</code> or <code>[.tbl_df</code>.</p> </li> </ul> <p>As an example, imagine we have another tibble that tells us additional information about the <code>make_model</code>s that I&rsquo;ve driven:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>my_cars</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> make_model <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Honda Civic"</span>, <span class='s'>"Subaru Forester"</span><span class='o'>)</span>,</span> <span> color <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Grey"</span>, <span class='s'>"White"</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span></span> <span><span class='nv'>my_cars</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; make_model color</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Honda Civic Grey </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Subaru Forester White</span></span> <span></span></code></pre> </div> <p>I <em>could</em> use a join to subset down to cars in 
<code>mtcars_tbl</code> and add this information on the cars I&rsquo;ve driven:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>inner_join</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>my_cars</span>, <span class='s'>"make_model"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 13</span></span></span> <span><span class='c'>#&gt; make_model mpg cyl disp hp drat wt qsec vs am gear carb</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Honda Civic 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: color &lt;chr&gt;</span></span></span> <span></span></code></pre> </div> <p>Another way to express this, though, if I can safely assume that each of my cars would have only one or zero matches in <code>mtcars_tbl</code>, is to find entries in <code>mtcars_tbl$make_model</code> that 
match entries in <code>my_cars$make_model</code>, subset down to those matches, and then bind columns:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>supplement_my_cars</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='c'># locate matches, assuming only 0 or 1 matches possible</span></span> <span> <span class='nv'>loc</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_match.html'>vec_match</a></span><span class='o'>(</span><span class='nv'>my_cars</span><span class='o'>$</span><span class='nv'>make_model</span>, <span class='nv'>mtcars_tbl</span><span class='o'>$</span><span class='nv'>make_model</span><span class='o'>)</span></span> <span> </span> <span> <span class='c'># keep only the matches</span></span> <span> <span class='nv'>loc_mine</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/which.html'>which</a></span><span class='o'>(</span><span class='o'>!</span><span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>loc</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='nv'>loc_mtcars</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_slice.html'>vec_slice</a></span><span class='o'>(</span><span class='nv'>loc</span>, <span class='o'>!</span><span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>loc</span><span class='o'>)</span><span class='o'>)</span></span> <span> </span> <span> <span class='c'># drop duplicated join column</span></span> <span> <span class='nv'>my_cars_join</span> <span class='o'>&lt;-</span> <span class='nv'>my_cars</span><span class='o'>[</span><span class='nf'><a 
href='https://generics.r-lib.org/reference/setops.html'>setdiff</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/names.html'>names</a></span><span class='o'>(</span><span class='nv'>my_cars</span><span class='o'>)</span>, <span class='s'>"make_model"</span><span class='o'>)</span><span class='o'>]</span></span> <span></span> <span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_bind.html'>vec_cbind</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_slice.html'>vec_slice</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>loc_mtcars</span><span class='o'>)</span>,</span> <span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_slice.html'>vec_slice</a></span><span class='o'>(</span><span class='nv'>my_cars_join</span>, <span class='nv'>loc_mine</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='o'>&#125;</span></span> <span></span> <span><span class='nf'>supplement_my_cars</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 13</span></span></span> <span><span class='c'>#&gt; make_model mpg cyl disp hp drat wt qsec vs am gear carb</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 
<span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Honda Civic 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># ℹ 1 more variable: color &lt;chr&gt;</span></span></span> <span></span></code></pre> </div> <p>This is indeed quite a bit faster:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> inner_join <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>inner_join</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, <span class='nv'>my_cars</span>, <span class='s'>"make_model"</span><span class='o'>)</span>,</span> <span> manual <span class='o'>=</span> <span class='nf'>supplement_my_cars</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'>1</span> inner_join 438µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> manual 50.7µs</span></span> <span></span></code></pre> </div> <p>At the same time, if this problem were even a little bit more complex, e.g. if there were possibly multiple matching <code>make_model</code>s in <code>mtcars_tbl</code> or if I wanted to keep all rows in <code>mtcars_tbl</code> regardless of whether I had driven the car, then expressing this join with more bare-bones operations quickly becomes less readable and more error-prone. In those cases, too, joins in dplyr have a relatively small amount of overhead when compared to the vctrs backends underlying them. So, optimize carefully!</p> <p>Some examples of writing out joins in tidymodels packages: <a href="https://github.com/tidymodels/parsnip/pull/932" target="_blank" rel="noopener">tidymodels/parsnip#932</a>, <a href="https://github.com/tidymodels/parsnip/pull/931" target="_blank" rel="noopener">tidymodels/parsnip#931</a>, <a href="https://github.com/tidymodels/parsnip/pull/921" target="_blank" rel="noopener">tidymodels/parsnip#921</a>, and <a href="https://github.com/tidymodels/recipes/pull/1121" target="_blank" rel="noopener">tidymodels/recipes#1121</a>.</p> <h3 id="nest"><code>nest()</code> <a href="#nest"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p><code>nest()</code>s are subject to similar considerations as joins. When they allow for expressive or principled user interfaces, use them, but manipulate them sparingly in backends. 
Writing out <code>nest()</code> calls <em>can</em> result in substantial speedups, though, and the process is not quite as gnarly as writing out a join. For code that relies on <code>nest()</code>s and sees heavy traffic, rewriting with vctrs may be worth the effort.</p> <p>For example, consider nesting <code>mtcars_tbl</code> by <code>cyl</code> and <code>am</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://tidyr.tidyverse.org/reference/nest.html'>nest</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, .by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>cyl</span>, <span class='nv'>am</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 3</span></span></span> <span><span class='c'>#&gt; cyl am data </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 6 1 <span style='color: #555555;'>&lt;tibble [3 × 10]&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 4 1 <span style='color: #555555;'>&lt;tibble [8 × 10]&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 6 0 <span style='color: #555555;'>&lt;tibble [4 × 10]&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 8 0 <span style='color: #555555;'>&lt;tibble [12 × 10]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 4 0 <span style='color: #555555;'>&lt;tibble [3 × 10]&gt;</span> </span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'>6</span> 8 1 <span style='color: #555555;'>&lt;tibble [2 × 10]&gt;</span></span></span> <span></span></code></pre> </div> <p>For some basic nests, <code>vec_split()</code> can do the trick.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>nest_cols</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"cyl"</span>, <span class='s'>"am"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>res</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_split.html'>vec_split</a></span><span class='o'>(</span></span> <span> x <span class='o'>=</span> <span class='nv'>mtcars_tbl</span><span class='o'>[</span><span class='nf'><a href='https://generics.r-lib.org/reference/setops.html'>setdiff</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/colnames.html'>colnames</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span><span class='o'>)</span>, <span class='nv'>nest_cols</span><span class='o'>)</span><span class='o'>]</span>,</span> <span> by <span class='o'>=</span> <span class='nv'>mtcars_tbl</span><span class='o'>[</span><span class='nv'>nest_cols</span><span class='o'>]</span></span> <span> <span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_bind.html'>vec_cbind</a></span><span class='o'>(</span><span class='nv'>res</span><span class='o'>$</span><span class='nv'>key</span>, <span class='nf'><a href='https://tibble.tidyverse.org/reference/new_tibble.html'>new_tibble</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>data <span class='o'>=</span> <span class='nv'>res</span><span class='o'>$</span><span 
class='nv'>val</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 3</span></span></span> <span><span class='c'>#&gt; cyl am data </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 6 1 <span style='color: #555555;'>&lt;tibble [3 × 10]&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 4 1 <span style='color: #555555;'>&lt;tibble [8 × 10]&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 6 0 <span style='color: #555555;'>&lt;tibble [4 × 10]&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 8 0 <span style='color: #555555;'>&lt;tibble [12 × 10]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 4 0 <span style='color: #555555;'>&lt;tibble [3 × 10]&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 8 1 <span style='color: #555555;'>&lt;tibble [2 × 10]&gt;</span></span></span> <span></span></code></pre> </div> <p>The performance improvement in these situations can be quite substantial:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> nest <span class='o'>=</span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/nest.html'>nest</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span>, .by <span class='o'>=</span> <span class='nf'><a 
href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>cyl</span>, <span class='nv'>am</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> vctrs <span class='o'>=</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>res</span> <span class='o'>&lt;-</span> </span> <span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_split.html'>vec_split</a></span><span class='o'>(</span></span> <span> x <span class='o'>=</span> <span class='nv'>mtcars_tbl</span><span class='o'>[</span><span class='nf'><a href='https://generics.r-lib.org/reference/setops.html'>setdiff</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/colnames.html'>colnames</a></span><span class='o'>(</span><span class='nv'>mtcars_tbl</span><span class='o'>)</span>, <span class='nv'>nest_cols</span><span class='o'>)</span><span class='o'>]</span>,</span> <span> by <span class='o'>=</span> <span class='nv'>mtcars_tbl</span><span class='o'>[</span><span class='nv'>nest_cols</span><span class='o'>]</span></span> <span> <span class='o'>)</span></span> <span> </span> <span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_bind.html'>vec_cbind</a></span><span class='o'>(</span><span class='nv'>res</span><span class='o'>$</span><span class='nv'>key</span>, <span class='nf'><a href='https://tibble.tidyverse.org/reference/new_tibble.html'>new_tibble</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>data <span class='o'>=</span> <span class='nv'>res</span><span class='o'>$</span><span class='nv'>val</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>&#125;</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a 
href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> nest 1.81ms</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> vctrs 67.61µs</span></span> <span></span></code></pre> </div> <p>More complex nests require a good bit of facility with the vctrs package. <code>vec_split()</code>, <code>list_unchop()</code>, and <code>vec_chop()</code> are all good places to start, and these examples of writing out nests in tidymodels packages make use of other vctrs patterns: <a href="https://github.com/tidymodels/tune/pull/657" target="_blank" rel="noopener">tidymodels/tune#657</a>, <a href="https://github.com/tidymodels/tune/pull/656" target="_blank" rel="noopener">tidymodels/tune#656</a>, <a href="https://github.com/tidymodels/tune/pull/640" target="_blank" rel="noopener">tidymodels/tune#640</a>, and <a href="https://github.com/tidymodels/recipes/pull/1121" target="_blank" rel="noopener">tidymodels/recipes#1121</a>.</p> <h3 id="combining-strings">Combining strings <a href="#combining-strings"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> 
</h3><p>The glue package is super helpful for writing expressive and correct strings with data, though it is quite a bit slower than <code>paste0()</code>. At the same time, <code>paste0()</code> has some tricky recycling behavior. For a middle ground in terms of both performance and safety, this short wrapper has been quite helpful:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>vec_paste0</span> <span class='o'>&lt;-</span> <span class='kr'>function</span> <span class='o'>(</span><span class='nv'>...</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>args</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://vctrs.r-lib.org/reference/vec_recycle.html'>vec_recycle_common</a></span><span class='o'>(</span><span class='nv'>...</span><span class='o'>)</span></span> <span> <span class='nf'><a href='https://rlang.r-lib.org/reference/exec.html'>exec</a></span><span class='o'>(</span><span class='nv'>paste0</span>, <span class='o'>!</span><span class='o'>!</span><span class='o'>!</span><span class='nv'>args</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>name</span> <span class='o'>&lt;-</span> <span class='s'>"Simon"</span></span> <span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span></span> <span> glue <span class='o'>=</span> <span class='nf'>glue</span><span class='nf'>::</span><span class='nf'><a href='https://glue.tidyverse.org/reference/glue.html'>glue</a></span><span class='o'>(</span><span class='s'>"My name is &#123;name&#125;."</span><span class='o'>)</span>,</span> <span> vec_paste0 <span class='o'>=</span> <span class='nf'>vec_paste0</span><span 
class='o'>(</span><span class='s'>"My name is "</span>, <span class='nv'>name</span>, <span class='s'>"."</span><span class='o'>)</span>,</span> <span> paste0 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste0</a></span><span class='o'>(</span><span class='s'>"My name is "</span>, <span class='nv'>name</span>, <span class='s'>"."</span><span class='o'>)</span>,</span> <span> check <span class='o'>=</span> <span class='kc'>FALSE</span></span> <span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>expression</span>, <span class='nv'>median</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; expression median</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;bch:expr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;bch:tm&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> glue 38.99µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> vec_paste0 3.98µs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> paste0 861.01ns</span></span> <span></span></code></pre> </div> <p>My rule of thumb is to use <code>glue()</code> for errors, when the function will stop executing anyway. For simple pastes that are intended to be called repeatedly, use <code>vec_paste0()</code>. 
There&rsquo;s a lot of gray area in between those two contexts; intuit (or profile) as you will.</p> <h2 id="wrapping-up">Wrapping up <a href="#wrapping-up"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This post contains a number of tricks that offer especially performant alternatives to interfaces from dplyr and tidyr. Making use of these backend tools is certainly a trade-off; what is gained in computational performance is offset by a decline in readability and safety, so developers ought to consider carefully when optimizations are worth the effort and risk.</p> <p>Thanks to Davis Vaughan for the guidance in getting started with vctrs. 
Also, thanks to both Davis Vaughan and Lionel Henry for their efforts in helping the tidymodels team address the bottlenecks that have been surfaced by our work on optimizations in tidyverse packages.</p> New CRAN requirements for packages with C and C++ https://www.tidyverse.org/blog/2023/03/cran-checks-compiled-code/ Thu, 30 Mar 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/03/cran-checks-compiled-code/ <p>The R package landscape is dynamic, with changes in infrastructure common, especially when CRAN makes changes to their policies and requirements. This is particularly true for packages that include low-level compiled code, requiring developers to be nimble in responding to these changes.</p> <p>The tidyverse team at Posit is in the unique situation where we have a concentration of developers working full-time on creating and maintaining open source packages. This internal community provides the opportunity to collaborate to develop shared practices and discover solutions to problems that arise.
When we can, we like to share what we&rsquo;ve learned so other developers can benefit.</p> <p>There have been a few recent changes at CRAN for packages containing C and C++ code that developers have had to adapt to, and we would like to share some of our learning:</p> <h2 id="note-regarding-systemrequirements-c11">NOTE regarding <code>SystemRequirements: C++11</code> <a href="#note-regarding-systemrequirements-c11"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Many package authors might have noticed a new NOTE on R-devel when submitting a package to CRAN containing C++ code:</p> <pre><code>* checking C++ specification ... NOTE Specified C++11: please drop specification unless essential </code></pre> <p>This NOTE is now appearing during <code>R CMD check</code> on R-devel for packages where the DESCRIPTION file has the following:</p> <pre><code>SystemRequirements: C++11 </code></pre> <p>Packages that use C++11 would also usually have set <code>CXX_STD=CXX11</code> in the <code>src/Makevars</code> and <code>src/Makevars.win</code> files (and <code>src/Makevars.ucrt</code>, if present). These specifications tell R to use the C++11 standard when compiling the code.</p> <p>To understand the NOTE, a bit of history will be helpful (thanks to Winston Chang for <a href="https://gist.github.com/wch/849ca79c9416795d99c48cc06a44ca1e" target="_blank" rel="noopener">writing this up</a>):</p> <ul> <li>In R 3.5 and below, on systems with an old compiler, R would default to using the C++98 standard when compiling the code. 
If a package needed a C++11 compiler, the DESCRIPTION file needed to have <code>SystemRequirements: C++11</code>, and the various <code>src/Makevars*</code> files needed to set <code>CXX_STD=CXX11</code>.</li> <li>In R 3.6.2, R began defaulting to compiling packages with the C++11 standard, as long as the compiler supported C++11 (which was true on most systems).</li> <li>In R 4.0, C++11 became the minimum supported compiler, so <code>SystemRequirements: C++11</code> was no longer necessary.</li> <li>In (the forthcoming) R 4.3, the <a href="https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2023/01/27#n2023-01-27" target="_blank" rel="noopener">default C++ standard is C++17</a> where available. <code>R CMD check</code> now <a href="https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2023/01/31" target="_blank" rel="noopener">raises a NOTE</a> if anything older than the default is specified in <code>SystemRequirements:</code> or <code>CXX_STD</code> in the various <code>src/Makevars*</code> files. 
This NOTE will block submission to CRAN &mdash; if the standard you specify is necessary for your package you will likely need to explain why.</li> </ul> <h3 id="how-to-fix-it">How to fix it <a href="#how-to-fix-it"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><ol> <li>Edit the DESCRIPTION file and remove <code>SystemRequirements: C++11</code>.</li> <li>Edit <code>src/Makevars</code>, <code>src/Makevars.win</code>, and <code>src/Makevars.ucrt</code> and remove <code>CXX_STD=CXX11</code>.</li> </ol> <p>After making these changes, the package should install without trouble on R 3.6 and above. It may not build on R &lt;= 3.5 on systems with very old compilers, though it is likely that the vast majority of users will have a newer version of R and/or have recent enough compilers. If you want to be confident that your package will be installable on R 3.5 and below with old compilers, there are several options; we offer two of the simplest approaches here:</p> <ul> <li>You can use a configure script at the top level of the package, and have it add <code>CXX_STD=CXX11</code> for R 3.5 and below. An example (unmerged) <a href="https://github.com/tidyverse/readxl/pull/722/files" target="_blank" rel="noopener">pull request to the readxl</a> package demonstrates this approach. You will also need to add <code>Biarch: true</code> in your DESCRIPTION file. 
This appears to be the approach preferred by CRAN.</li> <li>For users with R &lt;= 3.5 on a system with an older compiler, package authors can instruct users to edit their <code>~/.R/Makevars</code> file to include this line: <code>CXX_STD=CXX11</code>.</li> </ul> <p>The tidyverse has a <a href="https://www.tidyverse.org/blog/2019/04/r-version-support/" target="_blank" rel="noopener">policy of supporting four previous versions</a> of R. Currently that includes R 3.5, but with the upcoming release of R 4.3 (expected sometime this spring) the minimum version we will support is R 3.6. Because we will soon stop supporting R 3.5, you should not feel pressured to support it either.</p> <h2 id="warning-regarding-the-use-of-codesprintfcode-in-cc">WARNING regarding the use of <code>sprintf()</code> in C/C++ <a href="#warning-regarding-the-use-of-codesprintfcode-in-cc"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Another recent change in CRAN checks on R-devel that authors might encounter is that the C functions <code>sprintf()</code> and <code>vsprintf()</code> are now disallowed. <code>R CMD check</code> on R-devel may throw warnings that look something like this:</p> <pre><code>checking compiled code ... WARNING File 'fs/libs/fs.so': Found 'sprintf', possibly from 'sprintf' (C) Object: 'file.o' Compiled code should not call entry points which might terminate R nor write to stdout/stderr instead of to the console, nor use Fortran I/O nor system RNGs nor [v]sprintf. See 'Writing portable packages' in the 'Writing R Extensions' manual.
</code></pre> <p>According to the <a href="https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2022/12/24#n2022-12-24" target="_blank" rel="noopener">NEWS for R-devel</a> (which will be R 4.3):</p> <blockquote> <p>The use of sprintf and vsprintf from C/C++ has been deprecated in macOS 13 and is a known security risk. <code>R CMD check</code> now reports (on all platforms) if their use is found in compiled code: replace by snprintf or vsnprintf respectively.</p> </blockquote> <p>These are considered to be a security risk because they potentially allow <a href="https://en.wikipedia.org/wiki/Buffer_overflow" target="_blank" rel="noopener">buffer overflows</a> that write more bytes than are available in the output buffer. This is a risk if the text that is being passed to <code>sprintf()</code> comes from an uncontrolled source.</p> <p>Here is a very simple example:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://cpp11.r-lib.org'>cpp11</a></span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://cpp11.r-lib.org/reference/cpp_source.html'>cpp_function</a></span><span class='o'>(</span><span class='s'>'</span></span> <span><span class='s'> int say_height(int height) &#123;</span></span> <span><span class='s'> // "My height is xxx cm" is 19 characters but we need</span></span> <span><span class='s'> // to add one for the null-terminator</span></span> <span><span class='s'> char out[19 + 1];</span></span> <span><span class='s'> int n;</span></span> <span><span class='s'> n = sprintf(out, "My height is %i cm", height);</span></span> <span><span class='s'> Rprintf(out);</span></span> <span><span class='s'> return n;</span></span> <span><span class='s'> &#125;</span></span> <span><span class='s'>'</span></span> <span><span class='o'>)</span></span> 
<span></span> <span><span class='nf'>say_height</span><span class='o'>(</span><span class='m'>182</span><span class='o'>)</span></span> <span><span class='c'>#&gt; My height is 182 cm</span></span> <span></span><span><span class='c'>#&gt; [1] 19</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>say_height</span><span class='o'>(</span><span class='m'>1824</span><span class='o'>)</span> <span class='c'># This will abort due to buffer overflow</span></span></code></pre> </div> <h3 id="how-to-fix-it-1">How to fix it <a href="#how-to-fix-it-1"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>In most cases, this should be a simple fix: replace <code>sprintf()</code> with <code>snprintf()</code> and <code>vsprintf()</code> with <code>vsnprintf()</code>. These <code>n</code> variants take a second parameter <code>size</code>, that specifies the maximum number of bytes to be written, <em>including the automatically appended null-terminator</em>. 
If the output is a static buffer, you can use <code>sizeof()</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://cpp11.r-lib.org/reference/cpp_source.html'>cpp_function</a></span><span class='o'>(</span><span class='s'>'</span></span> <span><span class='s'> int say_height_safely(int height) &#123;</span></span> <span><span class='s'> // "My height is xxx cm\\n" is 20 characters but we need </span></span> <span><span class='s'> // to add one for the null-terminator</span></span> <span><span class='s'> char out[20 + 1];</span></span> <span><span class='s'> int n;</span></span> <span><span class='s'> n = snprintf(out, sizeof(out), "My height is %i cm\\n", height);</span></span> <span><span class='s'> Rprintf(out);</span></span> <span><span class='s'> return n;</span></span> <span><span class='s'> &#125;</span></span> <span><span class='s'>'</span><span class='o'>)</span></span> <span></span> <span><span class='nf'>say_height_safely</span><span class='o'>(</span><span class='m'>182</span><span class='o'>)</span></span> <span><span class='c'>#&gt; My height is 182 cm</span></span> <span></span><span><span class='c'>#&gt; [1] 20</span></span> <span></span><span><span class='nf'>say_height_safely</span><span class='o'>(</span><span class='m'>1824567</span><span class='o'>)</span></span> <span><span class='c'>#&gt; My height is 1824567</span></span> <span></span><span><span class='c'>#&gt; [1] 24</span></span> <span></span></code></pre> </div> <p>Notice that the return values of <code>sprintf()</code> and <code>snprintf()</code> are slightly different.
<code>sprintf()</code> returns the total number of characters written (excluding the null-terminator), while <code>snprintf()</code> returns the length of the formatted string, whether or not it has been truncated to match <code>size</code>.</p> <p>It is a bit trickier if the destination is not a static buffer, so you&rsquo;ll have to determine the maximum <code>size</code> by carefully thinking about the code.</p> <h2 id="warning-regarding-the-use-of-strict-prototypes-in-c">WARNING regarding the use of strict prototypes in C <a href="#warning-regarding-the-use-of-strict-prototypes-in-c"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Many maintainers with packages containing C code have also been getting hit with this warning:</p> <pre><code>warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes] </code></pre> <p>This usually comes from C function declarations that look like this, with no arguments specified (which is very common):</p> <div class="highlight"><pre class="chroma"><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="nf">myfun</span><span class="p">()</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> </code></pre></div><p>This new warning is because CRAN is now running checks on R-devel with the <code>-Wstrict-prototypes</code> compiler flag set. In R we define functions that take no arguments with <code>myfun &lt;- function() {...}</code> all the time. 
In C, with this flag set, the fact that a function takes no arguments must be explicitly stated (i.e., the argument list cannot be empty). In the upcoming C23 standard, empty function signatures will be considered valid and unambiguous; for now, however, an empty argument list is the likely reason you encounter this warning from CRAN.</p> <h3 id="how-to-fix-it-2">How to fix it <a href="#how-to-fix-it-2"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>This can be fixed by placing the <code>void</code> keyword in the previously empty argument list:</p> <div class="highlight"><pre class="chroma"><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="nf">myfun</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> </code></pre></div><p>Here is an example where the authors of <a href="https://topepo.github.io/Cubist/" target="_blank" rel="noopener">Cubist</a> applied the <a href="https://github.com/topepo/Cubist/pull/46" target="_blank" rel="noopener">necessary patches</a>, and <a href="https://github.com/r-lib/rlang/pull/1508" target="_blank" rel="noopener">another one in rlang</a>.</p> <h3 id="vendored-code">Vendored code <a href="#vendored-code"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39
3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Function declarations without a prototype are very common, so unfortunately they are likely to appear in libraries that you include in your package. This may require you to patch that code in your package. The <a href="https://readxl.tidyverse.org" target="_blank" rel="noopener">readxl</a> package includes the <a href="https://github.com/libxls/libxls" target="_blank" rel="noopener">libxls C library</a>, which was patched <a href="https://github.com/tidyverse/readxl/commit/afdc9b90cfc2bb1e1c5490c7ba3af5ecfc4a7876" target="_blank" rel="noopener">in readxl here</a> to deal with this issue.</p> <p>The ideal solution in cases like this would be to submit patches to the upstream libraries so you don&rsquo;t have to deal with the ongoing maintenance of your local patches, but that is not always possible. Generally, you can explain this problem when submitting your package, and as long as you&rsquo;ve notified the upstream maintainer, CRAN should accept your updated package.</p> <h3 id="unspecified-types-in-function-signature">Unspecified types in function signature <a href="#unspecified-types-in-function-signature"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The <code>-Wstrict-prototypes</code> compiler flag will also catch deprecated function definitions where the types of the arguments are not declared.
This is actually likely the primary purpose for CRAN enabling this flag, as it is ambiguous and much more dangerous than empty function signatures.</p> <p>These take the form:</p> <div class="highlight"><pre class="chroma"><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">myfun</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> </code></pre></div><p>where the argument types are not declared. This is solved by declaring the types of the arguments:</p> <div class="highlight"><pre class="chroma"><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">myfun</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">y</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">};</span> </code></pre></div> dplyr 1.1.1 https://www.tidyverse.org/blog/2023/03/dplyr-1-1-1/ Wed, 22 Mar 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/03/dplyr-1-1-1/ <p>We&rsquo;re stoked to announce the release of <a href="https://dplyr.tidyverse.org/" target="_blank" rel="noopener">dplyr 1.1.1</a>. We don&rsquo;t typically blog about patch releases, because they generally only fix bugs without significantly changing behavior, but this one includes two important updates:</p> <ul> <li>Addressing various performance regressions</li> <li>Refining the <code>multiple</code> match warning thrown by dplyr&rsquo;s joins</li> </ul> <p>You can see a full list of changes in the <a href="https://dplyr.tidyverse.org/news/index.html" target="_blank" rel="noopener">release notes</a>. 
To see the other blog posts in the dplyr 1.1.0 series, head <a href="https://www.tidyverse.org/tags/dplyr-1-1-0/" target="_blank" rel="noopener">here</a>.</p> <p>You can install dplyr 1.1.1 from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dplyr"</span><span class='o'>)</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="performance-regressions">Performance regressions <a href="#performance-regressions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In the <a href="https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-vctrs">1.1.0 post on vctrs</a>, we discussed that we&rsquo;ve rewritten all of dplyr&rsquo;s vector functions on top of <a href="https://vctrs.r-lib.org/" target="_blank" rel="noopener">vctrs</a> for improved versatility. 
Unfortunately, we accidentally made two sets of functions much slower, especially when used on a data frame with many groups:</p> <ul> <li> <p> <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> and <a href="https://dplyr.tidyverse.org/reference/if_else.html" target="_blank" rel="noopener"><code>if_else()</code></a></p> </li> <li> <p> <a href="https://dplyr.tidyverse.org/reference/nth.html" target="_blank" rel="noopener"><code>nth()</code></a>, <a href="https://dplyr.tidyverse.org/reference/nth.html" target="_blank" rel="noopener"><code>first()</code></a>, and <a href="https://dplyr.tidyverse.org/reference/nth.html" target="_blank" rel="noopener"><code>last()</code></a></p> </li> </ul> <p>These performance issues have been addressed, and both sets of functions should be back to 1.0.10 levels of performance. <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> is still <em>slightly</em> slower than 1.0.10, but it isn&rsquo;t likely to be very noticeable, and we already have plans to improve this further in a future release.</p> <h2 id="revisiting-multiple-matches">Revisiting multiple matches <a href="#revisiting-multiple-matches"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In the <a href="https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins">1.1.0 post on joins</a>, we discussed the new <code>multiple</code> argument that was added to <a href="https://dplyr.tidyverse.org/reference/mutate-joins.html" target="_blank" rel="noopener"><code>left_join()</code></a> and
friends, which had a built-in safety check that warned when you performed a join where a row from <code>x</code> matched more than one row from <code>y</code>. The TLDR of the discussion below is that we&rsquo;ve realized that this warning was being thrown in too many cases, so we&rsquo;ve adjusted it so that it now catches only the most dangerous type of join (a many-to-many join), meaning that you should see the warning <em>much</em> less often.</p> <p>As a reminder, <code>multiple</code> determines what happens when a row from <code>x</code> matches more than one row from <code>y</code>. You can choose to return <code>&quot;all&quot;</code> of the matches, the <code>&quot;first&quot;</code> or <code>&quot;last&quot;</code> match, or <code>&quot;any&quot;</code> of the matches if you are just interested in detecting if there is at least one. <code>multiple</code> defaulted to a behavior similar to <code>&quot;all&quot;</code>, with the added side effect of throwing a warning if multiple matches were actually detected, like this:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>student</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> student_id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span><span class='o'>)</span>,</span> <span> transfer <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='kc'>FALSE</span>, <span class='kc'>TRUE</span>, <span class='kc'>TRUE</span><span class='o'>)</span>,</span> <span> initial_term <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"fall
2018"</span>, <span class='s'>"fall 2020"</span>, <span class='s'>"fall 2020"</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>term</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> student_id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span>, <span class='m'>3</span>, <span class='m'>3</span><span class='o'>)</span>,</span> <span> term <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"fall 2018"</span>, <span class='s'>"spring 2019"</span>, <span class='s'>"fall 2020"</span>, <span class='s'>"fall 2020"</span>, <span class='s'>"spring 2021"</span>, <span class='s'>"fall 2021"</span><span class='o'>)</span>,</span> <span> course_load <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>12</span>, <span class='m'>15</span>, <span class='m'>10</span>, <span class='m'>14</span>, <span class='m'>15</span>, <span class='m'>12</span><span class='o'>)</span></span> <span><span class='o'>)</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Information about students attending a university.</span></span> <span><span class='c'># One row per (student_id).</span></span> <span><span class='nv'>student</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; student_id transfer initial_term</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 
<span style='color: #555555; font-style: italic;'>&lt;lgl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 FALSE fall 2018 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 TRUE fall 2020 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 TRUE fall 2020</span></span> <span></span><span></span> <span><span class='c'># Term specific information about each student.</span></span> <span><span class='c'># One row per (student_id, term) combination.</span></span> <span><span class='nv'>term</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 3</span></span></span> <span><span class='c'>#&gt; student_id term course_load</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 fall 2018 12</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 spring 2019 15</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 fall 2020 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 3 fall 2020 14</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 spring 2021 15</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 fall 2021 12</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>student</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span 
class='o'>(</span><span class='nv'>term</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>student_id</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning in left_join(student, term, join_by(student_id)): Each row in `x` is expected to match at most 1 row in `y`.</span></span> <span><span class='c'>#&gt; i Row 1 of `x` matches multiple rows.</span></span> <span><span class='c'>#&gt; i If multiple matches are expected, set `multiple = "all"` to silence this warning.</span></span> <span><span class='c'>#&gt; # A tibble: 6 × 5</span></span> <span><span class='c'>#&gt; student_id transfer initial_term term course_load</span></span> <span><span class='c'>#&gt; &lt;dbl&gt; &lt;lgl&gt; &lt;chr&gt; &lt;chr&gt; &lt;dbl&gt;</span></span> <span><span class='c'>#&gt; 1 1 FALSE fall 2018 fall 2018 12</span></span> <span><span class='c'>#&gt; 2 1 FALSE fall 2018 spring 2019 15</span></span> <span><span class='c'>#&gt; 3 2 TRUE fall 2020 fall 2020 10</span></span> <span><span class='c'>#&gt; 4 3 TRUE fall 2020 fall 2020 14</span></span> <span><span class='c'>#&gt; 5 3 TRUE fall 2020 spring 2021 15</span></span> <span><span class='c'>#&gt; 6 3 TRUE fall 2020 fall 2021 12</span></span></code></pre> </div> <p>To silence this warning, we encouraged you to set <code>multiple = &quot;all&quot;</code> to be explicit about the fact that you expected a row from <code>x</code> to match multiple rows in <code>y</code>.</p> <p>The original motivation for this behavior comes from a two-part hypothesis of ours:</p> <ul> <li> <p>Users are often surprised when a join returns more rows than the left-hand table started with (in the above example, <code>student</code> has 3 rows but the join result has 6).</p> </li> <li> <p>It is dangerous to allow joins that can result in a Cartesian explosion of the number of rows (i.e. 
<code>nrow(x) * nrow(y)</code>).</p> </li> </ul> <p>This hypothesis led us to automatically warn on two types of join relationships: one-to-many joins and many-to-many joins. If you aren&rsquo;t familiar with these terms, here is a quick rundown of the 4 types of join relationships (often discussed in a SQL context), which provide constraints on the number of allowed matches:</p> <ul> <li>one-to-one: <ul> <li>A row from <code>x</code> can match at most 1 row from <code>y</code>.</li> <li>A row from <code>y</code> can match at most 1 row from <code>x</code>.</li> </ul> </li> <li>one-to-many: <ul> <li>A row from <code>x</code> can match any number of rows in <code>y</code>.</li> <li>A row from <code>y</code> can match at most 1 row from <code>x</code>.</li> </ul> </li> <li>many-to-one: <ul> <li>A row from <code>x</code> can match at most 1 row from <code>y</code>.</li> <li>A row from <code>y</code> can match any number of rows in <code>x</code>.</li> </ul> </li> <li>many-to-many: <ul> <li>A row from <code>x</code> can match any number of rows in <code>y</code>.</li> <li>A row from <code>y</code> can match any number of rows in <code>x</code>.</li> </ul> </li> </ul> <p>After gathering some valuable <a href="https://github.com/tidyverse/dplyr/issues/6717" target="_blank" rel="noopener">user feedback</a> and conducting an <a href="https://github.com/tidyverse/dplyr/issues/6731" target="_blank" rel="noopener">in-depth analysis</a> of these join relationships, we&rsquo;ve determined that the only relationship style actually worth warning on is many-to-many, because that is the one that can result in a Cartesian explosion of rows. In retrospect, the one-to-many relationship is actually quite common, and is symmetrical with many-to-one, which we weren&rsquo;t warning on. 
You could actually exploit this fact by switching the above join around, which would silence the warning:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>term</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>student</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>student_id</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 5</span></span></span> <span><span class='c'>#&gt; student_id term course_load transfer initial_term</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;lgl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 fall 2018 12 FALSE fall 2018 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 spring 2019 15 FALSE fall 2018 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 fall 2020 10 TRUE fall 2020 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 3 fall 2020 14 TRUE fall 2020 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 spring 2021 15 TRUE fall 2020 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 fall 2021 12 TRUE fall 2020</span></span> <span></span></code></pre> </div> <p>We still believe that new users are often surprised when a join 
returns more rows than they originally started with, but the one-to-many case of this is rarely a problem in practice. So, as of dplyr 1.1.1, we no longer warn on one-to-many relationships, which should drastically reduce the number of warnings that you see.</p> <h3 id="many-to-many-relationships">Many-to-many relationships <a href="#many-to-many-relationships"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>A many-to-many relationship is much harder to construct (which is good). In fact, a database system won&rsquo;t even let you create one of these &ldquo;relationships&rdquo; between two tables directly, instead requiring you to create a third bridge table that turns the many-to-many relationship into two one-to-many relationships. 
We can &ldquo;accidentally&rdquo; create one of these in R though:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>course</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> student_id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>1</span>, <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>2</span>, <span class='m'>3</span>, <span class='m'>3</span>, <span class='m'>3</span>, <span class='m'>3</span><span class='o'>)</span>,</span> <span> instructor_id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span>, <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span>, <span class='m'>4</span><span class='o'>)</span>,</span> <span> course <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>101</span>, <span class='m'>110</span>, <span class='m'>123</span>, <span class='m'>110</span>, <span class='m'>101</span>, <span class='m'>110</span>, <span class='m'>115</span>, <span class='m'>110</span>, <span class='m'>101</span><span class='o'>)</span>,</span> <span> term <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span></span> <span> <span class='s'>"fall 2018"</span>, <span class='s'>"fall 2018"</span>, <span class='s'>"spring 2019"</span>, <span class='s'>"fall 2020"</span>, <span class='s'>"fall 2020"</span>, </span> <span> <span class='s'>"fall 2020"</span>, <span class='s'>"fall 2020"</span>, <span 
class='s'>"spring 2021"</span>, <span class='s'>"fall 2021"</span></span> <span> <span class='o'>)</span>,</span> <span> grade <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"C"</span>, <span class='s'>"A"</span>, <span class='s'>"C"</span>, <span class='s'>"D"</span>, <span class='s'>"B"</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='c'># Information about the courses each student took per semester.</span></span> <span><span class='c'># One row per (student_id, course, term) combination.</span></span> <span><span class='nv'>course</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 9 × 5</span></span></span> <span><span class='c'>#&gt; student_id instructor_id course term grade</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 1 101 fall 2018 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 2 110 fall 2018 B </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 3 123 spring 2019 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 1 110 fall 2020 B </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 2 101 fall 2020 C </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 1 110 fall 2020 A </span></span> 
<span><span class='c'>#&gt; <span style='color: #555555;'>7</span> 3 2 115 fall 2020 C </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>8</span> 3 3 110 spring 2021 D </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>9</span> 3 4 101 fall 2021 B</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Forgetting to join by both `student_id` and `term`!</span></span> <span><span class='nv'>term</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>course</span>, by <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>student_id</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning in left_join(term, course, by = join_by(student_id)): Detected an unexpected many-to-many relationship between `x` and `y`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Row 1 of `x` matches multiple rows in `y`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Row 1 of `y` matches multiple rows in `x`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> If a many-to-many relationship is expected, set `relationship =</span></span> <span><span class='c'>#&gt; "many-to-many"` to silence this warning.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 20 × 7</span></span></span> <span><span class='c'>#&gt; student_id term.x course_load instructor_id course term.y grade</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 1 fall 2018 12 1 101 fall 2018 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 1 fall 2018 12 2 110 fall 2018 B </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 1 fall 2018 12 3 123 spring 2019 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 1 spring 2019 15 1 101 fall 2018 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 1 spring 2019 15 2 110 fall 2018 B </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 1 spring 2019 15 3 123 spring 2019 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 2 fall 2020 10 1 110 fall 2020 B </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 2 fall 2020 10 2 101 fall 2020 C </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 3 fall 2020 14 1 110 fall 2020 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 3 fall 2020 14 2 115 fall 2020 C </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> 3 fall 2020 14 3 110 spring 2021 D </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> 3 fall 2020 14 4 101 fall 2021 B </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> 3 spring 2021 15 1 110 fall 2020 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>14</span> 3 spring 2021 15 2 115 
fall 2020 C </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>15</span> 3 spring 2021 15 3 110 spring 2021 D </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>16</span> 3 spring 2021 15 4 101 fall 2021 B </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>17</span> 3 fall 2021 12 1 110 fall 2020 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>18</span> 3 fall 2021 12 2 115 fall 2020 C </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>19</span> 3 fall 2021 12 3 110 spring 2021 D </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>20</span> 3 fall 2021 12 4 101 fall 2021 B</span></span> <span></span></code></pre> </div> <p>In the example above, we&rsquo;ve forgotten to include the <code>term</code> column when joining these two tables together, which accidentally results in a small explosion of rows (we end up with 20 rows, more than in either original input, but not quite the maximum possible amount, which is a whopping 54 rows!). Luckily, dplyr warns us that at least one row in each table matches more than one row in the opposite table - a sign that something isn&rsquo;t right. 
At this point we can do one of two things:</p> <ul> <li> <p>Look into the new <code>relationship</code> argument that the warning mentions (we&rsquo;ll discuss this below)</p> </li> <li> <p>Look at our join to see if we made a mistake</p> </li> </ul> <p>Of course, in this case we&rsquo;ve messed up, and adding <code>term</code> into the by expression results in the correct (and silent) join:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>term</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>course</span>, by <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>student_id</span>, <span class='nv'>term</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 9 × 6</span></span></span> <span><span class='c'>#&gt; student_id term course_load instructor_id course grade</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 fall 2018 12 1 101 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 fall 2018 12 2 110 B </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 spring 2019 15 3 123 A </span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'>4</span> 2 fall 2020 10 1 110 B </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 fall 2020 10 2 101 C </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 fall 2020 14 1 110 A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>7</span> 3 fall 2020 14 2 115 C </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>8</span> 3 spring 2021 15 3 110 D </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>9</span> 3 fall 2021 12 4 101 B</span></span> <span></span></code></pre> </div> <h3 id="join-relationships">Join <code>relationship</code>s <a href="#join-relationships"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>To adjust the joins to only warn on many-to-many relationships, we&rsquo;ve done two things:</p> <ul> <li> <p><code>multiple</code> now defaults to <code>&quot;all&quot;</code>, and is now focused solely on limiting the matches returned if multiple are detected, rather than also optionally warning/erroring.</p> </li> <li> <p>We&rsquo;ve added a new <code>relationship</code> argument.</p> </li> </ul> <p>The <code>relationship</code> argument allows you to explicitly specify the expected join relationship between the keys of <code>x</code> and <code>y</code> using the exact options we listed above: <code>&quot;one-to-one&quot;</code>, <code>&quot;one-to-many&quot;</code>, <code>&quot;many-to-one&quot;</code>, and <code>&quot;many-to-many&quot;</code>. If the constraints of the relationship you choose are violated, an error is thrown. 
For example, we could use this to require that the <code>student</code> + <code>term</code> join contains a one-to-many relationship between the two tables:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>student</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>term</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>student_id</span><span class='o'>)</span>, relationship <span class='o'>=</span> <span class='s'>"one-to-many"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 5</span></span></span> <span><span class='c'>#&gt; student_id transfer initial_term term course_load</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;lgl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 FALSE fall 2018 fall 2018 12</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 FALSE fall 2018 spring 2019 15</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 TRUE fall 2020 fall 2020 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 3 TRUE fall 2020 fall 2020 14</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 TRUE fall 2020 spring 2021 15</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 TRUE fall 2020 fall 2021 
12</span></span> <span></span></code></pre> </div> <p>Let&rsquo;s violate this by adding a duplicate row in <code>student</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>student_bad</span> <span class='o'>&lt;-</span> <span class='nv'>student</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>tibble</span><span class='nf'>::</span><span class='nf'><a href='https://tibble.tidyverse.org/reference/add_row.html'>add_row</a></span><span class='o'>(</span></span> <span> student_id <span class='o'>=</span> <span class='m'>1</span>, </span> <span> transfer <span class='o'>=</span> <span class='kc'>FALSE</span>, </span> <span> initial_term <span class='o'>=</span> <span class='s'>"fall 2019"</span>, </span> <span> .after <span class='o'>=</span> <span class='m'>1</span></span> <span> <span class='o'>)</span></span> <span></span> <span><span class='nv'>student_bad</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span></span> <span><span class='c'>#&gt; student_id transfer initial_term</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;lgl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 FALSE fall 2018 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 FALSE fall 2019 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 TRUE fall 2020 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 3 TRUE fall 2020</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>student_bad</span> <span class='o'>|&gt;</span></span> <span> <span 
class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>term</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>student_id</span><span class='o'>)</span>, relationship <span class='o'>=</span> <span class='s'>"one-to-many"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `left_join()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Each row in `y` must match at most 1 row in `x`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Row 1 of `y` matches multiple rows in `x`.</span></span> <span></span></code></pre> </div> <p>The default value of <code>relationship</code> doesn&rsquo;t add any constraints, but for equality joins it will check to see if a many-to-many relationship exists, and will warn if one occurs (like with the <code>term</code> + <code>course</code> join from above). As mentioned before, this is quite hard to do, and often means you have a mistake in your join call or in the data itself. If you really do want to perform a join with this kind of relationship, to silence the warning you can explicitly specify <code>relationship = &quot;many-to-many&quot;</code>.</p> <p>One last thing to note is that <code>relationship</code> doesn&rsquo;t handle the case of an <em>unmatched</em> row. For that, you should use the <code>unmatched</code> argument that was also added in 1.1.0. 
The combination of <code>relationship</code> and <code>unmatched</code> provides a complete set of tools for adding production-level quality control checks to your joins.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The examples used in this blog post were adapted from <a href="https://github.com/eipi10" target="_blank" rel="noopener">@eipi10</a> in <a href="https://github.com/tidyverse/dplyr/issues/6717" target="_blank" rel="noopener">this issue</a>.</p> <p>We&rsquo;d like to thank all 66 contributors who helped in some way, whether it was filing issues or contributing code and documentation: <a href="https://github.com/alexhallam" target="_blank" rel="noopener">@alexhallam</a>, <a href="https://github.com/ammar-gla" target="_blank" rel="noopener">@ammar-gla</a>, <a href="https://github.com/arnaudgallou" target="_blank" rel="noopener">@arnaudgallou</a>, <a href="https://github.com/ArthurAndrews" target="_blank" rel="noopener">@ArthurAndrews</a>, <a href="https://github.com/AuburnEagle-578" target="_blank" rel="noopener">@AuburnEagle-578</a>, <a href="https://github.com/batpigandme" target="_blank" rel="noopener">@batpigandme</a>, <a href="https://github.com/billdenney" target="_blank" rel="noopener">@billdenney</a>, <a href="https://github.com/Bisaloo" target="_blank" rel="noopener">@Bisaloo</a>, <a href="https://github.com/bitplane" target="_blank" rel="noopener">@bitplane</a>, <a href="https://github.com/chrarnold" target="_blank" rel="noopener">@chrarnold</a>, <a href="https://github.com/D5n9sMatrix"
target="_blank" rel="noopener">@D5n9sMatrix</a>, <a href="https://github.com/daattali" target="_blank" rel="noopener">@daattali</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dieghernan" target="_blank" rel="noopener">@dieghernan</a>, <a href="https://github.com/dkutner" target="_blank" rel="noopener">@dkutner</a>, <a href="https://github.com/eipi10" target="_blank" rel="noopener">@eipi10</a>, <a href="https://github.com/eitsupi" target="_blank" rel="noopener">@eitsupi</a>, <a href="https://github.com/emilBeBri" target="_blank" rel="noopener">@emilBeBri</a>, <a href="https://github.com/fawda123" target="_blank" rel="noopener">@fawda123</a>, <a href="https://github.com/fedassembly" target="_blank" rel="noopener">@fedassembly</a>, <a href="https://github.com/fkohrt" target="_blank" rel="noopener">@fkohrt</a>, <a href="https://github.com/gavinsimpson" target="_blank" rel="noopener">@gavinsimpson</a>, <a href="https://github.com/geogale" target="_blank" rel="noopener">@geogale</a>, <a href="https://github.com/ggrothendieck" target="_blank" rel="noopener">@ggrothendieck</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hope-data-science" target="_blank" rel="noopener">@hope-data-science</a>, <a href="https://github.com/jaganmn" target="_blank" rel="noopener">@jaganmn</a>, <a href="https://github.com/jakub-jedrusiak" target="_blank" rel="noopener">@jakub-jedrusiak</a>, <a href="https://github.com/JorisChau" target="_blank" rel="noopener">@JorisChau</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/krprasangdas" target="_blank" rel="noopener">@krprasangdas</a>, <a href="https://github.com/larry77" target="_blank" rel="noopener">@larry77</a>, <a href="https://github.com/lionel-" 
target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/lschneiderbauer" target="_blank" rel="noopener">@lschneiderbauer</a>, <a href="https://github.com/LukasWallrich" target="_blank" rel="noopener">@LukasWallrich</a>, <a href="https://github.com/maellecoursonnais" target="_blank" rel="noopener">@maellecoursonnais</a>, <a href="https://github.com/manhnguyen48" target="_blank" rel="noopener">@manhnguyen48</a>, <a href="https://github.com/mattansb" target="_blank" rel="noopener">@mattansb</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/mhaynam" target="_blank" rel="noopener">@mhaynam</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/mkoohafkan" target="_blank" rel="noopener">@mkoohafkan</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/Moohan" target="_blank" rel="noopener">@Moohan</a>, <a href="https://github.com/msgoussi" target="_blank" rel="noopener">@msgoussi</a>, <a href="https://github.com/multimeric" target="_blank" rel="noopener">@multimeric</a>, <a href="https://github.com/osheen1" target="_blank" rel="noopener">@osheen1</a>, <a href="https://github.com/Pozdniakov" target="_blank" rel="noopener">@Pozdniakov</a>, <a href="https://github.com/psychelzh" target="_blank" rel="noopener">@psychelzh</a>, <a href="https://github.com/pur80a" target="_blank" rel="noopener">@pur80a</a>, <a href="https://github.com/robayo" target="_blank" rel="noopener">@robayo</a>, <a href="https://github.com/rszulkin" target="_blank" rel="noopener">@rszulkin</a>, <a href="https://github.com/salim-b" target="_blank" rel="noopener">@salim-b</a>, <a href="https://github.com/sda030" target="_blank" rel="noopener">@sda030</a>, <a 
href="https://github.com/sfirke" target="_blank" rel="noopener">@sfirke</a>, <a href="https://github.com/shannonpileggi" target="_blank" rel="noopener">@shannonpileggi</a>, <a href="https://github.com/stephLH" target="_blank" rel="noopener">@stephLH</a>, <a href="https://github.com/szabgab" target="_blank" rel="noopener">@szabgab</a>, <a href="https://github.com/tjebo" target="_blank" rel="noopener">@tjebo</a>, <a href="https://github.com/Torvaney" target="_blank" rel="noopener">@Torvaney</a>, <a href="https://github.com/twest820" target="_blank" rel="noopener">@twest820</a>, <a href="https://github.com/vanillajonathan" target="_blank" rel="noopener">@vanillajonathan</a>, <a href="https://github.com/warnes" target="_blank" rel="noopener">@warnes</a>, and <a href="https://github.com/zknitter" target="_blank" rel="noopener">@zknitter</a>.</p> webR 0.1.0 has been released https://www.tidyverse.org/blog/2023/03/webr-0-1-0/ Thu, 09 Mar 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/03/webr-0-1-0/
<!-- Initialise webR in the page --> <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/codemirror.min.css"> <style> .CodeMirror pre { background-color: unset !important; } .btn-webr { background-color: #EEEEEE; border-bottom-left-radius: 0; border-bottom-right-radius: 0; } </style> <script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/codemirror.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/6.65.7/mode/r/r.js"></script> <script type="module"> import { WebR } from 'https://webr.r-wasm.org/v0.4.2/webr.mjs'; globalThis.webR = new WebR(); await globalThis.webR.init(); await webR.FS.mkdir('/persist'); await webR.FS.mount('IDBFS', {}, '/persist'); await webR.FS.syncfs(true); await webR.evalRVoid("webr::shim_install()"); await webR.evalRVoid("webr::global_prompt_install()", { withHandlers: false }); globalThis.webRCodeShelter = await new globalThis.webR.Shelter(); document.querySelectorAll(".btn-webr").forEach((btn) => { btn.innerText = 'Run code'; btn.disabled = false; }); </script> <!-- Add webr engine for knit --> <div class="highlight"> </div> <!-- Blog post main content --> <p>We&rsquo;re super excited to announce the release of webR v0.1.0! 
This is the first release of webR intended for general use by the web development and R communities and is the result of almost a year of hard work by the webR developers.</p> <p>This post will introduce webR, demonstrate some of the possibilities that running R in a web browser brings, and give a quick overview of how to include webR in your own TypeScript or JavaScript web applications.</p> <h2 id="introduction">Introduction <a href="#introduction"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>WebR is a version of the open-source R interpreter compiled for WebAssembly, along with a supporting TypeScript library for interacting with the console and R objects from a JavaScript environment.</p> <p>By compiling R to WebAssembly a user can visit a website and run R code directly within the web browser, without R installed on their device or a supporting computational R server. 
All that is required is a normal web server, including the type of cloud hosting service provided by GitHub Pages or Netlify.</p> <h2 id="how-it-works">How it works <a href="#how-it-works"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>WebR&rsquo;s core is based around compiling the open-source R interpreter for <a href="https://webassembly.org" target="_blank" rel="noopener">WebAssembly</a>, using the <a href="https://emscripten.org" target="_blank" rel="noopener">Emscripten</a> compiler suite along with <a href="https://flang.llvm.org/docs/" target="_blank" rel="noopener">LLVM Flang</a> to work with R&rsquo;s pre-existing C and Fortran based source code.</p> <p>WebAssembly (often abbreviated as Wasm) is a standard defining a virtual stack machine along with a corresponding <em>bytecode</em>. Efficient Wasm engines have already been implemented in most modern web browsers, which allows for the deployment of high performance Wasm applications on the web.</p> <p>While it&rsquo;s certainly possible for an interested programmer to write Wasm bytecode by hand, it is not a requirement to do so. Similar to how code and data are compiled into <em>machine code</em> for a certain computer processor, code and data can be compiled into the Wasm bytecode by compiler software that supports the Wasm standard.</p> <p>However, unlike with traditional machine code, the Wasm virtual machine (VM) is consistent across multiple different types of environment, architecture, and device &ndash; in theory the same bytecode binary can run anywhere without having to be recompiled for that environment. 
In this way the Wasm VM is similar to Java&rsquo;s JVM. However, in comparison to the JVM, Wasm has been designed and built from the ground up for use on the modern web, requiring strict sandboxing and security controls.</p> <p>Future use for WebAssembly has also been identified in server-side web development, containerisation, cloud computing, and more. With these applications, Wasm has been suggested as a universal binary format of the future. Multiple implementations of the Wasm VM already exist designed to run <em>outside</em> a web browser, through proposed Wasm standards such as <a href="https://wasi.dev" target="_blank" rel="noopener">WASI</a>.</p> <h2 id="whats-possible">What&rsquo;s possible? <a href="#whats-possible"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Undoubtedly, webR opens a world of possibilities for the interactive use of R and data science on the web.</p> <h3 id="an-online-r-console">An online R console <a href="#an-online-r-console"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>A web-based interactive R console is included in the webR source repository as a demonstration of integrating webR into a wider web application. 
A publicly accessible instance of the webR console can be found at <a href="https://webr.r-wasm.org/latest/">https://webr.r-wasm.org/latest/</a>.</p> <p><a href="webr-repl.png" target="_blank"><img src="webr-repl.png" alt="A screenshot showing the demo webR console creating a plot"></a></p> <p>With the webR online console a new user can get up and running with R in seconds. The webR console is also functional on many modern mobile devices, where traditional versions of R are not always available for installation<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>.</p> <p>It&rsquo;s possible to perform data analysis on reasonably large datasets by uploading data to a Virtual File System (VFS). The webR console provides an interface to view and interact with the VFS (<strong>Files</strong> tab, top right). Once a data file has been uploaded to the VFS it can be read by R like any standard file.</p> <p><a href="webr-repl2.png" target="_blank"><img src="webr-repl2.png" alt="A screenshot showing the demo webR console loading a data file"></a></p> <p>Note that uploading and downloading files to and from the VFS in this way does not actually involve transferring any data over the network. However, webR has been built so that it is possible to load data into webR over the network by using R&rsquo;s built-in functions that can download from a URL, such as <a href="https://rdrr.io/r/utils/read.table.html" target="_blank" rel="noopener"><code>read.csv()</code></a><sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>.</p> <p><a href="webr-repl3.png" target="_blank"><img src="webr-repl3.png" alt="A screenshot showing the demo webR console loading a data file from URL"></a></p> <p>Plotting is also supported (<strong>Plotting</strong> tab, top right), meaning a user can produce beautiful plot output with the webR console, closing the loop of reading data, performing analysis, and producing output. 
It is entirely feasible that a casual user could perform the basics of data science entirely within their web browser using webR.</p> <h3 id="an-educational-tool">An educational tool <a href="#an-educational-tool"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Consider the following code block containing some simple R code. After a short loading period while the webR binary and supporting files are downloaded, a <strong>Run code</strong> button is enabled on the code block, with the code itself able to be edited and remixed on the fly. Feel free to try this out now!</p> <div class="highlight"> <button class="btn btn-default btn-webr" disabled type="button" id="webr-run-button-1">Loading webR...</button> <div id="webr-editor-1"></div> <div id="webr-code-output-1"><pre style="visibility: hidden"></pre></div> <script type="module"> const runButton = document.getElementById('webr-run-button-1'); const outputDiv = document.getElementById('webr-code-output-1'); const editorDiv = document.getElementById('webr-editor-1'); const editor = CodeMirror((elt) => { elt.style.border = '1px solid #eee'; elt.style.height = 'auto'; editorDiv.append(elt); },{ value: `fit <- lm(mpg ~ am, data=mtcars)\nsummary(fit)`, lineNumbers: true, mode: 'r', theme: 'light default', viewportMargin: Infinity, }); runButton.onclick = async () => { runButton.disabled = true; let canvas = undefined; await webR.init(); await webR.evalRVoid('webr::canvas(width=504, height=311.472)'); await webR.FS.syncfs(false); const result = await webRCodeShelter.captureR(editor.getValue(), { withAutoprint: true, captureStreams: true, 
captureConditions: false, captureGraphics: false, env: {}, }); try { await webR.evalRVoid("dev.off()"); const out = result.output.filter( evt => evt.type == 'stdout' || evt.type == 'stderr' ).map((evt) => evt.data).join('\n'); outputDiv.innerHTML = ''; const pre = document.createElement("pre"); if (/\S/.test(out)) { const code = document.createElement("code"); code.innerText = out; pre.appendChild(code); } else { pre.style.visibility = 'hidden'; } outputDiv.appendChild(pre); const msgs = await webR.flush(); msgs.forEach(msg => { if (msg.type === 'canvas'){ if (msg.data.event === 'canvasImage') { canvas.getContext('2d').drawImage(msg.data.image, 0, 0); } else if (msg.data.event === 'canvasNewPage') { canvas = document.createElement('canvas'); canvas.setAttribute('width', 2 * 504); canvas.setAttribute('height', 2 * 311.472); canvas.style.width="700px"; canvas.style.display="block"; canvas.style.margin="auto"; const p = document.createElement("p"); p.appendChild(canvas); outputDiv.appendChild(p); } } }); } finally { webRCodeShelter.purge(); runButton.disabled = false; } } </script> </div> <p>After executing the R code once, try changing the <code>am</code> variable in the model to <code>gear</code> and then clicking <strong>Run code</strong> again. You should immediately see how changing the model affects the components of the resulting fit. There is a real R session running and powering this code block &ndash; try replacing the entire code with something new!</p> <p>The following interactive code block produces an R plot that is directly embedded into the page. 
As with the previous example, the plot can be recreated or remixed multiple times by the reader simply by clicking the <strong>Run Code</strong> button.</p> <div class="highlight"> <button class="btn btn-default btn-webr" disabled type="button" id="webr-run-button-2">Loading webR...</button> <div id="webr-editor-2"></div> <div id="webr-code-output-2"><pre style="visibility: hidden"></pre></div> <script type="module"> const runButton = document.getElementById('webr-run-button-2'); const outputDiv = document.getElementById('webr-code-output-2'); const editorDiv = document.getElementById('webr-editor-2'); const editor = CodeMirror((elt) => { elt.style.border = '1px solid #eee'; elt.style.height = 'auto'; editorDiv.append(elt); },{ value: `data <- rnorm(1000, 10, 1)\nhist(data, c = rainbow(12))`, lineNumbers: true, mode: 'r', theme: 'light default', viewportMargin: Infinity, }); runButton.onclick = async () => { runButton.disabled = true; let canvas = undefined; await webR.init(); await webR.evalRVoid('webr::canvas(width=504, height=311.472)'); await webR.FS.syncfs(false); const result = await webRCodeShelter.captureR(editor.getValue(), { withAutoprint: true, captureStreams: true, captureConditions: false, captureGraphics: false, env: {}, }); try { await webR.evalRVoid("dev.off()"); const out = result.output.filter( evt => evt.type == 'stdout' || evt.type == 'stderr' ).map((evt) => evt.data).join('\n'); outputDiv.innerHTML = ''; const pre = document.createElement("pre"); if (/\S/.test(out)) { const code = document.createElement("code"); code.innerText = out; pre.appendChild(code); } else { pre.style.visibility = 'hidden'; } outputDiv.appendChild(pre); const msgs = await webR.flush(); msgs.forEach(msg => { if (msg.type === 'canvas'){ if (msg.data.event === 'canvasImage') { canvas.getContext('2d').drawImage(msg.data.image, 0, 0); } else if (msg.data.event === 'canvasNewPage') { canvas = document.createElement('canvas'); canvas.setAttribute('width', 2 * 504); 
canvas.setAttribute('height', 2 * 311.472); canvas.style.width="700px"; canvas.style.display="block"; canvas.style.margin="auto"; const p = document.createElement("p"); p.appendChild(canvas); outputDiv.appendChild(p); } } }); } finally { webRCodeShelter.purge(); runButton.disabled = false; } } </script> </div> <p>In my experience, this way of interacting and experimenting with R code, without the mental overhead of context switching from a web browser to an R console or copying and pasting lines of example code, feels extremely fresh and exciting. A promising potential application for webR is providing high-quality educational web content in exactly this kind of format.</p> <h3 id="reproducible-reports">Reproducible reports <a href="#reproducible-reports"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>A core principle of good science is that results should be repeatable and reproducible by others. Unfortunately, the misuse of data analysis, leading to unreliable results, <a href="https://en.wikipedia.org/wiki/Misuse_of_statistics" target="_blank" rel="noopener">is a known issue</a>.</p> <p>The idea of a reproducible report is to bring the philosophy of repeatability to the delivery format itself. Reproducible reports weave together explanatory prose, data science, source code, output and figures, all in a single place with a consistent execution environment. 
With this, a user reading the report has everything they need to reproduce and confirm results for themselves.</p> <p>While Jupyter notebooks were not the first implementation of executable documents<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup>, their popularity has grown over the last decade or so as a way to support high quality reproducible reports. Jupyter has been named <a href="https://www.nature.com/articles/d41586-018-07196-1" target="_blank" rel="noopener"><em>&ldquo;The data scientists&rsquo; computational notebook of choice&rdquo;</em></a> and almost 10 million Jupyter notebooks were publicly accessible on GitHub as of Oct 2020<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup>.</p> <p>While a Jupyter notebook usually requires a Python and Jupyter installation to fully reproduce results, recent work by the <a href="https://jupyterlite.readthedocs.io/en/latest/" target="_blank" rel="noopener">JupyterLite</a> team uses Wasm to bring Jupyter to the web browser. JupyterLite can be used with <a href="https://pyodide.org/en/stable/" target="_blank" rel="noopener">Pyodide</a> to run Python based notebooks directly in the browser.</p> <p>WebR aims to provide that same experience for Jupyter notebooks based on R. 
As part of the initial release of webR, we are also releasing a <a href="https://github.com/r-wasm/jupyterlite-webr-kernel" target="_blank" rel="noopener">webR kernel for JupyterLite</a>, allowing users to write and execute reproducible Jupyter notebooks for R directly in the web browser.</p> <p>A JupyterLite instance with the webR kernel available can be found at <a href="https://jupyter.r-wasm.org/">https://jupyter.r-wasm.org/</a>, along with a sample R Jupyter notebook demonstrating a reproducible report.</p> <p><a href="jupyter.png" target="_blank"><img src="jupyter.png" alt="A screenshot showing the webR JupyterLite kernel"></a></p> <p>The JupyterLite kernel for R is still in the early stages of development and <a href="https://github.com/r-wasm/jupyterlite-webr-kernel#limitations" target="_blank" rel="noopener">includes some limitations</a>, but the core infrastructure is in place with the release of webR.</p> <h3 id="r-packages">R packages <a href="#r-packages"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>R has a rich history of user-created extensions through the use of R packages. Most packages are a combination of R and C or C++ code, and so many packages must be compiled from source for the system they are running on. Unfortunately, it is not possible to install packages in this way in webR. Such an installation process would require an entire C/C++ to WebAssembly compiler toolchain running in the web page!</p> <p>For the moment, downloading pre-compiled Wasm binaries is the only supported way to install packages in webR. 
A pre-installed <code>webr</code> support package provides a helper function <code>webr::install()</code> which can be used to install packages from a CRAN-like repository. As part of the webR release we have provided a small repository of binary R packages compiled for Wasm, publicly hosted at <code>https://repo.r-wasm.org/</code>.</p> <h2 id="using-webr-in-your-own-projects">Using webR in your own projects <a href="#using-webr-in-your-own-projects"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>WebR aims to be as quick and easy to use as possible for those familiar with JavaScript web development. While a short introduction to using webR follows in this blog post, we think the best way to get up and running is by reading the Getting Started section of the <a href="https://docs.r-wasm.org/webr/latest/" target="_blank" rel="noopener">webR documentation</a>. 
The documentation goes into further detail about how to download webR, technical requirements for serving web pages that use webR, and provides more detailed examples.</p> <h3 id="downloading-and-using-webr-from-npm">Downloading and using webR from npm <a href="#downloading-and-using-webr-from-npm"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>For a project with dependencies managed by npm, the <a href="https://www.npmjs.com/package/@r-wasm/webr" target="_blank" rel="noopener">webR JavaScript package</a> can be installed by using the command,</p> <div class="highlight"><pre class="chroma"><code class="language-bash" data-lang="bash">npm i @r-wasm/webr </code></pre></div><p>Once available, webR can be imported into a project and a new instance of webR initialised with,</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="kr">import</span> <span class="p">{</span> <span class="nx">WebR</span> <span class="p">}</span> <span class="nx">from</span> <span class="s1">&#39;@r-wasm/webr&#39;</span><span class="p">;</span> <span class="kr">const</span> <span class="nx">webR</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">WebR</span><span class="p">();</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">init</span><span class="p">();</span> </code></pre></div><p>Once a new instance of the <code>WebR()</code> class has been created, webR will begin to download WebAssembly binaries from the public CDN, and R will be started.</p> <h3 
id="downloading-webr-release-packages">Downloading webR release packages <a href="#downloading-webr-release-packages"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Full release packages for webR can also be downloaded from the webR <a href="https://github.com/r-wasm/webR/releases" target="_blank" rel="noopener">GitHub Releases</a> page. The full release packages include the webR JavaScript loader, along with WebAssembly binaries for R and its supporting libraries.</p> <p>Hosting a full release package on a web server makes it possible to use webR entirely on your own infrastructure, rather than relying on downloading Wasm binaries from the public CDN.</p> <h3 id="an-example-of-executing-r-code">An example of executing R code <a href="#an-example-of-executing-r-code"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Once R is ready, the JavaScript promise returned by <code>webR.init()</code> will resolve. 
At this point R code can be evaluated and results converted into JavaScript objects,</p> <div class="highlight"><pre class="chroma"><code class="language-javascript" data-lang="javascript"><span class="kd">let</span> <span class="nx">result</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">webR</span><span class="p">.</span><span class="nx">evalR</span><span class="p">(</span><span class="s1">&#39;rnorm(10,5,1)&#39;</span><span class="p">);</span> <span class="kd">let</span> <span class="nx">output</span> <span class="o">=</span> <span class="nx">await</span> <span class="nx">result</span><span class="p">.</span><span class="nx">toArray</span><span class="p">();</span> <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">output</span><span class="p">);</span> </code></pre></div><p>In the above example the <code>result</code> object can be thought of as a reference to a specific R object, and is converted into a standard JavaScript array using the <code>toArray()</code> function.</p> <p>Further examples and details of how to interact with the R console and work with R objects can be found in the <a href="https://docs.r-wasm.org/webr/latest/examples.html" target="_blank" rel="noopener">webR documentation</a>.</p> <h2 id="the-future-of-webr">The future of webR <a href="#the-future-of-webr"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Going forward we plan to expand and improve webR, including compiling more R packages for the webR public package repository. 
It is our hope that we can provide the same web-based computational infrastructure for R that <a href="https://pyodide.org/en/stable/" target="_blank" rel="noopener">Pyodide</a> has provided for the Python ecosystem.</p> <p>While WebAssembly engines are in theory able to provide near-native performance, for advanced data science or the deployment of sophisticated machine learning models, running tools such as the RStudio IDE natively or using a high-performance cloud deployment will likely always outperform the relatively restricted WebAssembly virtual machine. Despite this, webR can provide a smooth, interactive and immediate introduction to the world of working with data in R. Users who have not had the chance to use R in the past due to the barriers raised by the installation of new software on their workstation, or registration for a cloud-based service, might yet still be convinced to introduce R to their workflow through an introduction with interactive examples or short reports powered by webR.</p> <p>The opportunity for enhancing educational content also continues beyond introductory materials. Many R packages are documented online, using automated tools such as <a href="https://pkgdown.r-lib.org" target="_blank" rel="noopener">pkgdown</a> to produce a dedicated website for the package. Alongside an introductory description, package websites usually also include usage details in the form of example code, reference documentation, and vignette articles. However, if a potential user would like to try the package for themselves, often the only way is by installing the package onto their own machine. 
Immediately interactive examples, powered by webR, are an interesting future possibility that would reduce this kind of barrier to entry.</p> <p>Fairly recently, the Shiny team announced <a href="https://shiny.rstudio.com/py/" target="_blank" rel="noopener">Shiny for Python</a>, a feature rich reactive web application framework targeting Python. Of particular note, the team used WebAssembly and Pyodide as a way to run a <a href="https://shiny.rstudio.com/py/docs/shinylive.html" target="_blank" rel="noopener">Shinylive</a> server directly in the user&rsquo;s web browser. One of the most exciting possible applications for webR is a similar architecture targeting the traditional R version of Shiny. Is it possible for a <em>Shinylive for R</em> to be powered by webR? We certainly hope so.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A massive thank you to all early webR users for their willingness to experiment and their feedback in the form of GitHub issues and pull requests,</p> <p> <a href="https://github.com/Anurodhyadav" target="_blank" rel="noopener">@Anurodhyadav</a>, <a href="https://github.com/barryrowlingson" target="_blank" rel="noopener">@barryrowlingson</a>, <a href="https://github.com/christianp" target="_blank" rel="noopener">@christianp</a>, <a href="https://github.com/ekianjo" target="_blank" rel="noopener">@ekianjo</a>, <a href="https://github.com/georgestagg" target="_blank" rel="noopener">@georgestagg</a>, <a href="https://github.com/HTUser-1" target="_blank" rel="noopener">@HTUser-1</a>, <a 
href="https://github.com/jason-variadiclabs" target="_blank" rel="noopener">@jason-variadiclabs</a>, <a href="https://github.com/jjesusfilho" target="_blank" rel="noopener">@jjesusfilho</a>, <a href="https://github.com/kdpsingh" target="_blank" rel="noopener">@kdpsingh</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/psychemedia" target="_blank" rel="noopener">@psychemedia</a>, <a href="https://github.com/Sjesc" target="_blank" rel="noopener">@Sjesc</a>, <a href="https://github.com/SugarRayLua" target="_blank" rel="noopener">@SugarRayLua</a>, <a href="https://github.com/unclecode" target="_blank" rel="noopener">@unclecode</a>, and <a href="https://github.com/wch" target="_blank" rel="noopener">@wch</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>I am aware of at least one early adopter using webR as a way to access R on their Apple iPad. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>Note that there are some security measures in place when fetching data that are applied to all web applications. Downloading datasets from a URL requires that the web server providing the data supports and allows <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS" target="_blank" rel="noopener">Cross Origin Resource Sharing (CORS)</a>. <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>Knuth originally introduced the precursor <a href="https://en.wikipedia.org/wiki/Literate_programming" target="_blank" rel="noopener">Literate Programming</a> paradigm in 1984, and more recently tools such as Sweave, knitr and RMarkdown enable embedding R and computational results directly into a report. 
<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:4" role="doc-endnote"> <p>Admittedly, only a small proportion using an R kernel. <a href="https://blog.jetbrains.com/datalore/2020/12/17/we-downloaded-10-000-000-jupyter-notebooks-from-github-this-is-what-we-learned/" target="_blank" rel="noopener">The overwhelming majority use Python, R comes second, and Julia third.</a> <a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> tidyverse 2.0.0 https://www.tidyverse.org/blog/2023/03/tidyverse-2-0-0/ Wed, 08 Mar 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/03/tidyverse-2-0-0/ <p>We&rsquo;re tickled pink to announce the release of <a href="http://tidyverse.tidyverse.org/" target="_blank" rel="noopener">tidyverse</a> 2.0.0. The tidyverse is a set of packages that work in harmony because they share common data representations and API design. The tidyverse package is a &ldquo;meta&rdquo; package designed to make it easy to install and load core packages from the tidyverse in a single command. This is great for teaching and interactive use, but for package-development purposes we recommend that authors import only the specific packages that they use. 
For a complete list of changes, please see the release notes.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidyverse"</span><span class='o'>)</span></span></code></pre> </div> <p>There&rsquo;s only really one big change in tidyverse 2.0.0: lubridate is now a core member of the tidyverse! This means it&rsquo;s attached automatically when you load the tidyverse:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyverse.tidyverse.org'>tidyverse</a></span><span class='o'>)</span></span> <span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Attaching core tidyverse packages</span> ──────────────────────── tidyverse 2.0.0 ──</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dplyr </span> 1.1.0.<span style='color: #BB0000;'>9000</span> <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>readr </span> 2.1.4 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>forcats </span> 1.0.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>stringr </span> 1.5.0 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>ggplot2 </span> 3.4.1 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tibble </span> 3.1.8 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>lubridate</span> 1.9.2 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tidyr </span> 1.3.0 
</span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>purrr </span> 1.0.1 </span></span> <span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Conflicts</span> ────────────────────────────────────────── tidyverse_conflicts() ──</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: #0000BB;'>stats</span>::filter()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag()</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use the <a href='http://conflicted.r-lib.org/'>conflicted package</a> to force all conflicts to become errors</span></span> <span></span></code></pre> </div> <p>You&rsquo;ll notice one other small change to the tidyverse message: we now advertise the <a href="https://conflicted.r-lib.org" target="_blank" rel="noopener">conflicted package</a>. This package has been around for a while, but we wanted to promote it a bit more heavily because it&rsquo;s so useful.</p> <p>conflicted provides an alternative conflict resolution strategy, when multiple packages export a function of the same name. R&rsquo;s default conflict resolution system gives precedence to the most recently loaded package. This can make it hard to detect conflicts, particularly when they&rsquo;re introduced by an update to an existing package. 
conflicted takes a different approach, turning conflicts into errors and forcing you to choose which function to use.</p> <p>To use conflicted, all you need to do is load it:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://conflicted.r-lib.org/'>conflicted</a></span><span class='o'>)</span></span></code></pre> </div> <p>Using any function that&rsquo;s defined in multiple packages will now throw an error:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='nv'>cyl</span> <span class='o'>==</span> <span class='m'>8</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'>:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> <span style='color: #555555;'>[conflicted]</span> <span style='font-weight: bold;'>filter</span> found in 2 packages.</span></span> <span><span class='c'>#&gt; Either pick the one you want with `::`:</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> <span style='color: #0000BB;'>dplyr</span>::filter</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> <span style='color: #0000BB;'>stats</span>::filter</span></span> <span><span class='c'>#&gt; Or declare a preference with `conflicts_prefer()`:</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> `conflicts_prefer(dplyr::filter)`</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> `conflicts_prefer(stats::filter)`</span></span> <span></span></code></pre> </div> <p>As 
the error suggests, to resolve the problem you can either namespace individual calls:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>dplyr</span><span class='nf'>::</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='nv'>am</span> <span class='o'>&amp;</span> <span class='nv'>cyl</span> <span class='o'>==</span> <span class='m'>8</span><span class='o'>)</span></span> <span><span class='c'>#&gt; mpg cyl disp hp drat wt qsec vs am gear carb</span></span> <span><span class='c'>#&gt; Ford Pantera L 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4</span></span> <span><span class='c'>#&gt; Maserati Bora 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8</span></span> <span></span></code></pre> </div> <p>Or declare a session wide preference:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://conflicted.r-lib.org/reference/conflicts_prefer.html'>conflicts_prefer</a></span><span class='o'>(</span><span class='nf'>dplyr</span><span class='nf'>::</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[conflicted]</span> Will prefer <span style='color: #0000BB; font-weight: bold;'>dplyr</span>::filter over any other package.</span></span> <span></span><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, <span class='nv'>am</span> <span class='o'>&amp;</span> <span class='nv'>cyl</span> <span class='o'>==</span> <span class='m'>8</span><span class='o'>)</span></span> <span><span class='c'>#&gt; mpg cyl disp hp drat wt qsec vs am gear carb</span></span> <span><span 
class='c'>#&gt; Ford Pantera L 15.8 8 351 264 4.22 3.17 14.5 0 1 5 4</span></span> <span><span class='c'>#&gt; Maserati Bora 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8</span></span> <span></span></code></pre> </div> <p>The conflicted package is fairly established, but it hasn&rsquo;t seen a huge amount of use, so if you think of something that would make it better, <a href="https://github.com/r-lib/conflicted/issues" target="_blank" rel="noopener">please let us know!</a></p> dtplyr 1.3.0 https://www.tidyverse.org/blog/2023/02/dtplyr-1-3-0/ Fri, 24 Feb 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/02/dtplyr-1-3-0/ <p>We&rsquo;re thrilled to announce the release of <a href="https://dtplyr.tidyverse.org" target="_blank" rel="noopener">dtplyr</a> 1.3.0. 
dtplyr gives you the speed of <a href="http://r-datatable.com/" target="_blank" rel="noopener">data.table</a> with the syntax of dplyr; you write dplyr (and tidyr) code and dtplyr translates it to the data.table equivalent.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dtplyr"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will give you an overview of the changes in this version: dtplyr no longer adds translations directly to data.tables, it includes some dplyr 1.1.0 updates, and we have made some performance improvements. As always, you can see a full list of changes in the <a href="https://github.com/tidyverse/dtplyr/releases/tag/v1.3.0" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dtplyr.tidyverse.org'>dtplyr</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span>, warn.conflicts <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span></code></pre> </div> <h2 id="breaking-changes">Breaking changes <a href="#breaking-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 
3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In previous versions, dtplyr registered translations that kicked in whenever you used a data.table. This <a href="https://github.com/tidyverse/dtplyr/issues/312" target="_blank" rel="noopener">caused problems</a>: merely loading dtplyr could make otherwise working code fail, because dplyr and tidyr functions would now return <code>lazy_dt</code> objects instead of <code>data.table</code> objects. To avoid this problem, we have removed those S3 methods, so you must now explicitly opt in to dtplyr translations by using <a href="https://dtplyr.tidyverse.org/reference/lazy_dt.html" target="_blank" rel="noopener"><code>lazy_dt()</code></a>.</p> <h2 id="dplyr-110">dplyr 1.1.0 <a href="#dplyr-110"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This release brings support for dplyr 1.1.0&rsquo;s <a href="https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-per-operation-grouping/" target="_blank" rel="noopener">per-operation grouping</a> and <a href="https://dplyr.tidyverse.org/reference/pick.html" target="_blank" rel="noopener"><code>pick()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>dt</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dtplyr.tidyverse.org/reference/lazy_dt.html'>lazy_dt</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>10</span>, 
id <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>2</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>dt</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>mean <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nv'>id</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; `_DT1`[, .(mean = mean(x)), keyby = .(id)]</span></span> <span></span><span></span> <span><span class='nv'>dt</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dtplyr.tidyverse.org/reference/lazy_dt.html'>lazy_dt</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>10</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>10</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>dt</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>row_sum <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/colSums.html'>rowSums</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/pick.html'>pick</a></span><span 
class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; copy(`_DT2`)[, `:=`(row_sum = rowSums(data.table(x = x)))]</span></span> <span></span></code></pre> </div> <p>Per-operation grouping was one of the dplyr 1.1.0 features inspired by data.table, so it&rsquo;s neat to see it come full circle in this dtplyr release. Future releases will add support for other dplyr 1.1.0 features like the new <a href="https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/#join_by" target="_blank" rel="noopener"><code>join_by()</code></a> syntax and <a href="https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/#reframe" target="_blank" rel="noopener"><code>reframe()</code></a>.</p> <h2 id="improved-translations">Improved translations <a href="#improved-translations"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>dtplyr gains new translations for <a href="https://dplyr.tidyverse.org/reference/count.html" target="_blank" rel="noopener"><code>add_count()</code></a> and <code>unite()</code>, and the ranking functions, <a href="https://dplyr.tidyverse.org/reference/row_number.html" target="_blank" rel="noopener"><code>min_rank()</code></a>, <a href="https://dplyr.tidyverse.org/reference/row_number.html" target="_blank" rel="noopener"><code>dense_rank()</code></a>, <a 
href="https://dplyr.tidyverse.org/reference/percent_rank.html" target="_blank" rel="noopener"><code>percent_rank()</code></a>, &amp; <a href="https://dplyr.tidyverse.org/reference/percent_rank.html" target="_blank" rel="noopener"><code>cume_dist()</code></a> are now mapped to their <code>data.table</code> equivalents:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>dt</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/count.html'>add_count</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; copy(`_DT2`)[, `:=`(n = .N)]</span></span> <span></span><span></span> <span><span class='nv'>dt</span> <span class='o'>|&gt;</span> <span class='nf'>tidyr</span><span class='nf'>::</span><span class='nf'><a href='https://tidyr.tidyverse.org/reference/unite.html'>unite</a></span><span class='o'>(</span><span class='s'>"z"</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; copy(`_DT2`)[, `:=`(z = paste(x, y, sep = "_"))][, `:=`(c("x", </span></span> <span><span class='c'>#&gt; "y"), NULL)]</span></span> <span></span><span></span> <span><span class='nv'>dt</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>r <span class='o'>=</span> <span class='nf'><a 
href='https://dplyr.tidyverse.org/reference/row_number.html'>min_rank</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; copy(`_DT2`)[, `:=`(r = frank(x, ties.method = "min", na.last = "keep"))]</span></span> <span></span><span></span> <span><span class='nv'>dt</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>r <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/row_number.html'>dense_rank</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; copy(`_DT2`)[, `:=`(r = frank(x, ties.method = "dense", na.last = "keep"))]</span></span> <span></span></code></pre> </div> <p>This release also includes three translation improvements that yield better performance. When data has previously been copied, <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> will use <code>setorder()</code> instead of <a href="https://rdrr.io/r/base/order.html" target="_blank" rel="noopener"><code>order()</code></a>, and <a href="https://dplyr.tidyverse.org/reference/select.html" target="_blank" rel="noopener"><code>select()</code></a> will drop unwanted columns by reference (i.e. with <code>var := NULL</code>). 
And <a href="https://dplyr.tidyverse.org/reference/slice.html" target="_blank" rel="noopener"><code>slice()</code></a> now uses an intermediate variable to reduce computation time of row selection.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A massive thanks to <a href="https://github.com/markfairbanks" target="_blank" rel="noopener">Mark Fairbanks</a> who did most of the work for this release, ably aided by the other dtplyr maintainers <a href="https://github.com/eutwt" target="_blank" rel="noopener">@eutwt</a> and <a href="https://github.com/mgirlich" target="_blank" rel="noopener">Maximilian Girlich</a>. 
And thanks to everyone else who helped make this release possible, whether it was with code, documentation, or insightful comments: <a href="https://github.com/abalter" target="_blank" rel="noopener">@abalter</a>, <a href="https://github.com/akaviaLab" target="_blank" rel="noopener">@akaviaLab</a>, <a href="https://github.com/camnesia" target="_blank" rel="noopener">@camnesia</a>, <a href="https://github.com/caparks2" target="_blank" rel="noopener">@caparks2</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/eipi10" target="_blank" rel="noopener">@eipi10</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/jmbarbone" target="_blank" rel="noopener">@jmbarbone</a>, <a href="https://github.com/johnF-moore" target="_blank" rel="noopener">@johnF-moore</a>, <a href="https://github.com/lschneiderbauer" target="_blank" rel="noopener">@lschneiderbauer</a>, and <a href="https://github.com/NicChr" target="_blank" rel="noopener">@NicChr</a>.</p> dplyr 1.1.0: `pick()`, `reframe()`, and `arrange()` https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/ Tue, 07 Feb 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/ <p>In this final <a href="https://dplyr.tidyverse.org/news/index.html#dplyr-110" target="_blank" rel="noopener">dplyr 1.1.0</a> post, we&rsquo;ll take a look at two new verbs, <a href="https://dplyr.tidyverse.org/reference/pick.html" target="_blank" rel="noopener"><code>pick()</code></a> and <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a>, along with some changes to <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> that improve both reproducibility and performance. 
If you missed our previous posts, you should definitely go back and <a href="https://www.tidyverse.org/tags/dplyr-1-1-0/" target="_blank" rel="noopener">check them out</a>!</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dplyr"</span><span class='o'>)</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>12345</span><span class='o'>)</span></span></code></pre> </div> <h2 id="pick"><code>pick()</code> <a href="#pick"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>One thing we noticed after dplyr 1.0.0 was released is that many people like to use <a href="https://dplyr.tidyverse.org/reference/across.html" target="_blank" rel="noopener"><code>across()</code></a> for its column selection features while working inside a data-masking function like <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> or <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" 
rel="noopener"><code>summarise()</code></a>. This is typically useful if you have a function that takes data frames as inputs, or if you need to compute features about a specific subset of columns.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> x_1 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>3</span>, <span class='m'>2</span>, <span class='m'>1</span>, <span class='m'>2</span><span class='o'>)</span>, </span> <span> x_2 <span class='o'>=</span> <span class='m'>6</span><span class='o'>:</span><span class='m'>10</span>, </span> <span> w_4 <span class='o'>=</span> <span class='m'>11</span><span class='o'>:</span><span class='m'>15</span>, </span> <span> y_2 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>5</span>, <span class='m'>2</span>, <span class='m'>4</span>, <span class='m'>0</span>, <span class='m'>6</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span></span> <span> n_x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>ncol</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/across.html'>across</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"x"</span><span class='o'>)</span><span 
class='o'>)</span><span class='o'>)</span>,</span> <span> n_y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>ncol</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/across.html'>across</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"y"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 2</span></span></span> <span><span class='c'>#&gt; n_x n_y</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 2 1</span></span> <span></span></code></pre> </div> <p> <a href="https://dplyr.tidyverse.org/reference/across.html" target="_blank" rel="noopener"><code>across()</code></a> is intended to apply a function to each of these columns, rather than just select them, which is why its name doesn&rsquo;t feel natural for this operation. 
In dplyr 1.1.0 we&rsquo;ve introduced <a href="https://dplyr.tidyverse.org/reference/pick.html" target="_blank" rel="noopener"><code>pick()</code></a>, a specialized column selection variant with a more natural name:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span></span> <span> n_x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>ncol</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/pick.html'>pick</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"x"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> n_y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>ncol</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/pick.html'>pick</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/starts_with.html'>starts_with</a></span><span class='o'>(</span><span class='s'>"y"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 2</span></span></span> <span><span class='c'>#&gt; n_x n_y</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 2 1</span></span> <span></span></code></pre> </div> <p> <a 
href="https://dplyr.tidyverse.org/reference/pick.html" target="_blank" rel="noopener"><code>pick()</code></a> is particularly useful in combination with ranking functions like <a href="https://dplyr.tidyverse.org/reference/row_number.html" target="_blank" rel="noopener"><code>dense_rank()</code></a>, which have been upgraded in 1.1.0 to take data frames as inputs, serving as a way to jointly rank by multiple columns at once.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span></span> <span> rank1 <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/row_number.html'>dense_rank</a></span><span class='o'>(</span><span class='nv'>x_1</span><span class='o'>)</span>, </span> <span> rank2 <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/row_number.html'>dense_rank</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/pick.html'>pick</a></span><span class='o'>(</span><span class='nv'>x_1</span>, <span class='nv'>y_2</span><span class='o'>)</span><span class='o'>)</span> <span class='c'># Using `y_2` to break ties in `x_1`</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 6</span></span></span> <span><span class='c'>#&gt; x_1 x_2 w_4 y_2 rank1 rank2</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: 
#555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 6 11 5 1 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 3 7 12 2 3 5</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 8 13 4 2 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 1 9 14 0 1 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 10 15 6 2 4</span></span> <span></span></code></pre> </div> <p>We haven&rsquo;t deprecated using <a href="https://dplyr.tidyverse.org/reference/across.html" target="_blank" rel="noopener"><code>across()</code></a> without supplying <code>.fns</code> yet, but we plan to in the future now that <a href="https://dplyr.tidyverse.org/reference/pick.html" target="_blank" rel="noopener"><code>pick()</code></a> exists as a better alternative.</p> <h2 id="reframe"><code>reframe()</code> <a href="#reframe"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As we mentioned in the <a href="https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/" target="_blank" rel="noopener">coming soon</a> blog post, in dplyr 1.1.0 we&rsquo;ve decided to walk back the change we introduced to <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> in dplyr 1.0.0 that allowed it to return per-group results of any length, rather than results of length 1. 
We think that the idea of multi-row results is extremely powerful, as it serves as a flexible way to apply arbitrary operations to each group, but we&rsquo;ve realized that <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> wasn&rsquo;t the best home for it because it increases the chance for users to run into silent recycling bugs (thanks to <a href="https://github.com/tidyverse/dplyr/issues/6382" target="_blank" rel="noopener">Kirill Müller</a> and <a href="https://twitter.com/drob/status/1563198515626770432?s=20&amp;t=iTFWSCPNOGWalIrpXHx2qg" target="_blank" rel="noopener">David Robinson</a> for bringing this to our attention).</p> <p>As an example, here we&rsquo;re computing the mean and standard deviation of <code>x</code>, grouped by <code>g</code>. Unfortunately, I accidentally forgot to use <code>sd(x)</code> and instead just typed <code>x</code>. Because of how <a href="https://vctrs.r-lib.org/reference/vector_recycling_rules.html" target="_blank" rel="noopener">tidyverse recycling rules</a> work, the multi-row behavior silently recycled the size 1 mean values instead of erroring, so rather than 2 rows, we end up with 5.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> g <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>1</span>, <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>2</span><span class='o'>)</span>,</span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>4</span>, <span class='m'>3</span>, <span class='m'>6</span>, <span 
class='m'>2</span>, <span class='m'>8</span><span class='o'>)</span>,</span> <span> y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>5</span>, <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>8</span>, <span class='m'>9</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>df</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 3</span></span></span> <span><span class='c'>#&gt; g x y</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 4 5</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 3 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 6 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 2 8</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 8 9</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span></span> <span> x_average <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span>,</span> <span> x_sd <span class='o'>=</span> <span class='nv'>x</span>, <span class='c'># Oops</span></span> <span> .by <span class='o'>=</span> <span class='nv'>g</span></span> <span> <span 
class='o'>)</span></span> <span><span class='c'>#&gt; Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in</span></span> <span><span class='c'>#&gt; dplyr 1.1.0.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use `reframe()` instead.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> When switching from `summarise()` to `reframe()`, remember that `reframe()`</span></span> <span><span class='c'>#&gt; always returns an ungrouped data frame and adjust accordingly.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 3</span></span></span> <span><span class='c'>#&gt; g x_average x_sd</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 4.33 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 4.33 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 4.33 6</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 5 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 5 8</span></span> <span></span></code></pre> </div> <p> <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> now throws a warning when any group returns a result that isn&rsquo;t length 1. 
We expect to upgrade this to an error in the future to revert <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> back to its &ldquo;safe&rdquo; behavior of requiring 1 row per group.</p> <p> <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> also wasn&rsquo;t the best name for a function with this feature, as the name itself implies one row per group. After <a href="https://github.com/tidyverse/dplyr/issues/6565" target="_blank" rel="noopener">gathering some feedback</a>, we&rsquo;ve settled on a new verb with a more appropriate name, <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a>. We think of <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a> as a way to &ldquo;do something&rdquo; to each group, with no restrictions on the number of rows returned per group. The name has a nice connection to the tibble functions <a href="https://tibble.tidyverse.org/reference/enframe.html" target="_blank" rel="noopener"><code>tibble::enframe()</code></a> and <a href="https://tibble.tidyverse.org/reference/enframe.html" target="_blank" rel="noopener"><code>tibble::deframe()</code></a>, which are used for converting vectors to data frames and vice versa:</p> <ul> <li> <p><code>enframe()</code>: Takes a vector, returns a data frame</p> </li> <li> <p><code>deframe()</code>: Takes a data frame, returns a vector</p> </li> <li> <p> <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a>: Takes a data frame, returns a data frame</p> </li> </ul> <p>One nice application of <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a> is computing quantiles at various probability thresholds. 
It&rsquo;s particularly nice if we wrap <a href="https://rdrr.io/r/stats/quantile.html" target="_blank" rel="noopener"><code>quantile()</code></a> into a helper that returns a data frame, which <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a> then automatically unpacks.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>quantile_df</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>probs</span> <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.25</span>, <span class='m'>0.5</span>, <span class='m'>0.75</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> value <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/quantile.html'>quantile</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>probs</span>, na.rm <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span>,</span> <span> prob <span class='o'>=</span> <span class='nv'>probs</span></span> <span> <span class='o'>)</span></span> <span><span class='o'>&#125;</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/reframe.html'>reframe</a></span><span class='o'>(</span><span class='nf'>quantile_df</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nv'>g</span><span class='o'>)</span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 3</span></span></span> <span><span class='c'>#&gt; g value prob</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 3.5 0.25</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 4 0.5 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 5 0.75</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 3.5 0.25</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 5 0.5 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 2 6.5 0.75</span></span> <span></span></code></pre> </div> <p>This also works well if you want to apply it to multiple columns using <a href="https://dplyr.tidyverse.org/reference/across.html" target="_blank" rel="noopener"><code>across()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/reframe.html'>reframe</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/across.html'>across</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>:</span><span class='nv'>y</span>, <span class='nv'>quantile_df</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nv'>g</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 3</span></span></span> <span><span class='c'>#&gt; g x$value $prob y$value 
$prob</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 3.5 0.25 1.5 0.25</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 4 0.5 2 0.5 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 5 0.75 3.5 0.75</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 3.5 0.25 8.25 0.25</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 5 0.5 8.5 0.5 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 2 6.5 0.75 8.75 0.75</span></span> <span></span></code></pre> </div> <p>Because <code>quantile_df()</code> returns a tibble, we end up with <a href="https://tidyr.tidyverse.org/reference/pack.html" target="_blank" rel="noopener"><em>packed</em></a> data frame columns. 
You&rsquo;ll often want to unpack these into their individual columns, and <a href="https://dplyr.tidyverse.org/reference/across.html" target="_blank" rel="noopener"><code>across()</code></a> has gained a new <code>.unpack</code> argument in 1.1.0 that helps you do exactly that:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/reframe.html'>reframe</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/across.html'>across</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>:</span><span class='nv'>y</span>, <span class='nv'>quantile_df</span>, .unpack <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nv'>g</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 5</span></span></span> <span><span class='c'>#&gt; g x_value x_prob y_value y_prob</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 3.5 0.25 1.5 0.25</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 4 0.5 2 0.5 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 5 0.75 3.5 0.75</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 3.5 0.25 8.25 0.25</span></span> 
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 5 0.5 8.5 0.5 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 2 6.5 0.75 8.75 0.75</span></span> <span></span></code></pre> </div> <p>We expect that seeing <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a> in a colleague&rsquo;s code will serve as an extremely clear signal that something &ldquo;special&rdquo; is happening, because they&rsquo;ve made a conscious decision to opt into the 1% case of returning multiple rows per group.</p> <h2 id="arrange"><code>arrange()</code> <a href="#arrange"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We also mentioned in the <a href="https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/" target="_blank" rel="noopener">coming soon</a> post that <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> has undergone two user-facing changes:</p> <ul> <li> <p>When sorting character vectors, the C locale is now the default, rather than the system locale</p> </li> <li> <p>A new <code>.locale</code> argument, powered by stringi, allows you to explicitly request an alternative locale using a stringi locale identifier (like <code>&quot;en&quot;</code> for English, or <code>&quot;fr&quot;</code> for French)</p> </li> </ul> <p>These changes were made for two reasons:</p> <ul> <li> <p>Much faster performance by default, due to usage of a custom radix sort algorithm inspired by <a 
href="https://cran.r-project.org/web/packages/data.table/index.html" target="_blank" rel="noopener">data.table</a>&rsquo;s <code>forder()</code></p> </li> <li> <p>Improved reproducibility across R sessions, where different computers might use different system locales and different operating systems have different ways to specify the same system locale</p> </li> </ul> <p>If you use <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> to group similar values together (and don&rsquo;t care much about the specific locale that it uses to do so), then you&rsquo;ll likely see performance improvements of up to 100x in dplyr 1.1.0. If you do care about the locale and supply <code>.locale</code>, you should still see improvements of up to 10x.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># 10,000 random strings, sampled up to 1,000,000 rows</span></span> <span><span class='nv'>dictionary</span> <span class='o'>&lt;-</span> <span class='nf'>stringi</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/stringi/man/stri_rand_strings.html'>stri_rand_strings</a></span><span class='o'>(</span><span class='m'>10000</span>, length <span class='o'>=</span> <span class='m'>10</span>, pattern <span class='o'>=</span> <span class='s'>"[a-z]"</span><span class='o'>)</span></span> <span><span class='nv'>str</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sample.html'>sample</a></span><span class='o'>(</span><span class='nv'>dictionary</span>, size <span class='o'>=</span> <span class='m'>1e6</span>, replace <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span><span class='o'>)</span></span> <span><span 
class='nv'>str</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1,000,000 × 1</span></span></span> <span><span class='c'>#&gt; x </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> slpqkdtpyr</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> xtoucpndhc</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> vsvfoqcyqm</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> gnbpkwcmse</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> xutzdqxpsi</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> gkolsrndrz</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> mitqahkkou</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> eehfrrimhd</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> ymxxjczjsv</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> svpvizfxwe</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 999,990 more rows</span></span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># dplyr 1.0.10 (American English system locale)</span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>str</span>, <span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; # A tibble: 1 × 6</span></span> <span><span 
class='c'>#&gt; expression min median `itr/sec` mem_alloc `gc/sec`</span></span> <span><span class='c'>#&gt; &lt;bch:expr&gt; &lt;bch:tm&gt; &lt;bch:tm&gt; &lt;dbl&gt; &lt;bch:byt&gt; &lt;dbl&gt;</span></span> <span><span class='c'>#&gt; 1 arrange(str, x) 4.38s 4.89s 0.204 12.7MB 0.148</span></span> <span></span> <span><span class='c'># dplyr 1.1.0 (C locale default, 100x faster)</span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>str</span>, <span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; # A tibble: 1 × 6</span></span> <span><span class='c'>#&gt; expression min median `itr/sec` mem_alloc `gc/sec`</span></span> <span><span class='c'>#&gt; &lt;bch:expr&gt; &lt;bch:tm&gt; &lt;bch:tm&gt; &lt;dbl&gt; &lt;bch:byt&gt; &lt;dbl&gt;</span></span> <span><span class='c'>#&gt; 1 arrange(str, x) 42.3ms 46.6ms 20.8 22.4MB 46.0</span></span> <span></span> <span><span class='c'># dplyr 1.1.0 (American English `.locale`, 10x faster)</span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>str</span>, <span class='nv'>x</span>, .locale <span class='o'>=</span> <span class='s'>"en"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; # A tibble: 1 × 6</span></span> <span><span class='c'>#&gt; expression min median `itr/sec` mem_alloc</span></span> <span><span class='c'>#&gt; &lt;bch:expr&gt; &lt;bch:tm&gt; &lt;bch:&gt; &lt;dbl&gt; &lt;bch:byt&gt;</span></span> 
<span><span class='c'>#&gt; 1 arrange(str, x, .locale = "en") 377ms 430ms 2.21 27.9MB</span></span> <span><span class='c'>#&gt; # … with 1 more variable: `gc/sec` &lt;dbl&gt;</span></span></code></pre> </div> <p>We are hopeful that switching to a C locale default will have relatively little impact in exchange for much faster performance. To read more about the exact differences between the C locale and locales like American English or Spanish, see the <a href="https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/#arrange-improvements-with-character-vectors" target="_blank" rel="noopener">coming soon</a> post or our detailed <a href="https://github.com/tidyverse/tidyups/blob/main/003-dplyr-radix-ordering.md" target="_blank" rel="noopener">tidyup</a>. If you are having trouble converting an existing script over to the new behavior, you can set the temporary global option <code>options(dplyr.legacy_locale = TRUE)</code>, which will revert to the pre-1.1.0 behavior of using the system locale. We expect to remove this option in a future release.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to the 88 contributors who helped make the 1.1.0 release possible by opening issues, contributing features and documentation, and asking questions! 
<a href="https://github.com/7708801314520dym" target="_blank" rel="noopener">@7708801314520dym</a>, <a href="https://github.com/abalter" target="_blank" rel="noopener">@abalter</a>, <a href="https://github.com/aghaynes" target="_blank" rel="noopener">@aghaynes</a>, <a href="https://github.com/AlbertRapp" target="_blank" rel="noopener">@AlbertRapp</a>, <a href="https://github.com/AlexGaithuma" target="_blank" rel="noopener">@AlexGaithuma</a>, <a href="https://github.com/algsat" target="_blank" rel="noopener">@algsat</a>, <a href="https://github.com/andrewbaxter439" target="_blank" rel="noopener">@andrewbaxter439</a>, <a href="https://github.com/andrewpbray" target="_blank" rel="noopener">@andrewpbray</a>, <a href="https://github.com/asadow" target="_blank" rel="noopener">@asadow</a>, <a href="https://github.com/asmlgkj" target="_blank" rel="noopener">@asmlgkj</a>, <a href="https://github.com/barbosawf" target="_blank" rel="noopener">@barbosawf</a>, <a href="https://github.com/barnabasharris" target="_blank" rel="noopener">@barnabasharris</a>, <a href="https://github.com/bart1" target="_blank" rel="noopener">@bart1</a>, <a href="https://github.com/bergsmat" target="_blank" rel="noopener">@bergsmat</a>, <a href="https://github.com/chrisbrownlie" target="_blank" rel="noopener">@chrisbrownlie</a>, <a href="https://github.com/cjyetman" target="_blank" rel="noopener">@cjyetman</a>, <a href="https://github.com/CNUlichao" target="_blank" rel="noopener">@CNUlichao</a>, <a href="https://github.com/daattali" target="_blank" rel="noopener">@daattali</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/davidchall" target="_blank" rel="noopener">@davidchall</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/ddsjoberg" target="_blank" rel="noopener">@ddsjoberg</a>, <a href="https://github.com/donboyd5" target="_blank" 
rel="noopener">@donboyd5</a>, <a href="https://github.com/drmowinckels" target="_blank" rel="noopener">@drmowinckels</a>, <a href="https://github.com/dxtxs1" target="_blank" rel="noopener">@dxtxs1</a>, <a href="https://github.com/eitsupi" target="_blank" rel="noopener">@eitsupi</a>, <a href="https://github.com/eogoodwin" target="_blank" rel="noopener">@eogoodwin</a>, <a href="https://github.com/erhoppe" target="_blank" rel="noopener">@erhoppe</a>, <a href="https://github.com/eutwt" target="_blank" rel="noopener">@eutwt</a>, <a href="https://github.com/ggrothendieck" target="_blank" rel="noopener">@ggrothendieck</a>, <a href="https://github.com/grayskripko" target="_blank" rel="noopener">@grayskripko</a>, <a href="https://github.com/H-Mateus" target="_blank" rel="noopener">@H-Mateus</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/haozhou1988" target="_blank" rel="noopener">@haozhou1988</a>, <a href="https://github.com/hassanjfry" target="_blank" rel="noopener">@hassanjfry</a>, <a href="https://github.com/Hesham999666" target="_blank" rel="noopener">@Hesham999666</a>, <a href="https://github.com/hideaki" target="_blank" rel="noopener">@hideaki</a>, <a href="https://github.com/jeffreypullin" target="_blank" rel="noopener">@jeffreypullin</a>, <a href="https://github.com/jic007" target="_blank" rel="noopener">@jic007</a>, <a href="https://github.com/jmbarbone" target="_blank" rel="noopener">@jmbarbone</a>, <a href="https://github.com/jonspring" target="_blank" rel="noopener">@jonspring</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, <a href="https://github.com/jpeacock29" target="_blank" rel="noopener">@jpeacock29</a>, <a href="https://github.com/kendonB" target="_blank" rel="noopener">@kendonB</a>, <a href="https://github.com/kenkoonwong" target="_blank" rel="noopener">@kenkoonwong</a>, <a href="https://github.com/kevinushey" target="_blank" 
rel="noopener">@kevinushey</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/larry77" target="_blank" rel="noopener">@larry77</a>, <a href="https://github.com/latot" target="_blank" rel="noopener">@latot</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/llayman12" target="_blank" rel="noopener">@llayman12</a>, <a href="https://github.com/LukasWallrich" target="_blank" rel="noopener">@LukasWallrich</a>, <a href="https://github.com/m-sostero" target="_blank" rel="noopener">@m-sostero</a>, <a href="https://github.com/machow" target="_blank" rel="noopener">@machow</a>, <a href="https://github.com/mc-unimi" target="_blank" rel="noopener">@mc-unimi</a>, <a href="https://github.com/mgacc0" target="_blank" rel="noopener">@mgacc0</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/MichelleSMA" target="_blank" rel="noopener">@MichelleSMA</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/moriarais" target="_blank" rel="noopener">@moriarais</a>, <a href="https://github.com/NicChr" target="_blank" rel="noopener">@NicChr</a>, <a href="https://github.com/nstjhp" target="_blank" rel="noopener">@nstjhp</a>, <a href="https://github.com/omarwh" target="_blank" rel="noopener">@omarwh</a>, <a href="https://github.com/orgadish" target="_blank" rel="noopener">@orgadish</a>, <a href="https://github.com/rempsyc" target="_blank" rel="noopener">@rempsyc</a>, <a href="https://github.com/rorynolan" target="_blank" rel="noopener">@rorynolan</a>, <a href="https://github.com/ryanvoyack" target="_blank" rel="noopener">@ryanvoyack</a>, <a href="https://github.com/selkamand" target="_blank" rel="noopener">@selkamand</a>, 
<a href="https://github.com/seth-cp" target="_blank" rel="noopener">@seth-cp</a>, <a href="https://github.com/shalom-lab" target="_blank" rel="noopener">@shalom-lab</a>, <a href="https://github.com/shannonpileggi" target="_blank" rel="noopener">@shannonpileggi</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/sjackson1997" target="_blank" rel="noopener">@sjackson1997</a>, <a href="https://github.com/spono" target="_blank" rel="noopener">@spono</a>, <a href="https://github.com/stibu81" target="_blank" rel="noopener">@stibu81</a>, <a href="https://github.com/tfehring" target="_blank" rel="noopener">@tfehring</a>, <a href="https://github.com/Theresaliu" target="_blank" rel="noopener">@Theresaliu</a>, <a href="https://github.com/TimBMK" target="_blank" rel="noopener">@TimBMK</a>, <a href="https://github.com/TimTeaFan" target="_blank" rel="noopener">@TimTeaFan</a>, <a href="https://github.com/Torvaney" target="_blank" rel="noopener">@Torvaney</a>, <a href="https://github.com/turbanisch" target="_blank" rel="noopener">@turbanisch</a>, <a href="https://github.com/weiyangtham" target="_blank" rel="noopener">@weiyangtham</a>, <a href="https://github.com/wurli" target="_blank" rel="noopener">@wurli</a>, <a href="https://github.com/xet869" target="_blank" rel="noopener">@xet869</a>, <a href="https://github.com/yuliaUU" target="_blank" rel="noopener">@yuliaUU</a>, <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>, and <a href="https://github.com/zeehio" target="_blank" rel="noopener">@zeehio</a>.</p> dplyr 1.1.0: The power of vctrs https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-vctrs/ Thu, 02 Feb 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-vctrs/ <p>Today&rsquo;s <a href="https://dplyr.tidyverse.org/news/index.html#dplyr-110" target="_blank" rel="noopener">dplyr 1.1.0</a> post is focused on various updates to vector functions, 
like <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> and <a href="https://dplyr.tidyverse.org/reference/between.html" target="_blank" rel="noopener"><code>between()</code></a>. If you missed our previous posts, you can also see the other <a href="https://www.tidyverse.org/tags/dplyr-1-1-0/" target="_blank" rel="noopener">blog posts</a> in this series. All of dplyr&rsquo;s vector functions are now backed by <a href="https://vctrs.r-lib.org/" target="_blank" rel="noopener">vctrs</a>, which typically results in better error messages, better performance, and greater versatility.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dplyr"</span><span class='o'>)</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="case_when"><code>case_when()</code> <a href="#case_when"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>If you&rsquo;ve used <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> before, you&rsquo;ve probably written a statement like this:</p> <div class="highlight"> 
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>12</span>, <span class='o'>-</span><span class='m'>5</span>, <span class='m'>6</span>, <span class='o'>-</span><span class='m'>2</span>, <span class='kc'>NA</span>, <span class='m'>0</span><span class='o'>)</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/case_when.html'>case_when</a></span><span class='o'>(</span></span> <span> <span class='nv'>x</span> <span class='o'>&gt;=</span> <span class='m'>10</span> <span class='o'>~</span> <span class='s'>"large"</span>,</span> <span> <span class='nv'>x</span> <span class='o'>&gt;=</span> <span class='m'>0</span> <span class='o'>~</span> <span class='s'>"small"</span>,</span> <span> <span class='nv'>x</span> <span class='o'>&lt;</span> <span class='m'>0</span> <span class='o'>~</span> <span class='kc'>NA</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; Error: `NA` must be &lt;character&gt;, not &lt;logical&gt;.</span></span></code></pre> </div> <p>Like me, you&rsquo;ve probably forgotten that <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> has historically been strict about the types on the right-hand side of the <code>~</code>, which means that I needed to use <code>NA_character_</code> here instead of <code>NA</code>. 
Luckily, the switch to vctrs means that the above code now &ldquo;just works&rdquo;:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/case_when.html'>case_when</a></span><span class='o'>(</span></span> <span> <span class='nv'>x</span> <span class='o'>&gt;=</span> <span class='m'>10</span> <span class='o'>~</span> <span class='s'>"large"</span>,</span> <span> <span class='nv'>x</span> <span class='o'>&gt;=</span> <span class='m'>0</span> <span class='o'>~</span> <span class='s'>"small"</span>,</span> <span> <span class='nv'>x</span> <span class='o'>&lt;</span> <span class='m'>0</span> <span class='o'>~</span> <span class='kc'>NA</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "small" "large" NA "small" NA NA "small"</span></span> <span></span></code></pre> </div> <p>You&rsquo;ve probably also written a statement like this:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/case_when.html'>case_when</a></span><span class='o'>(</span></span> <span> <span class='nv'>x</span> <span class='o'>&gt;=</span> <span class='m'>10</span> <span class='o'>~</span> <span class='s'>"large"</span>,</span> <span> <span class='nv'>x</span> <span class='o'>&gt;=</span> <span class='m'>0</span> <span class='o'>~</span> <span class='s'>"small"</span>,</span> <span> <span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>~</span> <span class='s'>"missing"</span>,</span> <span> <span class='kc'>TRUE</span> <span class='o'>~</span> <span class='s'>"other"</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "small" "large" "other" "small" "other" "missing" "small"</span></span> <span></span></code></pre> 
</div> <p>In this case, we have a fall-through &ldquo;default&rdquo; captured by <code>TRUE ~</code>. This has always felt a little awkward and is fairly difficult to explain to new R users. To make this clearer, we&rsquo;ve added an explicit <code>.default</code> argument that we encourage you to use instead:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/case_when.html'>case_when</a></span><span class='o'>(</span></span> <span> <span class='nv'>x</span> <span class='o'>&gt;=</span> <span class='m'>10</span> <span class='o'>~</span> <span class='s'>"large"</span>,</span> <span> <span class='nv'>x</span> <span class='o'>&gt;=</span> <span class='m'>0</span> <span class='o'>~</span> <span class='s'>"small"</span>,</span> <span> <span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>~</span> <span class='s'>"missing"</span>,</span> <span> .default <span class='o'>=</span> <span class='s'>"other"</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "small" "large" "other" "small" "other" "missing" "small"</span></span> <span></span></code></pre> </div> <p><code>.default</code> will always be processed last, regardless of where you put it in the call to <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a>, so we recommend placing it at the very end.</p> <p>We haven&rsquo;t started any formal deprecation process for <code>TRUE ~</code> yet, but now that there is a better solution available we encourage you to switch over. 
We do plan to deprecate this feature in the future because it involves some slightly problematic recycling rules (but we wouldn&rsquo;t even begin this process for at least a year).</p> <h2 id="case_match"><code>case_match()</code> <a href="#case_match"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Another type of <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> statement you&rsquo;ve probably written is some kind of value remapping like:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"USA"</span>, <span class='s'>"Canada"</span>, <span class='s'>"Wales"</span>, <span class='s'>"UK"</span>, <span class='s'>"China"</span>, <span class='kc'>NA</span>, <span class='s'>"Mexico"</span>, <span class='s'>"Russia"</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/case_when.html'>case_when</a></span><span class='o'>(</span></span> <span> <span class='nv'>x</span> <span class='o'><a href='https://rdrr.io/r/base/match.html'>%in%</a></span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"USA"</span>, <span class='s'>"Canada"</span>, <span class='s'>"Mexico"</span><span class='o'>)</span> <span class='o'>~</span> <span class='s'>"North America"</span>,</span> <span> <span class='nv'>x</span> 
<span class='o'><a href='https://rdrr.io/r/base/match.html'>%in%</a></span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Wales"</span>, <span class='s'>"UK"</span><span class='o'>)</span> <span class='o'>~</span> <span class='s'>"Europe"</span>,</span> <span> <span class='nv'>x</span> <span class='o'><a href='https://rdrr.io/r/base/match.html'>%in%</a></span> <span class='s'>"China"</span> <span class='o'>~</span> <span class='s'>"Asia"</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "North America" "North America" "Europe" "Europe" </span></span> <span><span class='c'>#&gt; [5] "Asia" NA "North America" NA</span></span> <span></span></code></pre> </div> <p>Remapping values in this way is so common that SQL gives it its own name - the &ldquo;simple&rdquo; case statement. To streamline this further, we&rsquo;ve taken out some of the repetition involved with <code>x %in%</code> by introducing <a href="https://dplyr.tidyverse.org/reference/case_match.html" target="_blank" rel="noopener"><code>case_match()</code></a>, a variant of <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> that allows you to specify one or more <em>values</em> on the left-hand side of the <code>~</code>, rather than logical vectors.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/case_match.html'>case_match</a></span><span class='o'>(</span></span> <span> <span class='nv'>x</span>,</span> <span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"USA"</span>, <span class='s'>"Canada"</span>, <span class='s'>"Mexico"</span><span class='o'>)</span> <span class='o'>~</span> <span class='s'>"North America"</span>,</span> <span> <span class='nf'><a 
href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"France"</span>, <span class='s'>"UK"</span><span class='o'>)</span> <span class='o'>~</span> <span class='s'>"Europe"</span>,</span> <span> <span class='s'>"China"</span> <span class='o'>~</span> <span class='s'>"Asia"</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "North America" "North America" NA "Europe" </span></span> <span><span class='c'>#&gt; [5] "Asia" NA "North America" NA</span></span> <span></span></code></pre> </div> <p>I think that <a href="https://dplyr.tidyverse.org/reference/case_match.html" target="_blank" rel="noopener"><code>case_match()</code></a> is particularly neat because it can be wrapped into an ad-hoc replacement helper if you just need to collapse or replace a few problematic values in a vector, while leaving everything else unchanged:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>replace_match</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>...</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/case_match.html'>case_match</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>...</span>, .default <span class='o'>=</span> <span class='nv'>x</span>, .ptype <span class='o'>=</span> <span class='nv'>x</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span> <span></span> <span><span class='nf'>replace_match</span><span class='o'>(</span></span> <span> <span class='nv'>x</span>, </span> <span> <span class='s'>"USA"</span> <span class='o'>~</span> <span class='s'>"United States"</span>, </span> <span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"UK"</span>, <span 
class='s'>"Wales"</span><span class='o'>)</span> <span class='o'>~</span> <span class='s'>"United Kingdom"</span>,</span> <span> <span class='kc'>NA</span> <span class='o'>~</span> <span class='s'>"[Missing]"</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "United States" "Canada" "United Kingdom" "United Kingdom"</span></span> <span><span class='c'>#&gt; [5] "China" "[Missing]" "Mexico" "Russia"</span></span> <span></span></code></pre> </div> <h2 id="consecutive_id"><code>consecutive_id()</code> <a href="#consecutive_id"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>At Posit, we have regular company update meetings. Since we are all remote, these meetings are over Zoom. Zoom has a neat feature where it can record the transcript of your call, and it will report who was speaking and what they said. 
It looks something like this:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transcript</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tribble.html'>tribble</a></span><span class='o'>(</span></span> <span> <span class='o'>~</span><span class='nv'>name</span>, <span class='o'>~</span><span class='nv'>text</span>,</span> <span> <span class='s'>"Hadley"</span>, <span class='s'>"I'll never learn Python."</span>,</span> <span> <span class='s'>"Davis"</span>, <span class='s'>"But aren't you speaking at PyCon?"</span>,</span> <span> <span class='s'>"Hadley"</span>, <span class='s'>"So?"</span>,</span> <span> <span class='s'>"Hadley"</span>, <span class='s'>"That doesn't influence my decision."</span>,</span> <span> <span class='s'>"Hadley"</span>, <span class='s'>"I'm not budging!"</span>,</span> <span> <span class='s'>"Mara"</span>, <span class='s'>"Typical, Hadley. Stubborn as always."</span>,</span> <span> <span class='s'>"Davis"</span>, <span class='s'>"Fair enough!"</span>,</span> <span> <span class='s'>"Davis"</span>, <span class='s'>"Let's move on."</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>transcript</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 8 × 2</span></span></span> <span><span class='c'>#&gt; name text </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Hadley I'll never learn Python. </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Davis But aren't you speaking at PyCon? </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> Hadley So? 
</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> Hadley That doesn't influence my decision. </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> Hadley I'm not budging! </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> Mara Typical, Hadley. Stubborn as always.</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>7</span> Davis Fair enough! </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>8</span> Davis Let's move on.</span></span> <span></span></code></pre> </div> <p>We were working with this data and wanted a way to collapse each continuous thought down to one line. For example, rows 3-5 all contain a single idea from Hadley, so we&rsquo;d like those to be collapsed into a single line. This isn&rsquo;t quite as straightforward as a simple group-by-<code>name</code> and <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transcript</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>text <span class='o'>=</span> <span class='nf'>stringr</span><span class='nf'>::</span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_flatten.html'>str_flatten</a></span><span class='o'>(</span><span class='nv'>text</span>, collapse <span class='o'>=</span> <span class='s'>" "</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nv'>name</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; name text </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: 
italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Hadley I'll never learn Python. So? That doesn't influence my decision. I'm n…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Davis But aren't you speaking at PyCon? Fair enough! Let's move on. </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> Mara Typical, Hadley. Stubborn as always.</span></span> <span></span></code></pre> </div> <p>This isn&rsquo;t quite right because it collapsed the first row where Hadley says &ldquo;I&rsquo;ll never learn Python&rdquo; alongside rows 3-5. We need a way to identify consecutive <em>runs</em> representing when a single person is speaking, which is exactly what <a href="https://dplyr.tidyverse.org/reference/consecutive_id.html" target="_blank" rel="noopener"><code>consecutive_id()</code></a> is for!</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transcript</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>id <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/consecutive_id.html'>consecutive_id</a></span><span class='o'>(</span><span class='nv'>name</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 8 × 3</span></span></span> <span><span class='c'>#&gt; name text id</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Hadley 
I'll never learn Python. 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Davis But aren't you speaking at PyCon? 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> Hadley So? 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> Hadley That doesn't influence my decision. 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> Hadley I'm not budging! 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> Mara Typical, Hadley. Stubborn as always. 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>7</span> Davis Fair enough! 5</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>8</span> Davis Let's move on. 5</span></span> <span></span></code></pre> </div> <p> <a href="https://dplyr.tidyverse.org/reference/consecutive_id.html" target="_blank" rel="noopener"><code>consecutive_id()</code></a> takes one or more columns and generates an integer vector that increments every time a value in one of those columns changes. 
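</p> <p>As a minimal sketch of that behavior on a bare vector (the vector <code>speaker</code> is made up for illustration and is not part of the transcript data), note that a value that reappears after a gap starts a new run rather than reusing its earlier id:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>library(dplyr)

# Hypothetical input: "a" appears in two separate runs
speaker &lt;- c("a", "a", "b", "a", "a")

# The id increments each time the value differs from the previous element
consecutive_id(speaker)
#&gt; [1] 1 1 2 3 3</code></pre> </div> <p>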
This gives us something we can group on to correctly flatten our <code>text</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transcript</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>id <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/consecutive_id.html'>consecutive_id</a></span><span class='o'>(</span><span class='nv'>name</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>text <span class='o'>=</span> <span class='nf'>stringr</span><span class='nf'>::</span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_flatten.html'>str_flatten</a></span><span class='o'>(</span><span class='nv'>text</span>, collapse <span class='o'>=</span> <span class='s'>" "</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>id</span>, <span class='nv'>name</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 3</span></span></span> <span><span class='c'>#&gt; id name text </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 Hadley I'll never learn Python. </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 Davis But aren't you speaking at PyCon? 
</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 Hadley So? That doesn't influence my decision. I'm not budging!</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 4 Mara Typical, Hadley. Stubborn as always. </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 5 Davis Fair enough! Let's move on.</span></span> <span></span></code></pre> </div> <p>Grouping by <code>id</code> alone is actually enough, but I&rsquo;ve also grouped by <code>name</code> for a convenient way to drag the name along into the summary table.</p> <p> <a href="https://dplyr.tidyverse.org/reference/consecutive_id.html" target="_blank" rel="noopener"><code>consecutive_id()</code></a> is inspired by <a href="https://rdatatable.gitlab.io/data.table/reference/rleid.html" target="_blank" rel="noopener"><code>data.table::rleid()</code></a>, which serves a similar purpose.</p> <h2 id="miscellaneous-updates">Miscellaneous updates <a href="#miscellaneous-updates"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><ul> <li> <p> <a href="https://dplyr.tidyverse.org/reference/between.html" target="_blank" rel="noopener"><code>between()</code></a> is no longer restricted to length 1 <code>left</code> and <code>right</code> boundaries. They are now allowed to be length 1 or the same length as <code>x</code>. 
Additionally, <a href="https://dplyr.tidyverse.org/reference/between.html" target="_blank" rel="noopener"><code>between()</code></a> now works with any type supported by vctrs, rather than just with numerics and date-times.</p> </li> <li> <p> <a href="https://dplyr.tidyverse.org/reference/if_else.html" target="_blank" rel="noopener"><code>if_else()</code></a> has received the same updates as <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a>. In particular, it is no longer as strict about typed missing values.</p> </li> <li> <p>The ranking functions, like <a href="https://dplyr.tidyverse.org/reference/row_number.html" target="_blank" rel="noopener"><code>dense_rank()</code></a>, now allow data frame inputs as a way to rank by multiple columns at once.</p> </li> <li> <p> <a href="https://dplyr.tidyverse.org/reference/nth.html" target="_blank" rel="noopener"><code>first()</code></a>, <a href="https://dplyr.tidyverse.org/reference/nth.html" target="_blank" rel="noopener"><code>last()</code></a>, and <a href="https://dplyr.tidyverse.org/reference/nth.html" target="_blank" rel="noopener"><code>nth()</code></a> have all gained an <code>na_rm</code> argument since they are summary functions.</p> </li> <li> <p> <a href="https://dplyr.tidyverse.org/reference/na_if.html" target="_blank" rel="noopener"><code>na_if()</code></a> now casts <code>y</code> to the type of <code>x</code> to make it clear that it is type stable on <code>x</code>. In particular, this means you can no longer do <code>na_if(&lt;tbl&gt;, 0)</code>, which previously accidentally allowed you to replace every <code>0</code> in every column with <code>NA</code>. This function has always been intended as a vector function, and this is considered off-label usage. 
It also now replaces <code>NaN</code> values in double and complex vectors.</p> </li> </ul> dplyr 1.1.0: Per-operation grouping https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-per-operation-grouping/ Wed, 01 Feb 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-per-operation-grouping/ <p>Today we are going to look at one of the major new features in <a href="https://dplyr.tidyverse.org/news/index.html#dplyr-110" target="_blank" rel="noopener">dplyr 1.1.0</a>, per-operation grouping with <a href="https://dplyr.tidyverse.org/reference/dplyr_by.html" target="_blank" rel="noopener"><code>.by</code>/<code>by</code></a>. Per-operation grouping is an experimental alternative to <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> which is only active within a single dplyr verb. This is another of the new dplyr features that was inspired by <a href="https://cran.r-project.org/web/packages/data.table/index.html" target="_blank" rel="noopener">data.table</a>, this time by their own grouping syntax with <code>by</code>.</p> <p>To see the other blog posts in this series, head <a href="https://www.tidyverse.org/tags/dplyr-1-1-0/" target="_blank" rel="noopener">here</a>.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dplyr"</span><span class='o'>)</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="persistent-grouping-with-group_by">Persistent grouping with 
<code>group_by()</code> <a href="#persistent-grouping-with-group_by"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In dplyr, grouping radically affects the computation of the verb that you use it with. Since the very beginning of dplyr, you&rsquo;ve been able to perform grouped operations by modifying your data frame with <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a>. This grouping is <em>persistent</em>, meaning that it typically sticks around in some form for more than one operation. As an example, take a look at this <code>transactions</code> dataset which tracks revenue brought in from various transactions across multiple companies. 
If we wanted to add a column for the total yearly revenue per company, we might do:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> company <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"A"</span>, <span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"B"</span>, <span class='s'>"B"</span><span class='o'>)</span>,</span> <span> year <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>2019</span>, <span class='m'>2019</span>, <span class='m'>2020</span>, <span class='m'>2021</span>, <span class='m'>2023</span>, <span class='m'>2023</span><span class='o'>)</span>,</span> <span> revenue <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>20</span>, <span class='m'>50</span>, <span class='m'>4</span>, <span class='m'>10</span>, <span class='m'>12</span>, <span class='m'>18</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>transactions</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 3</span></span></span> <span><span class='c'>#&gt; company year revenue</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 
20</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>019 50</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> A <span style='text-decoration: underline;'>2</span>020 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>021 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> B <span style='text-decoration: underline;'>2</span>023 12</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> B <span style='text-decoration: underline;'>2</span>023 18</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>company</span>, <span class='nv'>year</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>total <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span><span class='nv'>revenue</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 4</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># Groups: company, year [4]</span></span></span> <span><span class='c'>#&gt; company year revenue total</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 20 70</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>019 50 70</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> A <span style='text-decoration: underline;'>2</span>020 4 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>021 10 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> B <span style='text-decoration: underline;'>2</span>023 12 30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> B <span style='text-decoration: underline;'>2</span>023 18 30</span></span> <span></span></code></pre> </div> <p>Notice that the result is still grouped by both <code>company</code> and <code>year</code>. 
This is useful if you need to follow up with additional grouped operations (with the exact same grouping columns), but many people follow this <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> with an <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>ungroup()</code></a>.</p> <p>If we only need the totals, we could also use <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a>, which peels off 1 layer of grouping by default:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>company</span>, <span class='nv'>year</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>total <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span><span class='nv'>revenue</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; `summarise()` has grouped output by 'company'. 
You can override using the</span></span> <span><span class='c'>#&gt; `.groups` argument.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># Groups: company [2]</span></span></span> <span><span class='c'>#&gt; company year total</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 70</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 30</span></span> <span></span></code></pre> </div> <p>Here the grouping of the output isn&rsquo;t exactly the same as the input, but we still consider this persistent grouping because some of the groups outlive the verb they were used with.</p> <h2 id="per-operation-grouping-with-byby">Per-operation grouping with <code>.by</code>/<code>by</code> </h2><p>In dplyr 
1.1.0, we&rsquo;ve added an alternative to <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> known as <a href="https://dplyr.tidyverse.org/reference/dplyr_by.html" target="_blank" rel="noopener"><code>.by</code></a> that introduces the idea of <em>per-operation</em> grouping:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>total <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span><span class='nv'>revenue</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>company</span>, <span class='nv'>year</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 4</span></span></span> <span><span class='c'>#&gt; company year revenue total</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 20 70</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>019 50 70</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> A <span style='text-decoration: underline;'>2</span>020 4 4</span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>021 10 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> B <span style='text-decoration: underline;'>2</span>023 12 30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> B <span style='text-decoration: underline;'>2</span>023 18 30</span></span> <span></span><span></span> <span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>total <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span><span class='nv'>revenue</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>company</span>, <span class='nv'>year</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span></span> <span><span class='c'>#&gt; company year total</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 70</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span 
style='text-decoration: underline;'>2</span>023 30</span></span> <span></span></code></pre> </div> <p>There are a few things about <code>.by</code> worth noting:</p> <ul> <li> <p>The result is always ungrouped, regardless of the number of grouping columns. With <code>.by</code>, you never need to remember to call <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>ungroup()</code></a>.</p> </li> <li> <p>We used <a href="https://tidyselect.r-lib.org/reference/language.html" target="_blank" rel="noopener">tidyselect</a> to group by multiple columns.</p> </li> <li> <p> <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> didn&rsquo;t emit a message about regrouping.</p> </li> </ul> <p>One of the things we like about <code>.by</code> is that it allows you to place the grouping specification alongside the code that uses it, rather than in a separate <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> line. 
This idea was inspired by data.table&rsquo;s grouping syntax, which looks like:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span><span class='o'>[</span>, <span class='nf'>.</span><span class='o'>(</span>total <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span><span class='nv'>revenue</span><span class='o'>)</span><span class='o'>)</span>, by <span class='o'>=</span> <span class='nf'>.</span><span class='o'>(</span><span class='nv'>company</span>, <span class='nv'>year</span><span class='o'>)</span><span class='o'>]</span></span></code></pre> </div> <p>To see a complete list of dplyr verbs that support <code>.by</code>, look <a href="https://dplyr.tidyverse.org/reference/dplyr_by.html#supported-verbs" target="_blank" rel="noopener">here</a>.</p> <h3 id="by-or-by"><code>.by</code> or <code>by</code>? </h3><p>As you use per-operation grouping in dplyr, you&rsquo;ll likely notice that some verbs use <code>.by</code> and others use <code>by</code>, for example:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/slice.html'>slice_max</a></span><span class='o'>(</span><span class='nv'>revenue</span>, n <span class='o'>=</span> <span class='m'>2</span>, by <span class='o'>=</span> <span 
class='nv'>company</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span></span> <span><span class='c'>#&gt; company year revenue</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 50</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>019 20</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>023 18</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 12</span></span> <span></span></code></pre> </div> <p>This is a technical difference resulting from the fact that some verbs consistently use a <code>.</code> prefix for their arguments, and others don&rsquo;t (see our design notes on the <a href="https://design.tidyverse.org/dots-prefix.html" target="_blank" rel="noopener">dot prefix</a> for more details). 
Most dplyr verbs use <code>.by</code>, and we&rsquo;ve tried to ensure that the cases that are most likely to result in typos instead generate an informative error:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Uses `by` to be consistent with `n` and `prop`</span></span> <span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/slice.html'>slice_max</a></span><span class='o'>(</span><span class='nv'>revenue</span>, n <span class='o'>=</span> <span class='m'>2</span>, .by <span class='o'>=</span> <span class='nv'>company</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `slice_max()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't specify an argument named `.by` in this verb.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Did you mean to use `by` instead?</span></span> <span></span><span></span> <span><span class='c'># Uses `.by` to be consistent with `.preserve`</span></span> <span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/slice.html'>slice</a></span><span class='o'>(</span><span class='nv'>revenue</span>, by <span class='o'>=</span> <span class='nv'>company</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `slice()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't specify an argument named `by` in this verb.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Did you mean to use `.by` instead?</span></span> <span></span></code></pre> 
</div> <h3 id="translating-from-group_by">Translating from <code>group_by()</code> </h3><p>You shouldn&rsquo;t feel pressured to translate existing code using <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> to use <code>.by</code> instead. <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> won&rsquo;t ever disappear, and is not currently being superseded.</p> <p>That said, if you do want to start using <code>.by</code>, there are a few differences from <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> to be aware of.</p> <ul> <li> <p><code>.by</code> always returns an ungrouped data frame. This is one of the main reasons to use <code>.by</code>, but is worth keeping in mind if you have existing code that takes advantage of persistent grouping from <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a>.</p> </li> <li> <p><code>.by</code> uses tidy-selection. <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a>, on the other hand, works more like <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> in that it allows you to create grouping columns on the fly, e.g. 
<code>df |&gt; group_by(month = floor_date(date, &quot;month&quot;))</code>. With <code>.by</code>, you must create your grouping columns ahead of time. An added benefit of <code>.by</code>'s usage of tidy-selection is that you can supply an external character vector of grouping variables using <code>.by = all_of(groups_vec)</code>.</p> </li> <li> <p><code>.by</code> doesn&rsquo;t sort grouping keys. <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> always sorts keys in ascending order, which affects the results of verbs like <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a>.</p> </li> </ul> <p>The last point might seem strange, but consider what happens if we preferred our transactions data in order by descending year so that the most recent transactions are at the top.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions2</span> <span class='o'>&lt;-</span> <span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>company</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/desc.html'>desc</a></span><span class='o'>(</span><span class='nv'>year</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>transactions2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 3</span></span></span> <span><span class='c'>#&gt; company year revenue</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> 
<span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>020 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>019 20</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> A <span style='text-decoration: underline;'>2</span>019 50</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 12</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> B <span style='text-decoration: underline;'>2</span>023 18</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> B <span style='text-decoration: underline;'>2</span>021 10</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Note that `group_by()` re-ordered</span></span> <span><span class='nv'>transactions2</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>company</span>, <span class='nv'>year</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>total <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span><span class='nv'>revenue</span><span class='o'>)</span>, .groups <span class='o'>=</span> <span class='s'>"drop"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span></span> <span><span class='c'>#&gt; company year total</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: 
italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 70</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 30</span></span> <span></span><span></span> <span><span class='c'># But `.by` used whatever order was already there</span></span> <span><span class='nv'>transactions2</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>total <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span><span class='nv'>revenue</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>company</span>, <span class='nv'>year</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span></span> <span><span class='c'>#&gt; company year total</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: 
underline;'>2</span>020 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>019 70</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>023 30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>021 10</span></span> <span></span></code></pre> </div> <p>Notice that <code>.by</code> doesn&rsquo;t re-sort the grouping keys. Instead, the previous call to <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> is &ldquo;respected&rdquo; in the summary (this is also useful in combination with the new <code>.locale</code> argument to <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a>).</p> <p>We expect that most code won&rsquo;t depend on the ordering of these group keys, but it is worth keeping in mind if you are switching to <code>.by</code>. If you did rely on sorted group keys, you currently need to explicitly call <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> either before or after the call to <code>summarise(.by =)</code>. 
In a future release, we may add <a href="https://github.com/tidyverse/dplyr/issues/6663" target="_blank" rel="noopener">an argument</a> to control this.</p> <h2 id="nestby--"><code>nest(.by = )</code> </h2><div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyr.tidyverse.org'>tidyr</a></span><span class='o'>)</span></span></code></pre> </div> <p>The idea behind <code>.by</code> turns out to be useful in contexts outside of dplyr. 
In <a href="https://www.tidyverse.org/blog/2023/01/tidyr-1-3-0/#nestby" target="_blank" rel="noopener">tidyr 1.3.0</a>, <a href="https://tidyr.tidyverse.org/reference/nest.html" target="_blank" rel="noopener"><code>nest()</code></a> gained a <code>.by</code> argument, allowing you to specify the columns you want to nest <em>by</em> rather than the columns that appear in the nested results, which often makes for more natural calls to <a href="https://tidyr.tidyverse.org/reference/nest.html" target="_blank" rel="noopener"><code>nest()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Specify what to nest by</span></span> <span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/nest.html'>nest</a></span><span class='o'>(</span>.by <span class='o'>=</span> <span class='nv'>company</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; company data </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='color: #555555;'>&lt;tibble [3 × 2]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> B <span style='color: #555555;'>&lt;tibble [3 × 2]&gt;</span></span></span> <span></span><span></span> <span><span class='c'># Specify what to nest</span></span> <span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/nest.html'>nest</a></span><span class='o'>(</span>data <span class='o'>=</span> <span class='o'>!</span><span class='nv'>company</span><span 
class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; company data </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='color: #555555;'>&lt;tibble [3 × 2]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> B <span style='color: #555555;'>&lt;tibble [3 × 2]&gt;</span></span></span> <span></span><span></span> <span><span class='c'># Specify both, allowing you to drop `year` along the way</span></span> <span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/nest.html'>nest</a></span><span class='o'>(</span>data <span class='o'>=</span> <span class='nv'>revenue</span>, .by <span class='o'>=</span> <span class='nv'>company</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; company data </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='color: #555555;'>&lt;tibble [3 × 1]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> B <span style='color: #555555;'>&lt;tibble [3 × 1]&gt;</span></span></span> <span></span></code></pre> </div> <p>We currently have 3 different nesting variants in the tidyverse: <a href="https://tidyr.tidyverse.org/reference/nest.html" target="_blank" rel="noopener"><code>tidyr::nest()</code></a>, <a 
href="https://dplyr.tidyverse.org/reference/group_nest.html" target="_blank" rel="noopener"><code>dplyr::group_nest()</code></a>, and <a href="https://dplyr.tidyverse.org/reference/nest_by.html" target="_blank" rel="noopener"><code>dplyr::nest_by()</code></a>. Because the tidyr variant is now the most flexible of all of these, and because <a href="https://tidyr.tidyverse.org/reference/unnest.html" target="_blank" rel="noopener"><code>unnest()</code></a> also lives in tidyr, we are likely to deprecate the two experimental dplyr options in the future.</p> dplyr 1.1.0: Joins https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/ Tue, 31 Jan 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/ <p> <a href="https://dplyr.tidyverse.org/news/index.html#dplyr-110" target="_blank" rel="noopener">dplyr 1.1.0</a> is out now! This is a giant release, so we&rsquo;re splitting the release announcement up into four blog posts which we&rsquo;ll post over the course of this week. Today, we&rsquo;re focusing on joins, including the new <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a> syntax, new warnings for multiple matches, inequality joins, rolling joins, and new tools for handling unmatched rows. To learn more about joins, you might want to read the updated <a href="https://r4ds.hadley.nz/joins.html" target="_blank" rel="noopener">joins chapter</a> in the upcoming 2nd edition of <a href="https://r4ds.hadley.nz" target="_blank" rel="noopener">R for Data Science</a>.</p> <p>This version of dplyr includes a number of features inspired by our <a href="https://cran.r-project.org/web/packages/data.table/index.html" target="_blank" rel="noopener">data.table</a> friends. 
The inequality and rolling joins we discuss today were popularized in R by data.table, and greatly inspired our own implementation.</p> <p>To see the other blog posts in this series, head <a href="https://www.tidyverse.org/tags/dplyr-1-1-0/" target="_blank" rel="noopener">here</a>.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dplyr"</span><span class='o'>)</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="join_by"><code>join_by()</code> <a href="#join_by"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Consider the following two tables, <code>transactions</code> and <code>companies</code>. <code>transactions</code> tracks sales across various years for different companies, and <code>companies</code> connects the short company id to its actual company name - either Patagonia (a fellow B-Corp!) 
or RStudio.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> company <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"B"</span><span class='o'>)</span>,</span> <span> year <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>2019</span>, <span class='m'>2020</span>, <span class='m'>2021</span>, <span class='m'>2023</span><span class='o'>)</span>,</span> <span> revenue <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>50</span>, <span class='m'>4</span>, <span class='m'>10</span>, <span class='m'>12</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span><span class='nv'>transactions</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span></span> <span><span class='c'>#&gt; company year revenue</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 50</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: 
underline;'>2</span>021 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 12</span></span> <span></span><span></span> <span><span class='nv'>companies</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"B"</span><span class='o'>)</span>,</span> <span> name <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Patagonia"</span>, <span class='s'>"RStudio"</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span><span class='nv'>companies</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; id name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> B RStudio</span></span> <span></span></code></pre> </div> <p>To join these two tables together, we might use an inner join:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>inner_join</a></span><span class='o'>(</span><span class='nv'>companies</span>, by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span 
class='o'>(</span>company <span class='o'>=</span> <span class='s'>"id"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 4</span></span></span> <span><span class='c'>#&gt; company year revenue name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 50 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10 RStudio </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 12 RStudio</span></span> <span></span></code></pre> </div> <p>This works great, but has always felt a little clunky. Specifying <code>c(company = &quot;id&quot;)</code> is a little awkward because it uses <code>=</code>, not <code>==</code>: here we&rsquo;re asserting that we want <code>company</code> to equal <code>id</code>, not naming a function argument or performing assignment. 
We&rsquo;ve improved on this with a new helper, <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a>, which takes expressions in a way that allows you to more naturally express this join:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='nv'>id</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Join By:</span></span> <span><span class='c'>#&gt; - company == id</span></span> <span></span></code></pre> </div> <p>This <em>join specification</em> can be used as the <code>by</code> argument in any of the <code>*_join()</code> functions:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>inner_join</a></span><span class='o'>(</span><span class='nv'>companies</span>, by <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='nv'>id</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 4</span></span></span> <span><span class='c'>#&gt; company year revenue name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 50 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10 RStudio </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 12 RStudio</span></span> <span></span></code></pre> </div> <p>This small quality-of-life improvement is just one of the many new features that come with <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a>. We&rsquo;ll look at more of these next.</p> <h2 id="multiple-matches">Multiple matches <a href="#multiple-matches"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><hr> <p><strong>Update</strong>: As of March 22, 2023, dplyr 1.1.1 is available on CRAN, which alters the behavior of multiple match detection so that you see warnings much less often. 
Read <a href="https://www.tidyverse.org/blog/2023/03/dplyr-1-1-1/">all about it</a> or install it now with <code>install.packages(&quot;dplyr&quot;)</code>.</p> <hr> <p>To make things a little more interesting, we&rsquo;ll add one more column to <code>companies</code>, and one more row:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>companies</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"B"</span><span class='o'>)</span>,</span> <span> since <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1973</span>, <span class='m'>2009</span>, <span class='m'>2022</span><span class='o'>)</span>,</span> <span> name <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Patagonia"</span>, <span class='s'>"RStudio"</span>, <span class='s'>"Posit"</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>companies</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; id since name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'>2</span> B <span style='text-decoration: underline;'>2</span>009 RStudio </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>022 Posit</span></span> <span></span></code></pre> </div> <p>This table now also tracks name changes that have happened over the course of a company&rsquo;s history. In 2022, we changed our name from RStudio to Posit, so we&rsquo;ve tracked that as an additional row in our dataset. Note that both RStudio and Posit are given an <code>id</code> of <code>B</code>, which links back to the <code>transactions</code> table.</p> <p>If we were to join these two tables together, ideally we&rsquo;d bring over the name that was in effect when the transaction took place. For example, for the transaction in 2021, the company was still RStudio, so ideally we&rsquo;d only match up against the RStudio row in <code>companies</code>. If we colored the expected matches, they&rsquo;d look something like this:</p> <p><img src="img/ideal-join.png" alt=""></p> <p>How can we do this? 
We can try the same join from before, but we won&rsquo;t like the results:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>faulty</span> <span class='o'>&lt;-</span> <span class='nv'>transactions</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>inner_join</a></span><span class='o'>(</span><span class='nv'>companies</span>, by <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='nv'>id</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning in inner_join(transactions, companies, by = join_by(company == id)): Each row in `x` is expected to match at most 1 row in `y`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Row 3 of `x` matches multiple rows.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> If multiple matches are expected, set `multiple = "all"` to silence this</span></span> <span><span class='c'>#&gt; warning.</span></span> <span></span><span></span> <span><span class='nv'>faulty</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 5</span></span></span> <span><span class='c'>#&gt; company year revenue since name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: 
underline;'>2</span>019 50 <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4 <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10 <span style='text-decoration: underline;'>2</span>009 RStudio </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>021 10 <span style='text-decoration: underline;'>2</span>022 Posit </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> B <span style='text-decoration: underline;'>2</span>023 12 <span style='text-decoration: underline;'>2</span>009 RStudio </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> B <span style='text-decoration: underline;'>2</span>023 12 <span style='text-decoration: underline;'>2</span>022 Posit</span></span> <span></span></code></pre> </div> <p>Company <code>A</code> matches correctly, but since we only joined on the company id, we get <em>multiple matches</em> for each of company <code>B</code>'s transactions and end up with more rows than we started with. This is a problem, as we were expecting a 1:1 match for each row in <code>transactions</code>. Multiple matches in equality joins like this one are typically unexpected (even though they are baked into SQL), so we&rsquo;ve also added a new warning to alert you when this happens. If multiple matches are expected, you can explicitly set <code>multiple = &quot;all&quot;</code> to silence this warning. This also serves as a &ldquo;signpost&rdquo; telling future readers of your code that this join is expected to increase the number of rows in the data. 
If multiple matches <em>aren&rsquo;t</em> expected, you can also set <code>multiple = &quot;error&quot;</code> to immediately halt the analysis. We expect this will be useful as a quality control check for production code where you might rerun analyses with new data on a rolling basis.</p> <h2 id="inequality-joins">Inequality joins <a href="#inequality-joins"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>To actually fix this issue, we&rsquo;ll need to expand our join specification to include another condition. Let&rsquo;s zoom in to just 2021:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>faulty</span>, <span class='nv'>company</span> <span class='o'>==</span> <span class='s'>"B"</span>, <span class='nv'>year</span> <span class='o'>==</span> <span class='m'>2021</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 5</span></span></span> <span><span class='c'>#&gt; company year revenue since name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: 
#555555;'>1</span> B <span style='text-decoration: underline;'>2</span>021 10 <span style='text-decoration: underline;'>2</span>009 RStudio</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> B <span style='text-decoration: underline;'>2</span>021 10 <span style='text-decoration: underline;'>2</span>022 Posit</span></span> <span></span></code></pre> </div> <p>We want to retain the match with RStudio, but not with Posit (because the name hasn&rsquo;t changed yet). One way to express this is by using the <code>year</code> and <code>since</code> columns to state that you only want a match if the transaction <code>year</code> fell <em>on or after</em> the year of a name change:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># `year[i] &gt;= since`?</span></span> <span><span class='m'>2021</span> <span class='o'>&gt;=</span> <span class='m'>2009</span></span> <span><span class='c'>#&gt; [1] TRUE</span></span> <span></span><span><span class='m'>2021</span> <span class='o'>&gt;=</span> <span class='m'>2022</span></span> <span><span class='c'>#&gt; [1] FALSE</span></span> <span></span></code></pre> </div> <p>Because <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a> accepts expressions, we can express this inequality directly inside the join specification:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='nv'>id</span>, <span class='nv'>year</span> <span class='o'>&gt;=</span> <span class='nv'>since</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Join By:</span></span> <span><span class='c'>#&gt; - company == id</span></span> <span><span class='c'>#&gt; - year &gt;= 
since</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>inner_join</a></span><span class='o'>(</span><span class='nv'>companies</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='nv'>id</span>, <span class='nv'>year</span> <span class='o'>&gt;=</span> <span class='nv'>since</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 5</span></span></span> <span><span class='c'>#&gt; company year revenue since name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 50 <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4 <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10 <span style='text-decoration: underline;'>2</span>009 RStudio </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span 
style='text-decoration: underline;'>2</span>023 12 <span style='text-decoration: underline;'>2</span>009 RStudio </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> B <span style='text-decoration: underline;'>2</span>023 12 <span style='text-decoration: underline;'>2</span>022 Posit</span></span> <span></span></code></pre> </div> <p>This eliminated the 2021 match to Posit, as expected! This type of join is known as an <em>inequality join</em>, i.e. it involves at least one join expression containing one of the following inequality conditions: <code>&gt;=</code>, <code>&gt;</code>, <code>&lt;=</code>, or <code>&lt;</code>.</p> <p>However, we still have 2 matches corresponding to the 2023 year. In this case, we only wanted the match to Posit. We can understand why we are still getting multiple matches here by running the same row-by-row analysis as before:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># `year[i] &gt;= since`? 
Both are true!</span></span> <span><span class='m'>2023</span> <span class='o'>&gt;=</span> <span class='m'>2009</span></span> <span><span class='c'>#&gt; [1] TRUE</span></span> <span></span><span><span class='m'>2023</span> <span class='o'>&gt;=</span> <span class='m'>2022</span></span> <span><span class='c'>#&gt; [1] TRUE</span></span> <span></span></code></pre> </div> <p>To remove the last problematic match of the 2023 transaction to the RStudio name, we&rsquo;ll need to refine our join specification one more time.</p> <h2 id="rolling-joins">Rolling joins <a href="#rolling-joins"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Inequality conditions like <code>year &gt;= since</code> are powerful, but because the condition is bounded on only one side, it commonly returns a large number of matches. Since multiple matches are the typical case with inequality joins, we don&rsquo;t get a warning as we did with the equality join, but we clearly still haven&rsquo;t gotten the join right. 
As a reminder, here is where we still have too many matches:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>inner_join</a></span><span class='o'>(</span><span class='nv'>companies</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='nv'>id</span>, <span class='nv'>year</span> <span class='o'>&gt;=</span> <span class='nv'>since</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='s'>"B"</span>, <span class='nv'>year</span> <span class='o'>==</span> <span class='m'>2023</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 5</span></span></span> <span><span class='c'>#&gt; company year revenue since name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> B <span style='text-decoration: underline;'>2</span>023 12 <span style='text-decoration: underline;'>2</span>009 RStudio</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> B <span style='text-decoration: underline;'>2</span>023 12 <span 
style='text-decoration: underline;'>2</span>022 Posit</span></span> <span></span></code></pre> </div> <p>We need a way to filter down the matches returned from <code>year &gt;= since</code> to only the most recent name change. In other words, we prefer the Posit match over the RStudio match because 2022 is <em>closer</em> to the transaction year of 2023 than 2009 is. We can express this in <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a> by using a helper named <code>closest()</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>inner_join</a></span><span class='o'>(</span><span class='nv'>companies</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='nv'>id</span>, <span class='nf'>closest</span><span class='o'>(</span><span class='nv'>year</span> <span class='o'>&gt;=</span> <span class='nv'>since</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 5</span></span></span> <span><span class='c'>#&gt; company year revenue since name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: 
underline;'>2</span>019 50 <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4 <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10 <span style='text-decoration: underline;'>2</span>009 RStudio </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 12 <span style='text-decoration: underline;'>2</span>022 Posit</span></span> <span></span></code></pre> </div> <p><code>closest(year &gt;= since)</code> finds all of the matches in <code>since</code> for a particular <code>year</code>, and then filters them down to only the closest match to that <code>year</code>. This is known as a <em>rolling join</em>, because in this case it <em>rolls</em> the most recent name change forward to match up with the transaction. Rolling joins were popularized by data.table, and are related to <code>ASOF</code> joins supported by some SQL flavors.</p> <p>There is a third new class of joins supported by <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a> that we won&rsquo;t discuss today known as <em>overlap joins</em>. These are particularly useful in time series where you are looking for cases where a date or range of dates from one table <em>overlaps</em> a range of dates in another table. 
There are three helpers for overlap joins: <a href="https://dplyr.tidyverse.org/reference/join_by.html#overlap-joins" target="_blank" rel="noopener"><code>between()</code></a>, <a href="https://dplyr.tidyverse.org/reference/join_by.html#overlap-joins" target="_blank" rel="noopener"><code>overlaps()</code></a>, and <a href="https://dplyr.tidyverse.org/reference/join_by.html#overlap-joins" target="_blank" rel="noopener"><code>within()</code></a>, which you can read more about <a href="https://dplyr.tidyverse.org/reference/join_by.html#overlap-joins" target="_blank" rel="noopener">in the documentation</a>.</p> <h2 id="unmatched-rows">Unmatched rows <a href="#unmatched-rows"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>I mentioned earlier that we expected a 1:1 match between <code>transactions</code> and <code>companies</code>. We saw that <code>multiple</code> can help protect us from having too many matches, but what about not having enough? 
Consider what happens if we add a new company to <code>transactions</code> without a corresponding match in <code>companies</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>&lt;-</span> <span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'>tibble</span><span class='nf'>::</span><span class='nf'><a href='https://tibble.tidyverse.org/reference/add_row.html'>add_row</a></span><span class='o'>(</span>company <span class='o'>=</span> <span class='s'>"C"</span>, year <span class='o'>=</span> <span class='m'>2023</span>, revenue <span class='o'>=</span> <span class='m'>15</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>transactions</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 3</span></span></span> <span><span class='c'>#&gt; company year revenue</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 50</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 12</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> C <span style='text-decoration: underline;'>2</span>023 15</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre 
class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>inner_join</a></span><span class='o'>(</span></span> <span> <span class='nv'>companies</span>, </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='nv'>id</span>, <span class='nf'>closest</span><span class='o'>(</span><span class='nv'>year</span> <span class='o'>&gt;=</span> <span class='nv'>since</span><span class='o'>)</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 5</span></span></span> <span><span class='c'>#&gt; company year revenue since name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 50 <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span style='text-decoration: underline;'>2</span>020 4 <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10 <span style='text-decoration: underline;'>2</span>009 RStudio </span></span> <span><span class='c'>#&gt; <span style='color: 
#555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 12 <span style='text-decoration: underline;'>2</span>022 Posit</span></span> <span></span></code></pre> </div> <p>We&rsquo;ve accidentally lost the <code>C</code> row! If you don&rsquo;t expect any unmatched rows, you can now catch this problem automatically by using our other new quality control argument, <code>unmatched</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>inner_join</a></span><span class='o'>(</span></span> <span> <span class='nv'>companies</span>, </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='nv'>id</span>, <span class='nf'>closest</span><span class='o'>(</span><span class='nv'>year</span> <span class='o'>&gt;=</span> <span class='nv'>since</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> unmatched <span class='o'>=</span> <span class='s'>"error"</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `inner_join()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Each row of `x` must have a match in `y`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Row 5 of `x` does not have a match.</span></span> <span></span></code></pre> </div> <p>If you&rsquo;ve been questioning why I&rsquo;ve been using an <a href="https://dplyr.tidyverse.org/reference/mutate-joins.html" target="_blank" rel="noopener"><code>inner_join()</code></a> over a <a href="https://dplyr.tidyverse.org/reference/mutate-joins.html" 
target="_blank" rel="noopener"><code>left_join()</code></a> this whole time, <code>unmatched</code> is why. We could use a <a href="https://dplyr.tidyverse.org/reference/mutate-joins.html" target="_blank" rel="noopener"><code>left_join()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>transactions</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span></span> <span> <span class='nv'>companies</span>, </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>company</span> <span class='o'>==</span> <span class='nv'>id</span>, <span class='nf'>closest</span><span class='o'>(</span><span class='nv'>year</span> <span class='o'>&gt;=</span> <span class='nv'>since</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> unmatched <span class='o'>=</span> <span class='s'>"error"</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 5</span></span></span> <span><span class='c'>#&gt; company year revenue since name </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A <span style='text-decoration: underline;'>2</span>019 50 <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A <span 
style='text-decoration: underline;'>2</span>020 4 <span style='text-decoration: underline;'>1</span>973 Patagonia</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> B <span style='text-decoration: underline;'>2</span>021 10 <span style='text-decoration: underline;'>2</span>009 RStudio </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B <span style='text-decoration: underline;'>2</span>023 12 <span style='text-decoration: underline;'>2</span>022 Posit </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> C <span style='text-decoration: underline;'>2</span>023 15 <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span></span></span> <span></span></code></pre> </div> <p>But you&rsquo;ll notice that we don&rsquo;t get an error here. <code>unmatched</code> will only error if the input that has the potential to drop rows has an unmatched row. The reason you&rsquo;d use a <a href="https://dplyr.tidyverse.org/reference/mutate-joins.html" target="_blank" rel="noopener"><code>left_join()</code></a> is to ensure that rows from <code>x</code> are always retained, so it wouldn&rsquo;t make sense to error when rows from <code>x</code> are also unmatched. If <code>y</code> had unmatched rows instead, <em>then</em> it would have errored because those rows would otherwise be lost from the join. 
In an <a href="https://dplyr.tidyverse.org/reference/mutate-joins.html" target="_blank" rel="noopener"><code>inner_join()</code></a>, both inputs can potentially drop rows, so <code>unmatched = &quot;error&quot;</code> checks for unmatched rows in both inputs.</p> forcats 1.0.0 https://www.tidyverse.org/blog/2023/01/forcats-1-0-0/ Mon, 30 Jan 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/01/forcats-1-0-0/ <p>We&rsquo;re so happy to announce the release of <a href="https://forcats.tidyverse.org" target="_blank" rel="noopener">forcats</a> 1.0.0. The goal of the forcats package is to provide a suite of tools that solve common problems with factors, including changing the order of levels or the values.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"forcats"</span><span class='o'>)</span></span></code></pre> </div> <p>While this is the 1.0.0 release of forcats, this version number is mainly to signal that we think forcats is stable, and that we don&rsquo;t anticipate any major changes in the future. 
This blog post will outline the only major new feature in this version: better tools for dealing with the two ways that missing values can be represented in factors. As usual, you can see a full list of changes in the <a href="https://github.com/tidyverse/forcats/releases/tag/v1.0.0" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://forcats.tidyverse.org/'>forcats</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="na-in-levels-vs-na-in-values"><code>NA</code> in levels vs <code>NA</code> in values <a href="#na-in-levels-vs-na-in-values"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>There are two ways to represent a missing value in a factor:</p> <ul> <li> <p>You can include it in the values of the factor; it does not appear in the levels and <a href="https://rdrr.io/r/base/NA.html" target="_blank" rel="noopener"><code>is.na()</code></a> reports it as missing. 
This is how missing values are encoded by default:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>f1</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"x"</span>, <span class='s'>"y"</span>, <span class='kc'>NA</span>, <span class='kc'>NA</span>, <span class='s'>"x"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/levels.html'>levels</a></span><span class='o'>(</span><span class='nv'>f1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "x" "y"</span></span> <span></span><span><span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>f1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] FALSE FALSE TRUE TRUE FALSE</span></span> <span></span></code></pre> </div> </li> <li> <p>You can include it in the levels of the factor, thus <a href="https://rdrr.io/r/base/NA.html" target="_blank" rel="noopener"><code>is.na()</code></a> does not report it as missing. This requires a little more work to create, because, by default, <a href="https://rdrr.io/r/base/factor.html" target="_blank" rel="noopener"><code>factor()</code></a> uses <code>exclude = NA</code>, meaning that missing values are not included in the levels. 
You can force <code>NA</code> to be included by setting <code>exclude = NULL</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>f2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"x"</span>, <span class='s'>"y"</span>, <span class='kc'>NA</span>, <span class='kc'>NA</span>, <span class='s'>"x"</span><span class='o'>)</span>, exclude <span class='o'>=</span> <span class='kc'>NULL</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/levels.html'>levels</a></span><span class='o'>(</span><span class='nv'>f2</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "x" "y" NA</span></span> <span></span><span><span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>f2</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] FALSE FALSE FALSE FALSE FALSE</span></span> <span></span></code></pre> </div> </li> </ul> <p>You can see the difference a little more clearly by looking at the underlying integer values of the factor:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/integer.html'>as.integer</a></span><span class='o'>(</span><span class='nv'>f1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 1 2 NA NA 1</span></span> <span></span><span><span class='nf'><a href='https://rdrr.io/r/base/integer.html'>as.integer</a></span><span class='o'>(</span><span class='nv'>f2</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 1 2 3 3 1</span></span> <span></span></code></pre> </div> <p>When the <code>NA</code> is stored in the levels, there&rsquo;s no missing value in 
the underlying integer values, because the value of level 3 is <code>NA</code>.</p> <p><code>NA</code>s in the values tend to be best for data analysis, because <a href="https://rdrr.io/r/base/NA.html" target="_blank" rel="noopener"><code>is.na()</code></a> works as you&rsquo;d expect. <code>NA</code>s in the levels are useful if you need to control where missing values are shown in a table or a plot. To make it easier to switch between these forms, forcats now comes with <a href="https://forcats.tidyverse.org/reference/fct_na_value_to_level.html" target="_blank" rel="noopener"><code>fct_na_value_to_level()</code></a> and <a href="https://forcats.tidyverse.org/reference/fct_na_value_to_level.html" target="_blank" rel="noopener"><code>fct_na_level_to_value()</code></a>.</p> <p>Here&rsquo;s a practical example of why it matters. In the plot below, I&rsquo;ve attempted to use <a href="https://forcats.tidyverse.org/reference/fct_inorder.html" target="_blank" rel="noopener"><code>fct_infreq()</code></a> to reorder the levels of the factor so that the highest frequency levels are at the top of the bar chart:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span>, warn.conflicts <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>starwars</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>y 
<span class='o'>=</span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_rev.html'>fct_rev</a></span><span class='o'>(</span><span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_inorder.html'>fct_infreq</a></span><span class='o'>(</span><span class='nv'>hair_color</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='s'>"Hair color"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/fct-infreq-hair-1.png" alt="The bar chart of hair color, now ordered so that the least frequent colours come first and the most frequent colors come last. This makes it easy to see that the most common hair color is none (~35), followed by brown (~18), then black (~12). Surprisingly, NAs are at the top of the graph, even though there are ~5 NAs and other colors have smaller values." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>Unfortunately, however, because the <code>NA</code>s are stored in the values, <a href="https://forcats.tidyverse.org/reference/fct_inorder.html" target="_blank" rel="noopener"><code>fct_infreq()</code></a> has no ability to affect them, so they appear in their default position, after all the other values (it might not be obvious that they&rsquo;re after the other values here, but remember in plots y values have their smallest values at the bottom and highest values at the top).</p> <p>We can make <a href="https://forcats.tidyverse.org/reference/fct_inorder.html" target="_blank" rel="noopener"><code>fct_infreq()</code></a> do what we want by moving the <code>NA</code> from the values to the levels:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>starwars</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_rev.html'>fct_rev</a></span><span class='o'>(</span><span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_inorder.html'>fct_infreq</a></span><span class='o'>(</span><span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_na_value_to_level.html'>fct_na_value_to_level</a></span><span class='o'>(</span><span class='nv'>hair_color</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='s'>"Hair color"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-5-1.png" alt="The bar chart of hair color, now ordered so that NAs are ordered where you'd expect: in between white (4) and black (12)." width="700px" style="display: block; margin: auto;" /></p> </div> <p>That code is getting a little verbose, so let&rsquo;s pull it out into a separate dplyr step and pull the factor transformation into its own mini pipeline:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>starwars</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span></span> <span> hair_color <span class='o'>=</span> <span class='nv'>hair_color</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_na_value_to_level.html'>fct_na_value_to_level</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_inorder.html'>fct_infreq</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_rev.html'>fct_rev</a></span><span class='o'>(</span><span class='o'>)</span></span> <span> <span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nv'>hair_color</span><span class='o'>)</span><span class='o'>)</span> 
<span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='s'>"Hair color"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-6-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>This structure makes it easier to make other adjustments. For example, the code below uses a more informative label for the missing level and lumps together the colours with fewer than 2 observations. I&rsquo;ve left the (Other) category as a bar at the end, but if I wanted to I could cause it to sort in frequency order by flipping the order of <a href="https://forcats.tidyverse.org/reference/fct_inorder.html" target="_blank" rel="noopener"><code>fct_infreq()</code></a> and <a href="https://forcats.tidyverse.org/reference/fct_lump.html" target="_blank" rel="noopener"><code>fct_lump_min()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>starwars</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span></span> <span> hair_color <span class='o'>=</span> <span class='nv'>hair_color</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_na_value_to_level.html'>fct_na_value_to_level</a></span><span class='o'>(</span><span class='s'>"(Unknown)"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_inorder.html'>fct_infreq</a></span><span class='o'>(</span><span class='o'>)</span> 
<span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_lump.html'>fct_lump_min</a></span><span class='o'>(</span><span class='m'>2</span>, other_level <span class='o'>=</span> <span class='s'>"(Other)"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_rev.html'>fct_rev</a></span><span class='o'>(</span><span class='o'>)</span> </span> <span> <span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nv'>hair_color</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='s'>"Hair color"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-7-1.png" alt="The bar chart of hair color, with NA hair colour now labelled as (Unknown) and the low frequency bars lumped into (Other)." width="700px" style="display: block; margin: auto;" /></p> </div> <p>Looking closely at what got lumped together made me realise that there&rsquo;s an existing &ldquo;Unknown&rdquo; level that should probably be represented as a missing value. 
One way to fix that is with <a href="https://forcats.tidyverse.org/reference/fct_na_value_to_level.html" target="_blank" rel="noopener"><code>fct_na_level_to_value()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>starwars</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span></span> <span> hair_color <span class='o'>=</span> <span class='nv'>hair_color</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_na_value_to_level.html'>fct_na_level_to_value</a></span><span class='o'>(</span><span class='s'>"Unknown"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_na_value_to_level.html'>fct_na_value_to_level</a></span><span class='o'>(</span><span class='s'>"(Unknown)"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_inorder.html'>fct_infreq</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_lump.html'>fct_lump_min</a></span><span class='o'>(</span><span class='m'>2</span>, other_level <span class='o'>=</span> <span class='s'>"(Other)"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://forcats.tidyverse.org/reference/fct_rev.html'>fct_rev</a></span><span class='o'>(</span><span class='o'>)</span> </span> <span> <span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nv'>hair_color</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> </span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='s'>"Hair color"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-8-1.png" alt="The bar chart of hair color, with &quot;unknown&quot; hair colour now lumped in with (Unknown) instead of other" width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2> tidyr 1.3.0 https://www.tidyverse.org/blog/2023/01/tidyr-1-3-0/ Tue, 24 Jan 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/01/tidyr-1-3-0/
<p>We&rsquo;re pleased to announce the release of <a href="https://tidyr.tidyverse.org" target="_blank" rel="noopener">tidyr</a> 1.3.0. tidyr provides a set of tools for transforming data frames to and from tidy data, where each variable is a column and each observation is a row. Tidy data is a convention for matching the semantics and structure of your data that makes using the rest of the tidyverse (and many other R packages) much easier.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidyr"</span><span class='o'>)</span></span></code></pre> </div> <p>This post highlights the biggest changes in this release:</p> <ul> <li> <p>A new family of <code>separate_*()</code> functions supersede <a href="https://tidyr.tidyverse.org/reference/separate.html" target="_blank" rel="noopener"><code>separate()</code></a> and <a href="https://tidyr.tidyverse.org/reference/extract.html" target="_blank" rel="noopener"><code>extract()</code></a> and come with useful debugging features.</p> </li> <li> <p> <a href="https://tidyr.tidyverse.org/reference/unnest_wider.html" target="_blank" rel="noopener"><code>unnest_wider()</code></a> and <a href="https://tidyr.tidyverse.org/reference/unnest_longer.html" target="_blank" rel="noopener"><code>unnest_longer()</code></a> gain a bundle of useful improvements.</p> </li> <li> <p> <a href="https://tidyr.tidyverse.org/reference/pivot_longer.html" target="_blank" rel="noopener"><code>pivot_longer()</code></a> gets a new <code>cols_vary</code> argument.</p> </li> <li> 
<p><code>nest(.by)</code> provides a new (and hopefully final) way to create nested datasets.</p> </li> </ul> <p>You should also notice generally improved errors with this release: we check function arguments more aggressively, and take care to always report the name of the function that you called, not some internal helper. As usual, you can find a full set of changes in the <a href="http://github.com/tidyverse/tidyr/releases/tag/v1.3.0" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyr.tidyverse.org'>tidyr</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span>, warn.conflicts <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span></code></pre> </div> <h2 id="separate_-family-of-functions"><code>separate_*()</code> family of functions <a href="#separate_-family-of-functions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The biggest feature of this release is a new, experimental family of functions for separating string columns:</p> <table> <thead> <tr> <th></th> <th>Make columns</th> <th>Make rows</th> </tr> </thead> <tbody> <tr> <td>Separate with delimiter</td> <td> <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" 
rel="noopener"><code>separate_wider_delim()</code></a></td> <td> <a href="https://tidyr.tidyverse.org/reference/separate_longer_delim.html" target="_blank" rel="noopener"><code>separate_longer_delim()</code></a></td> </tr> <tr> <td>Separate by position</td> <td> <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_position()</code></a></td> <td> <a href="https://tidyr.tidyverse.org/reference/separate_longer_delim.html" target="_blank" rel="noopener"><code>separate_longer_position()</code></a></td> </tr> <tr> <td>Separate with regular expression</td> <td> <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_regex()</code></a></td> <td></td> </tr> </tbody> </table> <p>These functions collectively supersede <a href="https://tidyr.tidyverse.org/reference/extract.html" target="_blank" rel="noopener"><code>extract()</code></a>, <a href="https://tidyr.tidyverse.org/reference/separate.html" target="_blank" rel="noopener"><code>separate()</code></a>, and <a href="https://tidyr.tidyverse.org/reference/separate_rows.html" target="_blank" rel="noopener"><code>separate_rows()</code></a> because they have more consistent names and arguments, have better performance (thanks to stringr), and provide a new approach for handling problems.</p> <table> <thead> <tr> <th></th> <th>Make columns</th> <th>Make rows</th> </tr> </thead> <tbody> <tr> <td>Separate with delimiter</td> <td><code>separate(sep = string)</code></td> <td> <a href="https://tidyr.tidyverse.org/reference/separate_rows.html" target="_blank" rel="noopener"><code>separate_rows()</code></a></td> </tr> <tr> <td>Separate by position</td> <td><code>separate(sep = integer vector)</code></td> <td>N/A</td> </tr> <tr> <td>Separate with regular expression</td> <td> <a href="https://tidyr.tidyverse.org/reference/extract.html" target="_blank" rel="noopener"><code>extract()</code></a></td> 
<td></td> </tr> </tbody> </table> <p>Here I&rsquo;ll focus on the <code>wider</code> functions because they generally present the most interesting challenges. Let&rsquo;s start by grabbing some census data with the <a href="https://walker-data.com/tidycensus/" target="_blank" rel="noopener">tidycensus</a> package:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>vt_census</span> <span class='o'>&lt;-</span> <span class='nf'>tidycensus</span><span class='nf'>::</span><span class='nf'><a href='https://walker-data.com/tidycensus/reference/get_decennial.html'>get_decennial</a></span><span class='o'>(</span></span> <span> geography <span class='o'>=</span> <span class='s'>"block"</span>,</span> <span> state <span class='o'>=</span> <span class='s'>"VT"</span>,</span> <span> county <span class='o'>=</span> <span class='s'>"Washington"</span>,</span> <span> variables <span class='o'>=</span> <span class='s'>"P1_001N"</span>,</span> <span> year <span class='o'>=</span> <span class='m'>2020</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; Getting data from the 2020 decennial Census</span></span> <span></span><span><span class='c'>#&gt; Using the PL 94-171 Redistricting Data summary file</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BBBB;'>Note: 2020 decennial Census data use differential privacy, a technique that</span></span></span> <span><span class='c'><span style='color: #00BBBB;'>#&gt; introduces errors into data to preserve respondent confidentiality.</span></span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>ℹ</span> <span style='color: #BB00BB;'>Small counts should be interpreted with caution.</span></span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>ℹ</span> <span style='color: #BB00BB;'>See 
https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>This message is displayed once per session.</span></span></span> <span></span><span><span class='nv'>vt_census</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2,150 × 4</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>GEOID</span> <span style='font-weight: bold;'>NAME</span> <span style='font-weight: bold;'>variable</span> <span style='font-weight: bold;'>value</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 500239555021014 Block 1014, Block Group 1, Census Tract 9555.… P1_001N 21</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 500239555021015 Block 1015, Block Group 1, Census Tract 9555.… P1_001N 19</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 500239555021016 Block 1016, Block Group 1, Census Tract 9555.… P1_001N 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 500239555021017 Block 1017, Block Group 1, Census Tract 9555.… P1_001N 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 500239555021018 Block 1018, Block Group 1, Census Tract 9555.… P1_001N 43</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 500239555021019 Block 1019, Block Group 1, Census Tract 9555.… P1_001N 68</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 
500239555021020 Block 1020, Block Group 1, Census Tract 9555.… P1_001N 30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 500239555021021 Block 1021, Block Group 1, Census Tract 9555.… P1_001N 0</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 500239555021022 Block 1022, Block Group 1, Census Tract 9555.… P1_001N 18</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 500239555021023 Block 1023, Block Group 1, Census Tract 9555.… P1_001N 93</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 2,140 more rows</span></span></span> <span></span></code></pre> </div> <p>The <code>GEOID</code> column is made up of four components: a 2-digit state identifier, a 3-digit county identifier, a 6-digit tract identifier, and a 4-digit block identifier. We can use <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_position()</code></a> to extract these into their own variables:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>vt_census</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>GEOID</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/separate_wider_delim.html'>separate_wider_position</a></span><span class='o'>(</span></span> <span> <span class='nv'>GEOID</span>,</span> <span> widths <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span>state <span class='o'>=</span> <span class='m'>2</span>, county <span class='o'>=</span> <span class='m'>3</span>, tract <span class='o'>=</span> <span class='m'>6</span>, block <span class='o'>=</span> 
<span class='m'>4</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2,150 × 4</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>state</span> <span style='font-weight: bold;'>county</span> <span style='font-weight: bold;'>tract</span> <span style='font-weight: bold;'>block</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 50 023 955502 1014 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 50 023 955502 1015 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 50 023 955502 1016 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 50 023 955502 1017 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 50 023 955502 1018 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 50 023 955502 1019 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 50 023 955502 1020 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 50 023 955502 1021 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 50 023 955502 1022 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 50 023 955502 1023 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 2,140 more rows</span></span></span> <span></span></code></pre> </div> <p>The <code>NAME</code> column contains this same information in text form, with each component 
separated by a comma. We can use <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_delim()</code></a> to break up this sort of data into individual variables:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>vt_census</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>NAME</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/separate_wider_delim.html'>separate_wider_delim</a></span><span class='o'>(</span></span> <span> <span class='nv'>NAME</span>,</span> <span> delim <span class='o'>=</span> <span class='s'>", "</span>,</span> <span> names <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"block"</span>, <span class='s'>"block_group"</span>, <span class='s'>"tract"</span>, <span class='s'>"county"</span>, <span class='s'>"state"</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2,150 × 5</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>block</span> <span style='font-weight: bold;'>block_group</span> <span style='font-weight: bold;'>tract</span> <span style='font-weight: bold;'>county</span> <span style='font-weight: bold;'>state</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Block 1014 Block Group 1 Census Tract 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Block 1015 Block Group 1 Census Tract 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Block 1016 Block Group 1 Census Tract 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Block 1017 Block Group 1 Census Tract 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Block 1018 Block Group 1 Census Tract 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Block 1019 Block Group 1 Census Tract 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Block 1020 Block Group 1 Census Tract 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Block 1021 Block Group 1 Census Tract 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Block 1022 Block Group 1 Census Tract 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Block 1023 Block Group 1 Census Tract 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 2,140 more rows</span></span></span> <span></span></code></pre> </div> <p>You&rsquo;ll notice that each row contains a lot of duplicated information (&ldquo;Block&rdquo;, &ldquo;Block Group&rdquo;, &hellip;). 
You could certainly use <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> and string manipulation to clean this up, but there&rsquo;s a more direct approach that you can use if you&rsquo;re familiar with regular expressions. The new <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_regex()</code></a> takes a vector of regular expressions that are matched in order, from left to right. If you name the regular expression, it will appear in the output; otherwise, it will be dropped. I think this leads to a particularly elegant solution to many problems.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>vt_census</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>NAME</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/separate_wider_delim.html'>separate_wider_regex</a></span><span class='o'>(</span></span> <span> <span class='nv'>NAME</span>,</span> <span> patterns <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span></span> <span> <span class='s'>"Block "</span>, block <span class='o'>=</span> <span class='s'>"\\d+"</span>, <span class='s'>", "</span>,</span> <span> <span class='s'>"Block Group "</span>, block_group <span class='o'>=</span> <span class='s'>"\\d+"</span>, <span class='s'>", "</span>,</span> <span> <span class='s'>"Census Tract "</span>, tract <span class='o'>=</span> <span class='s'>"\\d+.\\d+"</span>, <span class='s'>", "</span>,</span> <span> county <span class='o'>=</span> <span class='s'>"[^,]+"</span>, <span class='s'>", "</span>,</span> <span> state <span 
class='o'>=</span> <span class='s'>".*"</span></span> <span> <span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2,150 × 5</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>block</span> <span style='font-weight: bold;'>block_group</span> <span style='font-weight: bold;'>tract</span> <span style='font-weight: bold;'>county</span> <span style='font-weight: bold;'>state</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 1014 1 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 1015 1 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 1016 1 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 1017 1 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 1018 1 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 1019 1 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 1020 1 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 1021 1 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 1022 1 9555.02 Washington County 
Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 1023 1 9555.02 Washington County Vermont</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 2,140 more rows</span></span></span> <span></span></code></pre> </div> <p>These functions also have a new way to report problems. Let&rsquo;s start with a very simple example:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> id <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>3</span>,</span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"a-b"</span>, <span class='s'>"a-b-c"</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>df</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/separate_wider_delim.html'>separate_wider_delim</a></span><span class='o'>(</span><span class='nv'>x</span>, delim <span class='o'>=</span> <span class='s'>"-"</span>, names <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"x"</span>, <span class='s'>"y"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `separate_wider_delim()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Expected 2 pieces in each element of `x`.</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> 1 value was too 
short.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use `too_few = "debug"` to diagnose the problem.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use `too_few = "align_start"/"align_end"` to silence this message.</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> 1 value was too long.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use `too_many = "debug"` to diagnose the problem.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Use `too_many = "drop"/"merge"` to silence this message.</span></span> <span></span></code></pre> </div> <p>We&rsquo;ve requested two columns in the output (<code>x</code> and <code>y</code>), but the first row has only one element and the last row has three elements, so <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_delim()</code></a> can&rsquo;t do what we&rsquo;ve asked. The error lays out your options for resolving the problem using the <code>too_few</code> and <code>too_many</code> arguments. 
I&rsquo;d recommend always starting with <code>&quot;debug&quot;</code> to get more information about the problem:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>probs</span> <span class='o'>&lt;-</span> <span class='nv'>df</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/separate_wider_delim.html'>separate_wider_delim</a></span><span class='o'>(</span></span> <span> <span class='nv'>x</span>,</span> <span> delim <span class='o'>=</span> <span class='s'>"-"</span>,</span> <span> names <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span><span class='o'>)</span>,</span> <span> too_few <span class='o'>=</span> <span class='s'>"debug"</span>,</span> <span> too_many <span class='o'>=</span> <span class='s'>"debug"</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; Warning: Debug mode activated: adding variables `x_ok`, `x_pieces`, and `x_remainder`.</span></span> <span></span><span><span class='nv'>probs</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 7</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>id</span> <span style='font-weight: bold;'>a</span> <span style='font-weight: bold;'>b</span> <span style='font-weight: bold;'>x</span> <span style='font-weight: bold;'>x_ok</span> <span style='font-weight: bold;'>x_pieces</span> <span style='font-weight: bold;'>x_remainder</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; 
font-style: italic;'>&lt;lgl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 a <span style='color: #BB0000;'>NA</span> a FALSE 1 <span style='color: #555555;'>""</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 a b a-b TRUE 2 <span style='color: #555555;'>""</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 a b a-b-c FALSE 3 <span style='color: #555555;'>"</span>-c<span style='color: #555555;'>"</span></span></span> <span></span></code></pre> </div> <p>This adds three new variables: <code>x_ok</code> tells you whether <code>x</code> could be separated as you requested, <code>x_pieces</code> tells you the actual number of pieces, and <code>x_remainder</code> shows you anything that remains after the columns you asked for. You can use this information to fix the problems in the input, or you can use the other options for <code>too_few</code> and <code>too_many</code> to tell <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_delim()</code></a> to fix them for you:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/separate_wider_delim.html'>separate_wider_delim</a></span><span class='o'>(</span></span> <span> <span class='nv'>x</span>,</span> <span> delim <span class='o'>=</span> <span class='s'>"-"</span>,</span> <span> names <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span><span class='o'>)</span>,</span> <span> too_few <span 
class='o'>=</span> <span class='s'>"align_start"</span>,</span> <span> too_many <span class='o'>=</span> <span class='s'>"drop"</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>id</span> <span style='font-weight: bold;'>a</span> <span style='font-weight: bold;'>b</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 a <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 a b </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 a b</span></span> <span></span></code></pre> </div> <p><code>too_few</code> and <code>too_many</code> also work with <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_position()</code></a>, and <code>too_few</code> works with <a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html" target="_blank" rel="noopener"><code>separate_wider_regex()</code></a>. 
The <code>longer</code> variants don&rsquo;t need these arguments because varying numbers of rows don&rsquo;t matter in the same way that varying numbers of columns do:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/separate_longer_delim.html'>separate_longer_delim</a></span><span class='o'>(</span><span class='nv'>x</span>, delim <span class='o'>=</span> <span class='s'>"-"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 2</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>id</span> <span style='font-weight: bold;'>x</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 a </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 a </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 b </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 3 a </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 b </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 3 c</span></span> <span></span></code></pre> </div> <p>These functions are still experimental so we are actively seeking feedback. 
Please try them out and let us know if you find them useful or if there are other features you&rsquo;d like to see.</p> <h2 id="unnest_wider-and-unnest_longer-improvements"><code>unnest_wider()</code> and <code>unnest_longer()</code> improvements <a href="#unnest_wider-and-unnest_longer-improvements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://tidyr.tidyverse.org/reference/unnest_longer.html" target="_blank" rel="noopener"><code>unnest_longer()</code></a> and <a href="https://tidyr.tidyverse.org/reference/unnest_wider.html" target="_blank" rel="noopener"><code>unnest_wider()</code></a> have both received some quality of life and consistency improvements. 
Most importantly:</p> <ul> <li> <p> <a href="https://tidyr.tidyverse.org/reference/unnest_wider.html" target="_blank" rel="noopener"><code>unnest_wider()</code></a> now gives a better error when unnesting an unnamed vector:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> id <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>2</span>,</span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"d"</span>, <span class='s'>"e"</span>, <span class='s'>"f"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span><span class='nv'>df</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/unnest_wider.html'>unnest_wider</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `unnest_wider()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> In column: `x`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> In row: 1.</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't unnest elements 
with missing names.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Supply `names_sep` to generate automatic names.</span></span> <span></span><span></span> <span><span class='nv'>df</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/unnest_wider.html'>unnest_wider</a></span><span class='o'>(</span><span class='nv'>x</span>, names_sep <span class='o'>=</span> <span class='s'>"_"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 4</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>id</span> <span style='font-weight: bold;'>x_1</span> <span style='font-weight: bold;'>x_2</span> <span style='font-weight: bold;'>x_3</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 a b <span style='color: #BB0000;'>NA</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 d e f</span></span> <span></span></code></pre> </div> <p>And this same behaviour now also applies to partially named vectors.</p> </li> <li> <p> <a href="https://tidyr.tidyverse.org/reference/unnest_longer.html" target="_blank" rel="noopener"><code>unnest_longer()</code></a> has gained a <code>keep_empty</code> argument like <a href="https://tidyr.tidyverse.org/reference/unnest.html" target="_blank" rel="noopener"><code>unnest()</code></a>, and it now treats <code>NULL</code> and empty vectors the same way:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span 
class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> id <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>3</span>,</span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='kc'>NULL</span>, <span class='nf'><a href='https://rdrr.io/r/base/integer.html'>integer</a></span><span class='o'>(</span><span class='o'>)</span>, <span class='m'>1</span><span class='o'>:</span><span class='m'>3</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>df</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/unnest_longer.html'>unnest_longer</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>id</span> <span style='font-weight: bold;'>x</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 3 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 3 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 3</span></span> <span></span><span><span class='nv'>df</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/unnest_longer.html'>unnest_longer</a></span><span class='o'>(</span><span class='nv'>x</span>, keep_empty <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'># A tibble: 5 × 2</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>id</span> <span style='font-weight: bold;'>x</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 <span style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 3 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 3</span></span> <span></span></code></pre> </div> </li> </ul> <h2 id="pivot_longercols_vary"><code>pivot_longer(cols_vary)</code> <a href="#pivot_longercols_vary"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>By default, <a href="https://tidyr.tidyverse.org/reference/pivot_longer.html" target="_blank" rel="noopener"><code>pivot_longer()</code></a> creates its output row-by-row:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> x <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span 
class='m'>2</span>,</span> <span> y <span class='o'>=</span> <span class='m'>3</span><span class='o'>:</span><span class='m'>4</span>,</span> <span> z <span class='o'>=</span> <span class='m'>5</span><span class='o'>:</span><span class='m'>6</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>df</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_longer.html'>pivot_longer</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://tidyselect.r-lib.org/reference/everything.html'>everything</a></span><span class='o'>(</span><span class='o'>)</span>,</span> <span> names_to <span class='o'>=</span> <span class='s'>"name"</span>,</span> <span> values_to <span class='o'>=</span> <span class='s'>"value"</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 2</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>name</span> <span style='font-weight: bold;'>value</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> x 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> y 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> z 5</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> x 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> y 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> z 6</span></span> <span></span></code></pre> </div> <p>You can now request column-by-column output with <code>cols_vary = &quot;slowest&quot;</code>:</p> <div class="highlight"> <pre 
class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_longer.html'>pivot_longer</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://tidyselect.r-lib.org/reference/everything.html'>everything</a></span><span class='o'>(</span><span class='o'>)</span>,</span> <span> names_to <span class='o'>=</span> <span class='s'>"name"</span>,</span> <span> values_to <span class='o'>=</span> <span class='s'>"value"</span>,</span> <span> cols_vary <span class='o'>=</span> <span class='s'>"slowest"</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 2</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>name</span> <span style='font-weight: bold;'>value</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> x 1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> x 2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> y 3</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> y 4</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> z 5</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> z 6</span></span> <span></span></code></pre> </div> <h2 id="nestby"><code>nest(.by)</code> <a href="#nestby"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 
5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A nested data frame is a data frame where one (or more) columns are lists of data frames. Nested data frames are a powerful tool that allows you to turn groups into rows and can facilitate certain types of data manipulation that would be very tricky otherwise. (One place to learn more about them is my 2016 talk &ldquo;<a href="https://www.youtube.com/watch?v=rz3_FDVt9eg" target="_blank" rel="noopener">Managing many models with R</a>&rdquo;.)</p> <p>Over the years we&rsquo;ve made a number of attempts at getting the correct interface for nesting, including <a href="https://tidyr.tidyverse.org/reference/nest.html" target="_blank" rel="noopener"><code>tidyr::nest()</code></a>, <a href="https://dplyr.tidyverse.org/reference/nest_by.html" target="_blank" rel="noopener"><code>dplyr::nest_by()</code></a>, and <a href="https://dplyr.tidyverse.org/reference/group_nest.html" target="_blank" rel="noopener"><code>dplyr::group_nest()</code></a>. In this version of tidyr we&rsquo;ve taken one more stab at it by adding a new argument to <a href="https://tidyr.tidyverse.org/reference/nest.html" target="_blank" rel="noopener"><code>nest()</code></a>: <code>.by</code>, inspired by the upcoming <a href="https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/" target="_blank" rel="noopener">dplyr 1.1.0</a> release. 
This means that <a href="https://tidyr.tidyverse.org/reference/nest.html" target="_blank" rel="noopener"><code>nest()</code></a> now allows you to specify the variables you want to nest by as an alternative to specifying the variables that appear in the nested data.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Specify what to nest by</span></span> <span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/nest.html'>nest</a></span><span class='o'>(</span>.by <span class='o'>=</span> <span class='nv'>cyl</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>cyl</span> <span style='font-weight: bold;'>data</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 6 <span style='color: #555555;'>&lt;tibble [7 × 10]&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 4 <span style='color: #555555;'>&lt;tibble [11 × 10]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 8 <span style='color: #555555;'>&lt;tibble [14 × 10]&gt;</span></span></span> <span></span><span></span> <span><span class='c'># Specify what should be nested</span></span> <span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/nest.html'>nest</a></span><span class='o'>(</span>data <span class='o'>=</span> <span class='o'>-</span><span class='nv'>cyl</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: 
#555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>cyl</span> <span style='font-weight: bold;'>data</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 6 <span style='color: #555555;'>&lt;tibble [7 × 10]&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 4 <span style='color: #555555;'>&lt;tibble [11 × 10]&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 8 <span style='color: #555555;'>&lt;tibble [14 × 10]&gt;</span></span></span> <span></span><span></span> <span><span class='c'># Specify both (to drop variables)</span></span> <span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/nest.html'>nest</a></span><span class='o'>(</span>data <span class='o'>=</span> <span class='nv'>mpg</span><span class='o'>:</span><span class='nv'>drat</span>, .by <span class='o'>=</span> <span class='nv'>cyl</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>cyl</span> <span style='font-weight: bold;'>data</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 6 <span style='color: #555555;'>&lt;tibble [7 × 5]&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 4 <span style='color: #555555;'>&lt;tibble [11 × 5]&gt;</span></span></span> 
<span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 8 <span style='color: #555555;'>&lt;tibble [14 × 5]&gt;</span></span></span> <span></span></code></pre> </div> <p>If this function is all we hope it to be, we&rsquo;re likely to supersede <a href="https://dplyr.tidyverse.org/reference/nest_by.html" target="_blank" rel="noopener"><code>dplyr::nest_by()</code></a> and <a href="https://dplyr.tidyverse.org/reference/group_nest.html" target="_blank" rel="noopener"><code>dplyr::group_nest()</code></a> in the future. This has the nice property of placing the functions for nesting and unnesting in the same package (tidyr).</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all 51 contributors who helped make this release possible by writing code and documentation, asking questions, and reporting bugs! 
<a href="https://github.com/AdrianS85" target="_blank" rel="noopener">@AdrianS85</a>, <a href="https://github.com/ahcyip" target="_blank" rel="noopener">@ahcyip</a>, <a href="https://github.com/allenbaron" target="_blank" rel="noopener">@allenbaron</a>, <a href="https://github.com/AnBarbosaBr" target="_blank" rel="noopener">@AnBarbosaBr</a>, <a href="https://github.com/ArthurAndrews" target="_blank" rel="noopener">@ArthurAndrews</a>, <a href="https://github.com/bart1" target="_blank" rel="noopener">@bart1</a>, <a href="https://github.com/billdenney" target="_blank" rel="noopener">@billdenney</a>, <a href="https://github.com/bknakker" target="_blank" rel="noopener">@bknakker</a>, <a href="https://github.com/bwiernik" target="_blank" rel="noopener">@bwiernik</a>, <a href="https://github.com/crissthiandi" target="_blank" rel="noopener">@crissthiandi</a>, <a href="https://github.com/daattali" target="_blank" rel="noopener">@daattali</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dcaud" target="_blank" rel="noopener">@dcaud</a>, <a href="https://github.com/DSLituiev" target="_blank" rel="noopener">@DSLituiev</a>, <a href="https://github.com/elgabbas" target="_blank" rel="noopener">@elgabbas</a>, <a href="https://github.com/fabiangehring" target="_blank" rel="noopener">@fabiangehring</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/ilikegitlab" target="_blank" rel="noopener">@ilikegitlab</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jic007" target="_blank" rel="noopener">@jic007</a>, <a href="https://github.com/Joao-O-Santos" target="_blank" rel="noopener">@Joao-O-Santos</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/jonspring" target="_blank" rel="noopener">@jonspring</a>, <a 
href="https://github.com/kevinushey" target="_blank" rel="noopener">@kevinushey</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/lotard" target="_blank" rel="noopener">@lotard</a>, <a href="https://github.com/lschneiderbauer" target="_blank" rel="noopener">@lschneiderbauer</a>, <a href="https://github.com/lucylgao" target="_blank" rel="noopener">@lucylgao</a>, <a href="https://github.com/markfairbanks" target="_blank" rel="noopener">@markfairbanks</a>, <a href="https://github.com/martina-starc" target="_blank" rel="noopener">@martina-starc</a>, <a href="https://github.com/MatthieuStigler" target="_blank" rel="noopener">@MatthieuStigler</a>, <a href="https://github.com/mattnolan001" target="_blank" rel="noopener">@mattnolan001</a>, <a href="https://github.com/mattroumaya" target="_blank" rel="noopener">@mattroumaya</a>, <a href="https://github.com/mdkrause" target="_blank" rel="noopener">@mdkrause</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/millermc38" target="_blank" rel="noopener">@millermc38</a>, <a href="https://github.com/modche" target="_blank" rel="noopener">@modche</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/mspittler" target="_blank" rel="noopener">@mspittler</a>, <a href="https://github.com/olivroy" target="_blank" rel="noopener">@olivroy</a>, <a href="https://github.com/piokol23" target="_blank" rel="noopener">@piokol23</a>, <a href="https://github.com/ppreshant" target="_blank" rel="noopener">@ppreshant</a>, <a href="https://github.com/ramiromagno" target="_blank" rel="noopener">@ramiromagno</a>, <a href="https://github.com/Rengervn" target="_blank" rel="noopener">@Rengervn</a>, <a href="https://github.com/rjake" target="_blank" 
rel="noopener">@rjake</a>, <a href="https://github.com/roohitk" target="_blank" rel="noopener">@roohitk</a>, <a href="https://github.com/struckma" target="_blank" rel="noopener">@struckma</a>, <a href="https://github.com/tjmahr" target="_blank" rel="noopener">@tjmahr</a>, <a href="https://github.com/weirichs" target="_blank" rel="noopener">@weirichs</a>, and <a href="https://github.com/wurli" target="_blank" rel="noopener">@wurli</a>.</p> dbplyr 2.3.0 https://www.tidyverse.org/blog/2023/01/dbplyr-2-3-0/ Mon, 16 Jan 2023 00:00:00 +0000 https://www.tidyverse.org/blog/2023/01/dbplyr-2-3-0/ <p>We&rsquo;re chuffed to announce the release of <a href="http://dbplyr.tidyverse.org/" target="_blank" rel="noopener">dbplyr</a> 2.3.0. 
dbplyr is a database backend for dplyr that allows you to use a remote database as if it were a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dbplyr"</span><span class='o'>)</span></span></code></pre> </div> <p>This post will highlight some of the most important new features in 2.3.0: eliminating subqueries for many verb combinations, better errors, and a handful of new translations. As usual, this release comes with a large number of improvements to translations for individual backends, and you can see the full list in the <a href="%7B%20github_release%20%7D">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dbplyr.tidyverse.org/'>dbplyr</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span>, warn.conflicts <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span></code></pre> </div> <h2 id="sql-optimisation">SQL optimisation <a href="#sql-optimisation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 
5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>dbplyr now produces fewer subqueries resulting in shorter, more readable, and, in some cases, faster SQL. The following combinations of verbs no longer require subqueries:</p> <ul> <li><code>*_join()</code> + <a href="https://dplyr.tidyverse.org/reference/select.html" target="_blank" rel="noopener"><code>select()</code></a> and <a href="https://dplyr.tidyverse.org/reference/select.html" target="_blank" rel="noopener"><code>select()</code></a> + <code>*_join()</code>.</li> <li> <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> + <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a> and <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a> + <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a>.</li> <li> <a href="https://dplyr.tidyverse.org/reference/select.html" target="_blank" rel="noopener"><code>select()</code></a>/ <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a>/ <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a> + <a href="https://dplyr.tidyverse.org/reference/distinct.html" target="_blank" rel="noopener"><code>distinct()</code></a>.</li> <li> <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> + <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a> now translates to <code>HAVING</code>.</li> <li><code>left/inner_join()</code> + <code>left/inner_join()</code>.</li> </ul> <p>Here are a couple of examples of queries that are now much more compact:</p> <div class="highlight"> <pre class='chroma'><code 
class='language-r' data-lang='r'><span><span class='nv'>lf1</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/tbl_lazy.html'>lazy_frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span>, a <span class='o'>=</span> <span class='s'>"a"</span>, .name <span class='o'>=</span> <span class='s'>"lf1"</span><span class='o'>)</span></span> <span><span class='nv'>lf2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/tbl_lazy.html'>lazy_frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span>, b <span class='o'>=</span> <span class='s'>"b"</span>, .name <span class='o'>=</span> <span class='s'>"lf2"</span><span class='o'>)</span></span> <span><span class='nv'>lf3</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/tbl_lazy.html'>lazy_frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span>, c <span class='o'>=</span> <span class='s'>"c"</span>, .name <span class='o'>=</span> <span class='s'>"lf3"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>lf1</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>lf2</span>, by <span class='o'>=</span> <span class='s'>"x"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>lf3</span>, by <span class='o'>=</span> <span class='s'>"x"</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>b</span>, <span 
class='nv'>c</span><span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `b`, `c`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf1`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>LEFT JOIN</span> `lf2`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>ON</span> (`lf1`.`x` = `lf2`.`x`)</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>LEFT JOIN</span> `lf3`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>ON</span> (`lf1`.`x` = `lf3`.`x`)</span></span> <span></span><span></span> <span><span class='nv'>lf1</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>a</span>, na.rm <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span>, n <span class='o'>=</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/context.html'>n</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>n</span> <span class='o'>&gt;</span> <span class='m'>5</span><span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> `x`, AVG(`a`)<span style='color: 
#0000BB;'> AS </span>`a`, COUNT(*)<span style='color: #0000BB;'> AS </span>`n`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf1`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>GROUP BY</span> `x`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>HAVING</span> (COUNT(*) &gt; 5.0)</span></span> <span></span></code></pre> </div> <p>(As usual in these blog posts, I&rsquo;m using <a href="https://dbplyr.tidyverse.org/reference/tbl_lazy.html" target="_blank" rel="noopener"><code>lazy_frame()</code></a> to focus on the SQL generation, without having to set up a dummy database.)</p> <p>Additionally, where possible, dbplyr now uses <code>SELECT *</code> after a join instead of explicitly selecting every column.</p> <h2 id="improved-errors">Improved errors <a href="#improved-errors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Variables that aren&rsquo;t found in either the data or in the environment now produce an error:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>lf</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/tbl_lazy.html'>lazy_frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span>, y <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>lf</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span 
class='o'>(</span>x <span class='o'>=</span> <span class='nv'>z</span> <span class='o'>+</span> <span class='m'>1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `mutate()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Problem while computing `x = z + 1`</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Object `z` not found.</span></span> <span></span></code></pre> </div> <p>(Previously they were silently translated to SQL variables.)</p> <p>We&rsquo;ve also generally reviewed the error messages to ensure they show more clearly where the error happened:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>lf</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>y</span> <span class='o'><a href='https://rdrr.io/r/base/Arithmetic.html'>%/%</a></span> <span class='m'>1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `purrr::pmap()` at </span><a href='file:///Users/hadleywickham/Documents/dplyr/dbplyr/R/lazy-select-query.R'><span style='font-weight: bold;'>dbplyr/R/lazy-select-query.R:282:2</span></a><span style='font-weight: bold;'>:</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> In index: 1.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> With name: x.</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in `y %/% 1`:</span></span></span> <span><span 
class='c'>#&gt; <span style='color: #BBBB00;'>!</span> %/% is not available in this SQL variant</span></span> <span></span><span></span> <span><span class='nv'>lf</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/across.html'>across</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>:</span><span class='nv'>y</span>, <span class='s'>"a"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `mutate()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Problem while computing `..1 = across(x:y, "a")`</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in `across()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `.fns` must be a NULL, a function, formula, or list</span></span> <span></span></code></pre> </div> <h2 id="new-translations">New translations <a href="#new-translations"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://stringr.tidyverse.org/reference/str_like.html" target="_blank" rel="noopener"><code>stringr::str_like()</code></a> (new in stringr 1.5.0) is translated to <code>LIKE</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>lf1</span> <span 
class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nf'>stringr</span><span class='nf'>::</span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_like.html'>str_like</a></span><span class='o'>(</span><span class='nv'>a</span>, <span class='s'>"abc"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; &lt;SQL&gt;</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> *</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `lf1`</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>WHERE</span> (`a` LIKE 'abc')</span></span> <span></span></code></pre> </div> <p>dbplyr 2.3.0 also supports features coming in <a href="https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/" target="_blank" rel="noopener">dplyr 1.1.0</a>:</p> <ul> <li>The <code>.by</code> argument is supported as an alternative to <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a>.</li> <li>Passing <code>...</code> to <a href="https://dplyr.tidyverse.org/reference/across.html" target="_blank" rel="noopener"><code>across()</code></a> is deprecated because the evaluation timing of <code>...</code> is ambiguous.</li> <li>New <code>pick()</code> and <code>case_match()</code> functions are translated.</li> <li> <a href="https://dplyr.tidyverse.org/reference/case_when.html" target="_blank" rel="noopener"><code>case_when()</code></a> now supports the <code>.default</code> argument.</li> </ul> <p>This version does not support the new <code>join_by()</code> syntax, but we&rsquo;re working on it, and we&rsquo;ll release an update after dplyr 1.1.0 is out.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" 
width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The vast majority of this release (particularly the SQL optimisations) is from <a href="https://github.com/mgirlich" target="_blank" rel="noopener">Maximilian Girlich</a>; thanks so much for your continued work on this package.</p> <p>We&rsquo;d also like to thank all 74 contributors who helped in some way, whether it was filing issues or contributing code and documentation: <a href="https://github.com/a4sberg" target="_blank" rel="noopener">@a4sberg</a>, <a href="https://github.com/ablack3" target="_blank" rel="noopener">@ablack3</a>, <a href="https://github.com/akgold" target="_blank" rel="noopener">@akgold</a>, <a href="https://github.com/aleighbrown" target="_blank" rel="noopener">@aleighbrown</a>, <a href="https://github.com/andreassoteriadesmoj" target="_blank" rel="noopener">@andreassoteriadesmoj</a>, <a href="https://github.com/apalacio9502" target="_blank" rel="noopener">@apalacio9502</a>, <a href="https://github.com/baileych" target="_blank" rel="noopener">@baileych</a>, <a href="https://github.com/barnesparker" target="_blank" rel="noopener">@barnesparker</a>, <a href="https://github.com/bhuvanesh1707" target="_blank" rel="noopener">@bhuvanesh1707</a>, <a href="https://github.com/bkraft4257" target="_blank" rel="noopener">@bkraft4257</a>, <a href="https://github.com/bobbymc0" target="_blank" rel="noopener">@bobbymc0</a>, <a href="https://github.com/brian-law-rstudio" target="_blank" rel="noopener">@brian-law-rstudio</a>, <a href="https://github.com/bthe" target="_blank" rel="noopener">@bthe</a>, <a href="https://github.com/But2ene" target="_blank" rel="noopener">@But2ene</a>, <a 
href="https://github.com/capitantyler" target="_blank" rel="noopener">@capitantyler</a>, <a href="https://github.com/carlganz" target="_blank" rel="noopener">@carlganz</a>, <a href="https://github.com/cboettig" target="_blank" rel="noopener">@cboettig</a>, <a href="https://github.com/chwpearse" target="_blank" rel="noopener">@chwpearse</a>, <a href="https://github.com/copernican" target="_blank" rel="noopener">@copernican</a>, <a href="https://github.com/DSLituiev" target="_blank" rel="noopener">@DSLituiev</a>, <a href="https://github.com/ehudtr7" target="_blank" rel="noopener">@ehudtr7</a>, <a href="https://github.com/eitsupi" target="_blank" rel="noopener">@eitsupi</a>, <a href="https://github.com/ejneer" target="_blank" rel="noopener">@ejneer</a>, <a href="https://github.com/eutwt" target="_blank" rel="noopener">@eutwt</a>, <a href="https://github.com/ewright-vcan" target="_blank" rel="noopener">@ewright-vcan</a>, <a href="https://github.com/fabkury" target="_blank" rel="noopener">@fabkury</a>, <a href="https://github.com/fh-afrachioni" target="_blank" rel="noopener">@fh-afrachioni</a>, <a href="https://github.com/fh-mthomson" target="_blank" rel="noopener">@fh-mthomson</a>, <a href="https://github.com/filipemsc" target="_blank" rel="noopener">@filipemsc</a>, <a href="https://github.com/gadenbuie" target="_blank" rel="noopener">@gadenbuie</a>, <a href="https://github.com/gbouzill" target="_blank" rel="noopener">@gbouzill</a>, <a href="https://github.com/giocomai" target="_blank" rel="noopener">@giocomai</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hershelm" target="_blank" rel="noopener">@hershelm</a>, <a href="https://github.com/iangow" target="_blank" rel="noopener">@iangow</a>, <a href="https://github.com/iMissile" target="_blank" rel="noopener">@iMissile</a>, <a href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/j-wester" 
target="_blank" rel="noopener">@j-wester</a>, <a href="https://github.com/Janlow" target="_blank" rel="noopener">@Janlow</a>, <a href="https://github.com/jasonmhoule" target="_blank" rel="noopener">@jasonmhoule</a>, <a href="https://github.com/jensmassberg" target="_blank" rel="noopener">@jensmassberg</a>, <a href="https://github.com/jmbarbone" target="_blank" rel="noopener">@jmbarbone</a>, <a href="https://github.com/joe-rodd" target="_blank" rel="noopener">@joe-rodd</a>, <a href="https://github.com/kongdd" target="_blank" rel="noopener">@kongdd</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/lschneiderbauer" target="_blank" rel="noopener">@lschneiderbauer</a>, <a href="https://github.com/machow" target="_blank" rel="noopener">@machow</a>, <a href="https://github.com/mgarbuzov" target="_blank" rel="noopener">@mgarbuzov</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/multimeric" target="_blank" rel="noopener">@multimeric</a>, <a href="https://github.com/namarkus" target="_blank" rel="noopener">@namarkus</a>, <a href="https://github.com/noamross" target="_blank" rel="noopener">@noamross</a>, <a href="https://github.com/NZambranoc" target="_blank" rel="noopener">@NZambranoc</a>, <a href="https://github.com/oriolarques" target="_blank" rel="noopener">@oriolarques</a>, <a href="https://github.com/overmar" target="_blank" rel="noopener">@overmar</a>, <a href="https://github.com/owenjonesuob" target="_blank" rel="noopener">@owenjonesuob</a>, <a href="https://github.com/p-schaefer" target="_blank" rel="noopener">@p-schaefer</a>, <a href="https://github.com/rohitg33" target="_blank" rel="noopener">@rohitg33</a>, <a 
href="https://github.com/rowrowrowyourboat" target="_blank" rel="noopener">@rowrowrowyourboat</a>, <a href="https://github.com/rsund" target="_blank" rel="noopener">@rsund</a>, <a href="https://github.com/samssann" target="_blank" rel="noopener">@samssann</a>, <a href="https://github.com/samterfa" target="_blank" rel="noopener">@samterfa</a>, <a href="https://github.com/schradj" target="_blank" rel="noopener">@schradj</a>, <a href="https://github.com/scvail195" target="_blank" rel="noopener">@scvail195</a>, <a href="https://github.com/slhck" target="_blank" rel="noopener">@slhck</a>, <a href="https://github.com/splaisan" target="_blank" rel="noopener">@splaisan</a>, <a href="https://github.com/stephenashton-dhsc" target="_blank" rel="noopener">@stephenashton-dhsc</a>, <a href="https://github.com/ThomasMorland" target="_blank" rel="noopener">@ThomasMorland</a>, <a href="https://github.com/thothal" target="_blank" rel="noopener">@thothal</a>, <a href="https://github.com/viswaduttp" target="_blank" rel="noopener">@viswaduttp</a>, <a href="https://github.com/XoliloX" target="_blank" rel="noopener">@XoliloX</a>, and <a href="https://github.com/yuhenghuang" target="_blank" rel="noopener">@yuhenghuang</a>.</p> Q4 2022 tidymodels digest https://www.tidyverse.org/blog/2022/12/tidymodels-2022-q4/ Thu, 29 Dec 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/12/tidymodels-2022-q4/ <p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <p>Since the beginning of 2021, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a> here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. 
You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a> to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like these posts from the past couple months:</p> <ul> <li> <a href="https://www.tidyverse.org/blog/2022/12/tidyclust-0-1-0/" target="_blank" rel="noopener">tidyclust is on CRAN</a></li> <li> <a href="https://www.tidyverse.org/blog/2022/11/model-calibration/" target="_blank" rel="noopener">Model calibration</a></li> <li> <a href="https://www.tidyverse.org/blog/2022/10/parsnip-checking-1-0-2/" target="_blank" rel="noopener">Improvements to model specification checking in tidymodels</a></li> </ul> <p>Since <a href="https://www.tidyverse.org/blog/2022/10/tidymodels-2022-q3/" target="_blank" rel="noopener">our last roundup post</a>, there have been CRAN releases of 9 tidymodels packages. Here are links to their NEWS files:</p> <div class="highlight"> <ul> <li>bonsai <a href="https://bonsai.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.2.1)</a></li> <li>broom <a href="https://broom.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.2)</a></li> <li>butcher <a href="https://butcher.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.1)</a></li> <li>dials <a href="https://dials.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a></li> <li>parsnip <a href="https://parsnip.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.3)</a></li> <li>recipes <a href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.3)</a></li> <li>rsample <a href="https://rsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.1)</a></li> <li>stacks <a href="https://stacks.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>workflows <a 
href="https://workflows.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.2)</a></li> </ul> </div> <p>We&rsquo;ll highlight a few especially notable changes below: more specialized role selectors in recipes, extended support for grouped resampling in rsample, and a big speedup in parsnip. First, loading the collection of packages:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="specialized-role-selectors">Specialized role selectors <a href="#specialized-role-selectors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The <a href="https://recipes.tidymodels.org/" target="_blank" rel="noopener">recipes package for preprocessing</a> supports tidyselect-style variable selection, and includes some of its own selectors to support common modeling workflows.</p> <p>To illustrate, we&rsquo;ll make use of a dataset <code>goofy_data</code> with a number of different variable types:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='nv'>goofy_data</span><span class='o'>)</span></span> <span><span class='c'>#&gt; tibble [100 × 10] (S3: tbl_df/tbl/data.frame)</span></span> <span><span class='c'>#&gt; $ class: Factor w/ 2 levels "class_1","class_2": 1 1 
2 1 2 1 1 2 2 2 ...</span></span> <span><span class='c'>#&gt; $ a : Factor w/ 7 levels "-3","-2","-1",..: 4 4 3 2 4 5 2 2 3 5 ...</span></span> <span><span class='c'>#&gt; $ b : Factor w/ 9 levels "-4","-3","-2",..: 9 5 4 3 4 7 4 2 3 6 ...</span></span> <span><span class='c'>#&gt; $ c : int [1:100] 0 0 0 0 0 0 0 -1 0 1 ...</span></span> <span><span class='c'>#&gt; $ d : int [1:100] 0 1 1 1 0 1 1 0 0 1 ...</span></span> <span><span class='c'>#&gt; $ e : int [1:100] 1 0 1 0 0 1 1 0 1 1 ...</span></span> <span><span class='c'>#&gt; $ f : num [1:100] 1.01 -1.99 2.18 2.3 -3.01 ...</span></span> <span><span class='c'>#&gt; $ g : num [1:100] -0.845 1.456 1.948 1.354 1.085 ...</span></span> <span><span class='c'>#&gt; $ h : num [1:100] -0.285 0.59 -0.938 1.447 0.424 ...</span></span> <span><span class='c'>#&gt; $ i : chr [1:100] "white" "maroon" "maroon" "maroon" ...</span></span> <span></span></code></pre> </div> <p>Imagine a classification problem on the <code>goofy_data</code> where we&rsquo;d like to predict <code>class</code> using the remaining variables as predictors. The selector functions allow us to perform operations on only the predictors with a certain class. 
For instance, centering and scaling all numeric predictors:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>class</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>goofy_data</span><span class='o'>)</span> <span class='o'>%&gt;%</span></span> <span> <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>%&gt;%</span></span> <span> <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Recipe</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Inputs:</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; role #variables</span></span> <span><span class='c'>#&gt; outcome 1</span></span> <span><span class='c'>#&gt; predictor 9</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Training data contained 100 data points and no missing data.</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Operations:</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Centering and scaling for c, d, e, f, g, h [trained]</span></span> <span></span></code></pre> </div> <p>Or making dummy variables out of each of the nominal predictors:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>class</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>goofy_data</span><span class='o'>)</span> <span class='o'>%&gt;%</span></span> <span> <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>all_nominal_predictors</span><span class='o'>(</span><span 
class='o'>)</span><span class='o'>)</span> <span class='o'>%&gt;%</span></span> <span> <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Recipe</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Inputs:</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; role #variables</span></span> <span><span class='c'>#&gt; outcome 1</span></span> <span><span class='c'>#&gt; predictor 9</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Training data contained 100 data points and no missing data.</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Operations:</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Dummy variables from a, b, i [trained]</span></span> <span></span></code></pre> </div> <p>Operations like those above have been long-standing functionality in recipes, and are powerful tools for effective modeling. The most recent release of recipes introduced <a href="https://fosstodon.org/@emilhvitfeldt/109315135944110742" target="_blank" rel="noopener">finer-grain selectors</a> for variable types. For instance, we may want to only center and scale the <em>double</em> (i.e. real-valued) predictors, excluding the integers. 
With the new release of recipes, we can easily do so:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>class</span> <span class='o'>~</span> <span class='nv'>.</span>, <span class='nv'>goofy_data</span><span class='o'>)</span> <span class='o'>%&gt;%</span></span> <span> <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_double_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>%&gt;%</span></span> <span> <span class='nf'>prep</span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Recipe</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Inputs:</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; role #variables</span></span> <span><span class='c'>#&gt; outcome 1</span></span> <span><span class='c'>#&gt; predictor 9</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Training data contained 100 data points and no missing data.</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Operations:</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Centering and scaling for f, g, h [trained]</span></span> <span></span></code></pre> </div> <p>This is one of a number of new selectors:</p> <ul> <li> <p>The <code>all_nominal()</code> selector now has finer-grained versions <code>all_string()</code>, <code>all_factor()</code>, <code>all_unordered()</code>, and <code>all_ordered()</code>.</p> </li> <li> <p>The <code>all_numeric()</code> selector now has finer-grained versions <code>all_double()</code>, and <code>all_integer()</code>.</p> </li> <li> <p>New <code>all_logical()</code>, <code>all_date()</code>, and <code>all_datetime()</code> selectors.</p> </li> </ul> <p>All new 
selectors have <code>*_predictors()</code> variants. You can read more about recipes 1.0.3 in the <a href="https://recipes.tidymodels.org/news/index.html#recipes-103" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="grouped-resampling">Grouped resampling <a href="#grouped-resampling"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The most recent release of rsample introduced support for stratification with grouped resampling. Consider the following toy data set on the number of melons in a household:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>melons</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4,928 × 3</span></span></span> <span><span class='c'>#&gt; household n_melons chops</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 1 114 Yes </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 1 179 Yes </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 1 163 Yes </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 1 35 Yes </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 1 93 Yes </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 1 55 Yes 
</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 1 165 Yes </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 1 30 Yes </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 1 140 Yes </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 1 7 Yes </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 4,918 more rows</span></span></span> <span></span></code></pre> </div> <p>There are 100 different households in this dataset. Each member of the household has some number of melons <code>n_melons</code> in their fridge. A household, i.e., all its members, either <code>chops</code> their melons or keeps them whole.</p> <p>Each of the resampling functions in rsample has a <code>group_*</code>ed analogue. From rsample&rsquo;s <a href="https://rsample.tidymodels.org/articles/Common_Patterns.html#grouped-resampling" target="_blank" rel="noopener">&ldquo;Common Patterns&rdquo; article</a>:</p> <blockquote> <p>Often, some observations in your data will be &ldquo;more related&rdquo; to each other than would be probable under random chance, for instance because they represent repeated measurements of the same subject or were all collected at a single location. 
In these situations, you often want to assign all related observations to either the analysis or assessment fold as a group, to avoid having assessment data that's closely related to the data used to fit a model.</p> </blockquote> <p>For example, the grouped <code>initial_split()</code> variant will allot the training and testing set mutually exclusive levels of the <code>group</code> variable:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>resample</span> <span class='o'>&lt;-</span> <span class='nf'>group_initial_split</span><span class='o'>(</span><span class='nv'>melons</span>, group <span class='o'>=</span> <span class='nv'>household</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/unique.html'>unique</a></span><span class='o'>(</span><span class='nf'>training</span><span class='o'>(</span><span class='nv'>resample</span><span class='o'>)</span><span class='o'>$</span><span class='nv'>household</span><span class='o'>)</span> <span class='o'><a href='https://rdrr.io/r/base/match.html'>%in%</a></span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/base/unique.html'>unique</a></span><span class='o'>(</span><span class='nf'>testing</span><span class='o'>(</span><span class='nv'>resample</span><span class='o'>)</span><span class='o'>$</span><span class='nv'>household</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 0</span></span> <span></span></code></pre> </div> <p>However, note that there are only a few households that don&rsquo;t chop their melons, and those households tend to have many more melons to chop!</p> <div class="highlight"> <p><img src="figs/melon-plot-1.png" alt="A ggplot histogram displaying the mean number of melons per household, filled by whether 
the household chops their melons or not. The plot shows that there are relatively few households that don't chop their melons, but those households have many more melons to chop. Households that chop their melons have around 80 to chop, while those that don't have around 200." width="700px" style="display: block; margin: auto;" /></p> </div> <p>If we&rsquo;re ultimately interested in modeling whether a household chops their melons, we ought to ensure that both values of <code>chops</code> are well-represented in both the training and testing set. The argument <code>strata = chops</code> indicates that sampling by <code>household</code> will occur within values of <code>chops</code>. Note that the strata must be constant in each group, so here, all members of a household need to either chop or not.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>resample_stratified</span> <span class='o'>&lt;-</span> <span class='nf'>group_initial_split</span><span class='o'>(</span><span class='nv'>melons</span>, group <span class='o'>=</span> <span class='nv'>household</span>, strata <span class='o'>=</span> <span class='nv'>chops</span><span class='o'>)</span></span></code></pre> </div> <p>Note that this resampling scheme still resulted in different <code>household</code>s being allotted to training and testing:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/sum.html'>sum</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/unique.html'>unique</a></span><span class='o'>(</span><span class='nf'>training</span><span class='o'>(</span><span class='nv'>resample_stratified</span><span class='o'>)</span><span class='o'>$</span><span class='nv'>household</span><span class='o'>)</span> <span class='o'><a href='https://rdrr.io/r/base/match.html'>%in%</a></span> </span> <span> <span 
class='nf'><a href='https://rdrr.io/r/base/unique.html'>unique</a></span><span class='o'>(</span><span class='nf'>testing</span><span class='o'>(</span><span class='nv'>resample_stratified</span><span class='o'>)</span><span class='o'>$</span><span class='nv'>household</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 0</span></span> <span></span></code></pre> </div> <p>Also, though, it ensured that similar proportions of <code>chops</code> values are allotted to the training and testing set:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/diff.html'>diff</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nf'>training</span><span class='o'>(</span><span class='nv'>resample_stratified</span><span class='o'>)</span><span class='o'>$</span><span class='nv'>chops</span> <span class='o'>==</span> <span class='s'>"Yes"</span><span class='o'>)</span>,</span> <span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nf'>testing</span><span class='o'>(</span><span class='nv'>resample_stratified</span><span class='o'>)</span><span class='o'>$</span><span class='nv'>chops</span> <span class='o'>==</span> <span class='s'>"Yes"</span><span class='o'>)</span></span> <span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 0.01000042</span></span> <span></span></code></pre> </div> <p>You can read more about rsample 1.1.1 in the <a href="https://rsample.tidymodels.org/news/index.html#rsample-111" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="performance-speedup">Performance speedup <a href="#performance-speedup"> <svg 
class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We recently made a performance tweak, released as part of parsnip 1.0.3, that resulted in a substantial speedup in fit time. Fitting models via parsnip is a fundamental operation in the tidymodels, so the speedup can be observed across many modeling workflows.</p> <p>The figure below demonstrates this speedup in <a href="https://gist.github.com/simonpcouch/651d0ea4d968b455ded8194578dabf52" target="_blank" rel="noopener">an experiment</a> involving fitting a simple linear regression model on resamples of simulated data. Simulated datasets with between one hundred and one million rows were partitioned into five, ten, or twenty folds and fitted with the new version of parsnip as well as the version preceding it. With smaller datasets, the speedup is negligible, but fit times decrease by a factor of three to five once training data reaches one million rows.</p> <div class="highlight"> <p><img src="figs/speedup-1.png" alt="A ggplot line plot displaying the relative speedup between parsnip 1.0.2 and 1.0.3. The number of rows in training data is on the x axis, ranging from one hundred to one million, and the factor of speedup (1.0.2 over 1.0.3) is on the y axis, ranging from 1 to 5. Three lines, colored by 'number of folds,' noting 5, 10, or 20 resamples, stretch from the bottom left to top right of the plot. This shows that, as training data gets larger, the magnitude of speedup with the new parsnip version gets larger and larger." 
width="100%" style="display: block; margin: auto;" /></p> </div> <p>You can read more about parsnip 1.0.3 in the <a href="https://parsnip.tidymodels.org/news/index.html#parsnip-103" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to thank those in the community that contributed to tidymodels in the last quarter:</p> <div class="highlight"> <ul> <li>bonsai: <a href="https://github.com/HenrikBengtsson" target="_blank" rel="noopener">@HenrikBengtsson</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>broom: <a href="https://github.com/amorris28" target="_blank" rel="noopener">@amorris28</a>, <a href="https://github.com/capnrefsmmat" target="_blank" rel="noopener">@capnrefsmmat</a>, <a href="https://github.com/larmarange" target="_blank" rel="noopener">@larmarange</a>, <a href="https://github.com/lukepilling" target="_blank" rel="noopener">@lukepilling</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>butcher: <a href="https://github.com/galen-ft" target="_blank" rel="noopener">@galen-ft</a>, and <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>.</li> <li>dials: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/Tadge-Analytics" target="_blank" 
rel="noopener">@Tadge-Analytics</a>.</li> <li>parsnip: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/exsell-jc" target="_blank" rel="noopener">@exsell-jc</a>, <a href="https://github.com/fkohrt" target="_blank" rel="noopener">@fkohrt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, <a href="https://github.com/Marwolaeth" target="_blank" rel="noopener">@Marwolaeth</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/schoonees" target="_blank" rel="noopener">@schoonees</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/sweiner123" target="_blank" rel="noopener">@sweiner123</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>recipes: <a href="https://github.com/andeek" target="_blank" rel="noopener">@andeek</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/mdancho84" target="_blank" rel="noopener">@mdancho84</a>, and <a href="https://github.com/mobius-eng" target="_blank" rel="noopener">@mobius-eng</a>.</li> <li>rsample: <a href="https://github.com/bschneidr" target="_blank" rel="noopener">@bschneidr</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" 
rel="noopener">@hfrick</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/pgg1309" target="_blank" rel="noopener">@pgg1309</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>stacks: <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>workflows: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/talegari" target="_blank" rel="noopener">@talegari</a>, and <a href="https://github.com/xiaochi-liu" target="_blank" rel="noopener">@xiaochi-liu</a>.</li> </ul> </div> <p>We&rsquo;re grateful for all of the tidymodels community, from observers to users to contributors, and wish you all a happy new year!</p> purrr 1.0.0 https://www.tidyverse.org/blog/2022/12/purrr-1-0-0/ Tue, 20 Dec 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/12/purrr-1-0-0/ <p>We&rsquo;re happy to announce the release of <a href="http://purrr.tidyverse.org/" target="_blank" rel="noopener">purrr</a> 1.0.0! purrr enhances R&rsquo;s functional programming toolkit by providing a complete and consistent set of tools for working with functions and vectors. In the words of ChatGPT:</p> <blockquote> <p>With purrr, you can easily &ldquo;kitten&rdquo; your functions together to perform complex operations, &ldquo;paws&rdquo; for a moment to debug and troubleshoot your code, while &ldquo;feline&rdquo; good about the elegant and readable code that you write. Whether you&rsquo;re a &ldquo;cat&rdquo;-egorical beginner or a seasoned functional programming &ldquo;purr&rdquo;-fessional, purrr has something to offer. So why not &ldquo;pounce&rdquo; on the opportunity to try it out and see how it can &ldquo;meow&rdquo;-velously improve your R coding experience?</p> </blockquote> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"purrr"</span><span class='o'>)</span></span></code></pre> </div> <p>purrr is 7 years old and it&rsquo;s finally made it to 1.0.0! This is a big release, adding some long-needed functionality (like progress bars!) as well as really refining the core purpose of purrr. In this post, we&rsquo;ll start with an overview of the breaking changes, then briefly review some documentation changes. 
Then we&rsquo;ll get to the good stuff: improvements to the <code>map</code> family, new <a href="https://purrr.tidyverse.org/reference/keep_at.html" target="_blank" rel="noopener"><code>keep_at()</code></a> and <a href="https://purrr.tidyverse.org/reference/keep_at.html" target="_blank" rel="noopener"><code>discard_at()</code></a> functions, and improvements to flattening and simplification. You can see a full list of changes in the <a href="https://github.com/tidyverse/purrr/releases/tag/v1.0.0" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://purrr.tidyverse.org/'>purrr</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="breaking-changes">Breaking changes <a href="#breaking-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;ve used the 1.0.0 release as an opportunity to really refine the core purpose of purrr: facilitating functional programming in R. We&rsquo;ve been more aggressive with deprecations and breaking changes than usual, because a 1.0.0 release signals that purrr is now <a href="https://lifecycle.r-lib.org/articles/stages.html#stable" target="_blank" rel="noopener">stable</a>, making it our last opportunity for major changes.</p> <p>These changes will break some existing code, but we&rsquo;ve done our best to make it affect as little code as possible. 
Out of the ~1400 CRAN packages that use purrr, only ~40 were negatively affected, and I <a href="https://github.com/tidyverse/purrr/issues/969" target="_blank" rel="noopener">made pull requests</a> to fix them all. Making these fixes helped give me confidence that, though we&rsquo;re deprecating quite a few functions and changing a few special cases, it shouldn&rsquo;t affect too much code in the wild.</p> <p>There are four important changes that you should be aware of:</p> <ul> <li> <a href="https://purrr.tidyverse.org/reference/pluck.html" target="_blank" rel="noopener"><code>pluck()</code></a> behaves differently when extracting 0-length vectors.</li> <li>The <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map()</code></a> family uses the tidyverse rules for coercion and recycling.</li> <li>All functions that modify lists handle <code>NULL</code> consistently.</li> <li>We&rsquo;ve deprecated functions that aren&rsquo;t related to the core purpose of purrr.</li> </ul> <h3 id="pluck-and-zero-length-vectors"><code>pluck()</code> and zero-length vectors <a href="#pluck-and-zero-length-vectors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Previously, <a href="https://purrr.tidyverse.org/reference/pluck.html" target="_blank" rel="noopener"><code>pluck()</code></a> replaced 0-length vectors with the value of <code>default</code>. 
Now <code>default</code> is only used for <code>NULL</code>s and absent elements:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/character.html'>character</a></span><span class='o'>(</span><span class='o'>)</span>, b <span class='o'>=</span> <span class='kc'>NULL</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/pluck.html'>pluck</a></span><span class='o'>(</span><span class='s'>"y"</span>, <span class='s'>"a"</span>, .default <span class='o'>=</span> <span class='kc'>NA</span><span class='o'>)</span></span> <span><span class='c'>#&gt; character(0)</span></span> <span></span><span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/pluck.html'>pluck</a></span><span class='o'>(</span><span class='s'>"y"</span>, <span class='s'>"b"</span>, .default <span class='o'>=</span> <span class='kc'>NA</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] NA</span></span> <span></span><span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/pluck.html'>pluck</a></span><span class='o'>(</span><span class='s'>"y"</span>, <span class='s'>"c"</span>, .default <span class='o'>=</span> <span class='kc'>NA</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] NA</span></span> <span></span></code></pre> </div> <p>This also influences the map family because using an integer vector, character 
vector, or list instead of a function automatically calls <a href="https://purrr.tidyverse.org/reference/pluck.html" target="_blank" rel="noopener"><code>pluck()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='kc'>NULL</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/character.html'>character</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map</a></span><span class='o'>(</span><span class='m'>1</span>, .default <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 4</span></span> <span><span class='c'>#&gt; $ : num 1</span></span> <span><span class='c'>#&gt; $ : num 0</span></span> <span><span class='c'>#&gt; $ : num 0</span></span> <span><span class='c'>#&gt; $ : chr(0)</span></span> <span></span></code></pre> </div> <p>We made this change because it makes purrr more consistent with the rest of the tidyverse and it looks like it was a bug in 
the original implementation of the function.</p> <h3 id="tidyverse-consistency">Tidyverse consistency <a href="#tidyverse-consistency"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>We&rsquo;ve tweaked the map family of functions to be more consistent with general tidyverse coercion and recycling rules, as implemented by the <a href="https://vctrs.r-lib.org" target="_blank" rel="noopener">vctrs</a> package. <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_lgl()</code></a>, <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_int()</code></a>, <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_chr()</code></a>, and <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_dbl()</code></a> now follow the same <a href="https://vctrs.r-lib.org/articles/type-size.html#coercing-to-common-type" target="_blank" rel="noopener">coercion rules</a> as vctrs. 
In particular:</p> <ul> <li> <p><code>map_chr(TRUE, identity)</code>, <code>map_chr(0L, identity)</code>, and <code>map_chr(1.5, identity)</code> have been deprecated because we believe that converting a logical/integer/double to a character vector is potentially dangerous and should require an explicit coercion.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># previously you could write</span></span> <span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_chr</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>4</span>, \<span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nv'>x</span> <span class='o'>+</span> <span class='m'>1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning: Automatic coercion from double to character was deprecated in purrr 1.0.0.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use an explicit call to `as.character()` within `map_chr()` instead.</span></span> <span></span><span><span class='c'>#&gt; [1] "2.000000" "3.000000" "4.000000" "5.000000"</span></span> <span></span><span></span> <span><span class='c'># now you need something like this:</span></span> <span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_chr</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>4</span>, \<span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/character.html'>as.character</a></span><span class='o'>(</span><span class='nv'>x</span> <span class='o'>+</span> <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "2" "3" "4" "5"</span></span> <span></span></code></pre> </div> </li> <li> <p> <a 
href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_int()</code></a> requires that the numeric results be close to integers, rather than silently truncating to integers. Compare these two examples:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_int</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span>, \<span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nv'>x</span> <span class='o'>/</span> <span class='m'>2</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `map_int()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> In index: 1.</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't coerce from a double vector to an integer vector.</span></span> <span></span><span></span> <span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_int</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span>, \<span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nv'>x</span> <span class='o'>*</span> <span class='m'>2</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 2 4 6</span></span> <span></span></code></pre> </div> </li> </ul> <p> <a href="https://purrr.tidyverse.org/reference/map2.html" target="_blank" rel="noopener"><code>map2()</code></a>, <a href="https://purrr.tidyverse.org/reference/modify.html" target="_blank" rel="noopener"><code>modify2()</code></a>, and <a href="https://purrr.tidyverse.org/reference/pmap.html" 
target="_blank" rel="noopener"><code>pmap()</code></a> use tidyverse recycling rules, which means that vectors of length 1 are recycled to any size but all other vectors must have the same length. This has two major consequences:</p> <ul> <li> <p>Previously, the presence of a zero-length input generated a zero-length output. Now it&rsquo;s recycled using the same rules:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map2.html'>map2</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>2</span>, <span class='nf'><a href='https://rdrr.io/r/base/character.html'>character</a></span><span class='o'>(</span><span class='o'>)</span>, <span class='nv'>paste</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `map2()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't recycle `.x` (size 2) to match `.y` (size 0).</span></span> <span></span><span></span> <span><span class='c'># Works because length-1 vector gets recycled to length-0</span></span> <span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map2.html'>map2</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='nf'><a href='https://rdrr.io/r/base/character.html'>character</a></span><span class='o'>(</span><span class='o'>)</span>, <span class='nv'>paste</span><span class='o'>)</span></span> <span><span class='c'>#&gt; list()</span></span> <span></span></code></pre> </div> </li> <li> <p>And you must now explicitly recycle vectors that aren&rsquo;t length 1:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map2.html'>map2_int</a></span><span class='o'>(</span><span 
class='m'>1</span><span class='o'>:</span><span class='m'>4</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>10</span>, <span class='m'>20</span><span class='o'>)</span>, <span class='nv'>`+`</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `map2_int()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't recycle `.x` (size 4) to match `.y` (size 2).</span></span> <span></span><span></span> <span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map2.html'>map2_int</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>4</span>, <span class='nf'><a href='https://rdrr.io/r/base/rep.html'>rep</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>10</span>, <span class='m'>20</span><span class='o'>)</span>, <span class='m'>2</span><span class='o'>)</span>, <span class='nv'>`+`</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 11 22 13 24</span></span> <span></span></code></pre> </div> </li> </ul> <h3 id="assigning-null">Assigning <code>NULL</code> <a href="#assigning-null"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>purrr has a number of functions that modify a list: <code>pluck&lt;-()</code>, <a href="https://purrr.tidyverse.org/reference/modify_in.html" target="_blank" 
rel="noopener"><code>assign_in()</code></a>, <a href="https://purrr.tidyverse.org/reference/modify.html" target="_blank" rel="noopener"><code>modify()</code></a>, <a href="https://purrr.tidyverse.org/reference/modify.html" target="_blank" rel="noopener"><code>modify2()</code></a>, <a href="https://purrr.tidyverse.org/reference/modify.html" target="_blank" rel="noopener"><code>modify_if()</code></a>, <a href="https://purrr.tidyverse.org/reference/modify.html" target="_blank" rel="noopener"><code>modify_at()</code></a>, and <a href="https://purrr.tidyverse.org/reference/list_assign.html" target="_blank" rel="noopener"><code>list_modify()</code></a>. Previously, these functions had inconsistent behaviour when you attempted to modify an element with <code>NULL</code>: some functions would delete that element, and some would set it to <code>NULL</code>. That inconsistency arose because base R handles <code>NULL</code> in different ways depending on whether you use <code>$</code>/<code>[[</code> or <code>[</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x1</span> <span class='o'>&lt;-</span> <span class='nv'>x2</span> <span class='o'>&lt;-</span> <span class='nv'>x3</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='m'>1</span>, b <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>x1</span><span class='o'>$</span><span class='nv'>a</span> <span class='o'>&lt;-</span> <span class='kc'>NULL</span></span> <span><span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='nv'>x1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 1</span></span> <span><span class='c'>#&gt; $ b: num 2</span></span> <span></span><span></span> <span><span 
class='nv'>x2</span><span class='o'>[</span><span class='s'>"a"</span><span class='o'>]</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='kc'>NULL</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='nv'>x2</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 2</span></span> <span><span class='c'>#&gt; $ a: NULL</span></span> <span><span class='c'>#&gt; $ b: num 2</span></span> <span></span></code></pre> </div> <p>Now functions that edit a list will create an element containing <code>NULL</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x3</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_assign.html'>list_modify</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='kc'>NULL</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 2</span></span> <span><span class='c'>#&gt; $ a: NULL</span></span> <span><span class='c'>#&gt; $ b: num 2</span></span> <span></span><span></span> <span><span class='nv'>x3</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/modify.html'>modify_at</a></span><span class='o'>(</span><span class='s'>"b"</span>, \<span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='kc'>NULL</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span 
class='c'>#&gt; List of 2</span></span> <span><span class='c'>#&gt; $ a: num 1</span></span> <span><span class='c'>#&gt; $ b: NULL</span></span> <span></span></code></pre> </div> <p>If you want to delete the element, you can use the special <a href="https://rlang.r-lib.org/reference/zap.html" target="_blank" rel="noopener"><code>zap()</code></a> sentinel:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x3</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_assign.html'>list_modify</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='nf'><a href='https://rlang.r-lib.org/reference/zap.html'>zap</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 1</span></span> <span><span class='c'>#&gt; $ b: num 2</span></span> <span></span></code></pre> </div> <p> <a href="https://rlang.r-lib.org/reference/zap.html" target="_blank" rel="noopener"><code>zap()</code></a> does not work in <code>modify*()</code> because those functions are designed to always return the same top-level structure as the input.</p> <h3 id="core-purpose-refinements">Core purpose refinements <a href="#core-purpose-refinements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>We have <strong>deprecated</strong> a number of functions to keep purrr 
focused on its core purpose: facilitating functional programming in R. Deprecation means that the functions will continue to work, but you&rsquo;ll be warned once every 8 hours if you use them. In several years&rsquo; time, we&rsquo;ll release an update that makes the warnings appear every time you use them, and a few years after that the functions will start throwing errors.</p> <ul> <li> <p> <a href="https://purrr.tidyverse.org/reference/cross.html" target="_blank" rel="noopener"><code>cross()</code></a> and all its variants have been deprecated because they&rsquo;re slow and buggy, and a better approach already exists in <a href="https://tidyr.tidyverse.org/reference/expand_grid.html" target="_blank" rel="noopener"><code>tidyr::expand_grid()</code></a>.</p> </li> <li> <p> <a href="https://purrr.tidyverse.org/reference/update_list.html" target="_blank" rel="noopener"><code>update_list()</code></a>, <a href="https://purrr.tidyverse.org/reference/rerun.html" target="_blank" rel="noopener"><code>rerun()</code></a>, and the use of tidyselect with <a href="https://purrr.tidyverse.org/reference/map_if.html" target="_blank" rel="noopener"><code>map_at()</code></a> and friends have been deprecated because we no longer believe that non-standard evaluation is a good fit for purrr.</p> </li> <li> <p>The <code>lift_*</code> family of functions has been superseded because they promote a style of function manipulation that is not commonly used in R.</p> </li> <li> <p> <a href="https://purrr.tidyverse.org/reference/prepend.html" target="_blank" rel="noopener"><code>prepend()</code></a>, <a href="https://purrr.tidyverse.org/reference/rdunif.html" target="_blank" rel="noopener"><code>rdunif()</code></a>, <a href="https://purrr.tidyverse.org/reference/rbernoulli.html" target="_blank" rel="noopener"><code>rbernoulli()</code></a>, <a href="https://purrr.tidyverse.org/reference/when.html" target="_blank" rel="noopener"><code>when()</code></a>, and <a 
href="https://purrr.tidyverse.org/reference/along.html" target="_blank" rel="noopener"><code>list_along()</code></a> have been deprecated because they&rsquo;re not directly related to functional programming.</p> </li> <li> <p><code>splice()</code> has been deprecated because we no longer believe that automatic splicing makes for good UI and there are other ways to achieve the same result.</p> </li> </ul> <p>Consult the documentation for the alternatives that we now recommend.</p> <p>Deprecating these functions makes purrr easier to maintain because it reduces the surface area for bugs and issues, and it makes purrr easier to learn because there&rsquo;s a clearer common thread that ties together all functions.</p> <h2 id="documentation">Documentation <a href="#documentation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As you&rsquo;ve seen in the code above, we are moving from magrittr&rsquo;s pipe (<code>%&gt;%</code>) to the base pipe (<code>|&gt;</code>) and from formula syntax (<code>~ .x + 1</code>) to R&rsquo;s new anonymous function shorthand (<code>\(x) x + 1</code>). We believe that it&rsquo;s better to use these new base tools because they work everywhere: the base pipe doesn&rsquo;t require that you load magrittr, and the new function shorthand works in any R code, not just in purrr functions. 
Additionally, being able to specify the argument name for the anonymous function can often lead to clearer code.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Previously we wrote</span></span> <span><span class='m'>1</span><span class='o'>:</span><span class='m'>10</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map</a></span><span class='o'>(</span><span class='o'>~</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>10</span>, <span class='nv'>.x</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_dbl</a></span><span class='o'>(</span><span class='nv'>mean</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 0.5586355 1.8213041 2.8764412 4.1521664 5.1160393 6.1271905</span></span> <span><span class='c'>#&gt; [7] 6.9109806 8.2808301 9.2373940 10.6269104</span></span> <span></span><span></span> <span><span class='c'># Now we recommend</span></span> <span><span class='m'>1</span><span class='o'>:</span><span class='m'>10</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map</a></span><span class='o'>(</span>\<span class='o'>(</span><span class='nv'>mu</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>10</span>, <span class='nv'>mu</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a 
href='https://purrr.tidyverse.org/reference/map.html'>map_dbl</a></span><span class='o'>(</span><span class='nv'>mean</span><span class='o'>)</span> </span> <span><span class='c'>#&gt; [1] 0.4638639 2.0966712 3.4441928 3.7806185 5.3373228 6.1854820</span></span> <span><span class='c'>#&gt; [7] 6.5873300 8.3116138 9.4824697 10.4590034</span></span> <span></span></code></pre> </div> <p>We also recommend using an anonymous function instead of passing additional arguments to map. This avoids a certain class of moderately esoteric argument matching woes and, we believe, is generally easier to read.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mu</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>10</span>, <span class='m'>100</span><span class='o'>)</span></span> <span></span> <span><span class='c'># Previously we wrote</span></span> <span><span class='nv'>mu</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_dbl</a></span><span class='o'>(</span><span class='nv'>rnorm</span>, n <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 0.5706199 11.3604613 99.9291426</span></span> <span></span><span></span> <span><span class='c'># Now we recommend</span></span> <span><span class='nv'>mu</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_dbl</a></span><span class='o'>(</span>\<span class='o'>(</span><span class='nv'>mu</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/stats/Normal.html'>rnorm</a></span><span class='o'>(</span><span class='m'>1</span>, mean <span class='o'>=</span> <span class='nv'>mu</span><span class='o'>)</span><span class='o'>)</span></span> <span><span 
class='c'>#&gt; [1] 0.7278463 7.5533200 100.0654866</span></span> <span></span></code></pre> </div> <p>Due to the <a href="https://www.tidyverse.org/blog/2019/04/r-version-support/" target="_blank" rel="noopener">tidyverse R dependency policy</a>, purrr works in R 3.5, 3.6, 4.0, 4.1, and 4.2, but the base pipe and anonymous function syntax are only available in R 4.1 and later. So the examples are automatically disabled on older versions of R to allow purrr to continue to pass <code>R CMD check</code>.</p> <h2 id="mapping">Mapping <a href="#mapping"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>With that out of the way, we can now talk about the exciting new features in purrr 1.0.0. We&rsquo;ll start with the map family of functions, which has three big new features:</p> <ul> <li>Progress bars.</li> <li>Better errors.</li> <li>A new family member: <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_vec()</code></a>.</li> </ul> <p>These are described in the following sections.</p> <h3 id="progress-bars">Progress bars <a href="#progress-bars"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The map family can now produce a progress bar. 
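</p> <p>As a minimal sketch of switching the bar on, assuming purrr 1.0.0 or later and R 4.1 or later, the <code>.progress</code> argument takes either <code>TRUE</code> or a string that identifies the bar. The helper <code>slow_square()</code> is a hypothetical stand-in for a slow task and is not part of purrr:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>library(purrr)

# Hypothetical slow task (not part of purrr): pause briefly, then square the input
slow_square &lt;- function(x) {
  Sys.sleep(0.01)
  x^2
}

# .progress = TRUE asks map_dbl() to report progress while it iterates
squares &lt;- 1:20 |&gt; map_dbl(slow_square, .progress = TRUE)

# A string is used as the label that identifies the progress bar
labelled &lt;- 1:20 |&gt; map_dbl(slow_square, .progress = "Squaring")
</code></pre> </div> <p>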
Progress bars are very useful for long-running jobs:</p> <div class="highlight"> <p><img src="figs//progress.svg" width="700px" style="display: block; margin: auto;" /></p> </div> <p>(For interactive use, the progress bar uses some simple heuristics so that it doesn&rsquo;t show up for very simple jobs.)</p> <p>In most cases, we expect that <code>.progress = TRUE</code> is enough, but if you&rsquo;re wrapping <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map()</code></a> in another function, you might want to set <code>.progress</code> to a string that identifies the progress bar:</p> <div class="highlight"> <p><img src="figs//named-progress.svg" width="700px" style="display: block; margin: auto;" /></p> </div> <h3 id="better-errors">Better errors <a href="#better-errors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>If there&rsquo;s an error in the function you&rsquo;re mapping, <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map()</code></a> and friends now tell you which element caused the problem:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/sample.html'>sample</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>500</span><span class='o'>)</span></span> <span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map</a></span><span 
class='o'>(</span>\<span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='kr'>if</span> <span class='o'>(</span><span class='nv'>x</span> <span class='o'>==</span> <span class='m'>1</span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/stop.html'>stop</a></span><span class='o'>(</span><span class='s'>"Error!"</span><span class='o'>)</span> <span class='kr'>else</span> <span class='m'>10</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `map()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> In index: 51.</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in `.f()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Error!</span></span> <span></span></code></pre> </div> <p>We hope that this makes your debugging life just a little bit easier! (Don&rsquo;t forget about <a href="https://purrr.tidyverse.org/reference/safely.html" target="_blank" rel="noopener"><code>safely()</code></a> and <a href="https://purrr.tidyverse.org/reference/possibly.html" target="_blank" rel="noopener"><code>possibly()</code></a> if you expect failures and want to either ignore or capture them.)</p> <p>We have also generally reviewed the error messages throughout purrr in order to make them more actionable. 
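</p> <p>The <code>safely()</code> and <code>possibly()</code> wrappers mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than part of the release itself: <code>risky_sqrt()</code> and its inputs are hypothetical, and the code assumes R 4.1 or later for the base pipe.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'>library(purrr)

# Hypothetical helper (not part of purrr): errors on negative input
risky_sqrt &lt;- function(x) {
  if (x &lt; 0) stop("negative input")
  sqrt(x)
}

inputs &lt;- list(4, -1, 9)

# possibly() substitutes a default value whenever the wrapped function errors
roots &lt;- inputs |&gt; map_dbl(possibly(risky_sqrt, otherwise = NA_real_))
roots
#&gt; [1]  2 NA  3

# safely() never throws; each element is a list with `result` and `error`
captured &lt;- inputs |&gt; map(safely(risky_sqrt))
conditionMessage(captured[[2]]$error)
#&gt; [1] "negative input"
</code></pre> </div> <p>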
If you hit a confusing error message, please let us know!</p> <h3 id="new-map_vec">New <code>map_vec()</code> <a href="#new-map_vec"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>We&rsquo;ve added <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_vec()</code></a> (along with <a href="https://purrr.tidyverse.org/reference/map2.html" target="_blank" rel="noopener"><code>map2_vec()</code></a>, and <a href="https://purrr.tidyverse.org/reference/pmap.html" target="_blank" rel="noopener"><code>pmap_vec()</code></a>) to handle more types of vectors. <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_vec()</code></a> extends <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_lgl()</code></a>, <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_int()</code></a>, <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_dbl()</code></a>, and <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_chr()</code></a> to arbitrary types of vectors, like dates, factors, and date-times:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_vec</a></span><span class='o'>(</span>\<span class='o'>(</span><span 
class='nv'>i</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>letters</span><span class='o'>[</span><span class='nv'>i</span><span class='o'>]</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] a b c</span></span> <span><span class='c'>#&gt; Levels: a b c</span></span> <span></span><span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_vec</a></span><span class='o'>(</span>\<span class='o'>(</span><span class='nv'>i</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>letters</span><span class='o'>[</span><span class='nv'>i</span><span class='o'>]</span>, levels <span class='o'>=</span> <span class='nv'>letters</span><span class='o'>[</span><span class='m'>4</span><span class='o'>:</span><span class='m'>1</span><span class='o'>]</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] a b c</span></span> <span><span class='c'>#&gt; Levels: d c b a</span></span> <span></span><span></span> <span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_vec</a></span><span class='o'>(</span>\<span class='o'>(</span><span class='nv'>i</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/as.Date.html'>as.Date</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/ISOdatetime.html'>ISOdate</a></span><span class='o'>(</span><span class='nv'>i</span> <span class='o'>+</span> <span class='m'>2022</span>, <span class='m'>10</span>, <span class='m'>5</span><span class='o'>)</span><span 
class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "2023-10-05" "2024-10-05" "2025-10-05"</span></span> <span></span><span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_vec</a></span><span class='o'>(</span>\<span class='o'>(</span><span class='nv'>i</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/ISOdatetime.html'>ISOdate</a></span><span class='o'>(</span><span class='nv'>i</span> <span class='o'>+</span> <span class='m'>2022</span>, <span class='m'>10</span>, <span class='m'>5</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "2023-10-05 12:00:00 GMT" "2024-10-05 12:00:00 GMT"</span></span> <span><span class='c'>#&gt; [3] "2025-10-05 12:00:00 GMT"</span></span> <span></span></code></pre> </div> <p> <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_vec()</code></a> sits somewhere between base R&rsquo;s <a href="https://rdrr.io/r/base/lapply.html" target="_blank" rel="noopener"><code>sapply()</code></a> and <a href="https://rdrr.io/r/base/lapply.html" target="_blank" rel="noopener"><code>vapply()</code></a>. 
Unlike <a href="https://rdrr.io/r/base/lapply.html" target="_blank" rel="noopener"><code>sapply()</code></a> it will always return a simpler vector, erroring if there&rsquo;s no common type:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='m'>1</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_vec</a></span><span class='o'>(</span><span class='nv'>identity</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `map_vec()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't combine `&lt;list&gt;[[1]]` &lt;character&gt; and `&lt;list&gt;[[2]]` &lt;double&gt;.</span></span> <span></span></code></pre> </div> <p>If you want to require a certain type of output, supply <code>.ptype</code>, making <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_vec()</code></a> behave more like <a href="https://rdrr.io/r/base/lapply.html" target="_blank" rel="noopener"><code>vapply()</code></a>. 
<code>ptype</code> is short for prototype, and should be a vector that exemplifies the type of output you expect.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span><span class='o'>)</span> </span> <span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_vec</a></span><span class='o'>(</span><span class='nv'>identity</span>, .ptype <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/character.html'>character</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "a" "b"</span></span> <span></span><span></span> <span><span class='c'># will error if the result can't be automatically coerced</span></span> <span><span class='c'># to the specified ptype</span></span> <span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_vec</a></span><span class='o'>(</span><span class='nv'>identity</span>, .ptype <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/integer.html'>integer</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `map_vec()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't convert `&lt;list&gt;[[1]]` &lt;character&gt; to &lt;integer&gt;.</span></span> <span></span></code></pre> </div> <p>We don&rsquo;t expect you to know or memorise the <a href="https://vctrs.r-lib.org/reference/faq-compatibility-types.html" target="_blank" 
rel="noopener">rules that vctrs uses for coercion</a>; our hope is that they&rsquo;ll become second nature as we steadily ensure that every tidyverse function follows the same rules.</p> <h2 id="keep_at-and-discard_at"><code>keep_at()</code> and <code>discard_at()</code> <a href="#keep_at-and-discard_at"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>purrr has gained a new pair of functions, <a href="https://purrr.tidyverse.org/reference/keep_at.html" target="_blank" rel="noopener"><code>keep_at()</code></a> and <a href="https://purrr.tidyverse.org/reference/keep_at.html" target="_blank" rel="noopener"><code>discard_at()</code></a>, that work like <a href="https://purrr.tidyverse.org/reference/keep.html" target="_blank" rel="noopener"><code>keep()</code></a> and <a href="https://purrr.tidyverse.org/reference/keep.html" target="_blank" rel="noopener"><code>discard()</code></a> but operate on names rather than values:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='m'>1</span>, b <span class='o'>=</span> <span class='m'>2</span>, c <span class='o'>=</span> <span class='m'>3</span>, D <span class='o'>=</span> <span class='m'>4</span>, E <span class='o'>=</span> <span class='m'>5</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>x</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a 
href='https://purrr.tidyverse.org/reference/keep_at.html'>keep_at</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span>, <span class='s'>"c"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 3</span></span> <span><span class='c'>#&gt; $ a: num 1</span></span> <span><span class='c'>#&gt; $ b: num 2</span></span> <span><span class='c'>#&gt; $ c: num 3</span></span> <span></span><span></span> <span><span class='nv'>x</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/keep_at.html'>discard_at</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span>, <span class='s'>"c"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 2</span></span> <span><span class='c'>#&gt; $ D: num 4</span></span> <span><span class='c'>#&gt; $ E: num 5</span></span> <span></span></code></pre> </div> <p>Alternatively, you can supply a function that is called with the names of the elements and should return a logical vector describing which elements to keep/discard:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>is_lower_case</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nv'>x</span> <span 
class='o'>==</span> <span class='nf'><a href='https://rdrr.io/r/base/chartr.html'>tolower</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/keep_at.html'>keep_at</a></span><span class='o'>(</span><span class='nv'>is_lower_case</span><span class='o'>)</span></span> <span><span class='c'>#&gt; $a</span></span> <span><span class='c'>#&gt; [1] 1</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; $b</span></span> <span><span class='c'>#&gt; [1] 2</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; $c</span></span> <span><span class='c'>#&gt; [1] 3</span></span> <span></span></code></pre> </div> <p>You can now also pass such a function to all other <code>_at()</code> functions:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/modify.html'>modify_at</a></span><span class='o'>(</span><span class='nv'>is_lower_case</span>, \<span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nv'>x</span> <span class='o'>*</span> <span class='m'>100</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 5</span></span> <span><span class='c'>#&gt; $ a: num 100</span></span> <span><span class='c'>#&gt; $ b: num 200</span></span> <span><span class='c'>#&gt; $ c: num 300</span></span> <span><span class='c'>#&gt; $ D: num 4</span></span> <span><span class='c'>#&gt; $ E: num 5</span></span> <span></span></code></pre> </div> <h2 
id="flattening-and-simplification">Flattening and simplification <a href="#flattening-and-simplification"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Last, but not least, we&rsquo;ve reworked the family of functions that flatten and simplify lists. These caused us a lot of confusion internally because folks (and different packages) used the same words to mean different things. Now there are three main functions that share a common prefix that makes it clear that they all operate on lists:</p> <ul> <li> <a href="https://purrr.tidyverse.org/reference/list_flatten.html" target="_blank" rel="noopener"><code>list_flatten()</code></a> removes a single level of hierarchy from a list; the output is always a list.</li> <li> <a href="https://purrr.tidyverse.org/reference/list_simplify.html" target="_blank" rel="noopener"><code>list_simplify()</code></a> reduces a list to a homogeneous vector; the output is always the same length as the input.</li> <li> <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_c()</code></a>, <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_cbind()</code></a>, and <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_rbind()</code></a> concatenate the elements of a list to produce a vector or data frame. There are no constraints on the output.</li> </ul> <p>These functions have led us to <strong>supersede</strong> a number of functions. 
This means that they are not going away but we no longer recommend them, and they will receive only critical bug fixes.</p> <ul> <li><code>flatten()</code> has been superseded by <a href="https://purrr.tidyverse.org/reference/list_flatten.html" target="_blank" rel="noopener"><code>list_flatten()</code></a>.</li> <li><code>flatten_lgl()</code>, <code>flatten_int()</code>, <code>flatten_dbl()</code>, and <code>flatten_chr()</code> have been superseded by <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_c()</code></a>.</li> <li> <a href="https://purrr.tidyverse.org/reference/flatten.html" target="_blank" rel="noopener"><code>flatten_dfr()</code></a> and <a href="https://purrr.tidyverse.org/reference/flatten.html" target="_blank" rel="noopener"><code>flatten_dfc()</code></a> have been superseded by <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_rbind()</code></a> and <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_cbind()</code></a> respectively. 
<a href="https://purrr.tidyverse.org/reference/flatten.html" target="_blank" rel="noopener"><code>flatten_dfr()</code></a> had some particularly puzzling edge cases when the inputs would be flattened into columns.</li> <li> <a href="https://purrr.tidyverse.org/reference/map_dfr.html" target="_blank" rel="noopener"><code>map_dfc()</code></a> and <a href="https://purrr.tidyverse.org/reference/map_dfr.html" target="_blank" rel="noopener"><code>map_dfr()</code></a> (and their <code>map2</code> and <code>pmap</code> variants) have been superseded in favour of using the appropriate map function along with <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_rbind()</code></a> or <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_cbind()</code></a>.</li> <li> <a href="https://purrr.tidyverse.org/reference/as_vector.html" target="_blank" rel="noopener"><code>simplify()</code></a>, <a href="https://purrr.tidyverse.org/reference/as_vector.html" target="_blank" rel="noopener"><code>simplify_all()</code></a>, and <a href="https://purrr.tidyverse.org/reference/as_vector.html" target="_blank" rel="noopener"><code>as_vector()</code></a> have been superseded in favour of <a href="https://purrr.tidyverse.org/reference/list_simplify.html" target="_blank" rel="noopener"><code>list_simplify()</code></a>.</li> </ul> <h3 id="flattening">Flattening <a href="#flattening"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://purrr.tidyverse.org/reference/list_flatten.html" target="_blank" 
rel="noopener"><code>list_flatten()</code></a> removes one layer of hierarchy from a list. In other words, if any of the children of the list are themselves lists, the contents of those lists are inlined into the parent:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>2</span>, <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>3</span>, <span class='m'>4</span><span class='o'>)</span>, <span class='m'>5</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 2</span></span> <span><span class='c'>#&gt; $ : num 1</span></span> <span><span class='c'>#&gt; $ :List of 3</span></span> <span><span class='c'>#&gt; ..$ : num 2</span></span> <span><span class='c'>#&gt; ..$ :List of 2</span></span> <span><span class='c'>#&gt; .. ..$ : num 3</span></span> <span><span class='c'>#&gt; .. 
..$ : num 4</span></span> <span><span class='c'>#&gt; ..$ : num 5</span></span> <span></span><span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_flatten.html'>list_flatten</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 4</span></span> <span><span class='c'>#&gt; $ : num 1</span></span> <span><span class='c'>#&gt; $ : num 2</span></span> <span><span class='c'>#&gt; $ :List of 2</span></span> <span><span class='c'>#&gt; ..$ : num 3</span></span> <span><span class='c'>#&gt; ..$ : num 4</span></span> <span><span class='c'>#&gt; $ : num 5</span></span> <span></span><span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_flatten.html'>list_flatten</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_flatten.html'>list_flatten</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 5</span></span> <span><span class='c'>#&gt; $ : num 1</span></span> <span><span class='c'>#&gt; $ : num 2</span></span> <span><span class='c'>#&gt; $ : num 3</span></span> <span><span class='c'>#&gt; $ : num 4</span></span> <span><span class='c'>#&gt; $ : num 5</span></span> <span></span></code></pre> </div> <p> <a href="https://purrr.tidyverse.org/reference/list_flatten.html" target="_blank" rel="noopener"><code>list_flatten()</code></a> always returns a list; once a list is as flat as it can get (i.e. 
none of its children contain lists), it leaves the input unchanged.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_flatten.html'>list_flatten</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_flatten.html'>list_flatten</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_flatten.html'>list_flatten</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 5</span></span> <span><span class='c'>#&gt; $ : num 1</span></span> <span><span class='c'>#&gt; $ : num 2</span></span> <span><span class='c'>#&gt; $ : num 3</span></span> <span><span class='c'>#&gt; $ : num 4</span></span> <span><span class='c'>#&gt; $ : num 5</span></span> <span></span></code></pre> </div> <h3 id="simplification">Simplification <a href="#simplification"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://purrr.tidyverse.org/reference/list_simplify.html" target="_blank" rel="noopener"><code>list_simplify()</code></a> maintains the length of the input, but produces a simpler type:</p> <div class="highlight"> <pre class='chroma'><code 
class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_simplify.html'>list_simplify</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 1 2 3</span></span> <span></span><span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span>, <span class='s'>"c"</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_simplify.html'>list_simplify</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "a" "b" "c"</span></span> <span></span></code></pre> </div> <p>Because the length must stay the same, it will only succeed if every element has length 1:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://purrr.tidyverse.org/reference/list_simplify.html'>list_simplify</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span><span class='o'>:</span><span class='m'>4</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `list_simplify()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `x[[3]]` must have size 1, not size 2.</span></span> <span></span><span><span class='nf'><a 
href='https://purrr.tidyverse.org/reference/list_simplify.html'>list_simplify</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='nf'><a href='https://rdrr.io/r/base/integer.html'>integer</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `list_simplify()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `x[[3]]` must have size 1, not size 0.</span></span> <span></span></code></pre> </div> <p>Because the result must be a simpler vector, all the components must be compatible:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://purrr.tidyverse.org/reference/list_simplify.html'>list_simplify</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='s'>"a"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `list_simplify()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't combine `&lt;list&gt;[[1]]` &lt;double&gt; and `&lt;list&gt;[[3]]` &lt;character&gt;.</span></span> <span></span></code></pre> </div> <p>If you need to simplify if it&rsquo;s possible, but otherwise leave the input unchanged, use <code>strict = FALSE</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a 
href='https://purrr.tidyverse.org/reference/list_simplify.html'>list_simplify</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='s'>"a"</span><span class='o'>)</span>, strict <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [[1]]</span></span> <span><span class='c'>#&gt; [1] 1</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; [[2]]</span></span> <span><span class='c'>#&gt; [1] 2</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; [[3]]</span></span> <span><span class='c'>#&gt; [1] "a"</span></span> <span></span></code></pre> </div> <p>If you want to be specific about the type you want, <a href="https://purrr.tidyverse.org/reference/list_simplify.html" target="_blank" rel="noopener"><code>list_simplify()</code></a> can take the same prototype argument as <a href="https://purrr.tidyverse.org/reference/map.html" target="_blank" rel="noopener"><code>map_vec()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_simplify.html'>list_simplify</a></span><span class='o'>(</span>ptype <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/integer.html'>integer</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 1 2 3</span></span> <span></span><span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span 
class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_simplify.html'>list_simplify</a></span><span class='o'>(</span>ptype <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `list_simplify()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't convert `&lt;list&gt;[[1]]` &lt;double&gt; to &lt;factor&lt;&gt;&gt;.</span></span> <span></span></code></pre> </div> <h3 id="concatenation">Concatenation <a href="#concatenation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_c()</code></a>, <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_cbind()</code></a>, and <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_rbind()</code></a> concatenate all elements together in a similar way to using <code>do.call(c)</code> or <code>do.call(rbind)</code><sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> . 
Unlike <a href="https://purrr.tidyverse.org/reference/list_simplify.html" target="_blank" rel="noopener"><code>list_simplify()</code></a>, this allows the elements to be different lengths:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_c.html'>list_c</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 1 2 3</span></span> <span></span><span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span><span class='o'>:</span><span class='m'>4</span>, <span class='nf'><a href='https://rdrr.io/r/base/integer.html'>integer</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_c.html'>list_c</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 1 2 3 4</span></span> <span></span></code></pre> </div> <p>The downside of this flexibility is that these functions break the connection between the input and the output. 
This reveals that <a href="https://purrr.tidyverse.org/reference/map_dfr.html" target="_blank" rel="noopener"><code>map_dfr()</code></a> and <a href="https://purrr.tidyverse.org/reference/map_dfr.html" target="_blank" rel="noopener"><code>map_dfc()</code></a> don&rsquo;t really belong to the map family because they don&rsquo;t maintain a 1-to-1 mapping between input and output: there&rsquo;s no reliable way to associate a row in the output with an element in the input.</p> <p>For this reason, <a href="https://purrr.tidyverse.org/reference/map_dfr.html" target="_blank" rel="noopener"><code>map_dfr()</code></a> and <a href="https://purrr.tidyverse.org/reference/map_dfr.html" target="_blank" rel="noopener"><code>map_dfc()</code></a> (and the <code>map2</code> and <code>pmap</code> variants) are superseded and we recommend switching to an explicit call to <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_rbind()</code></a> or <a href="https://purrr.tidyverse.org/reference/list_c.html" target="_blank" rel="noopener"><code>list_cbind()</code></a> instead:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>paths</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map_dfr.html'>map_dfr</a></span><span class='o'>(</span><span class='nv'>read_csv</span>, .id <span class='o'>=</span> <span class='s'>"path"</span><span class='o'>)</span></span> <span><span class='c'># now</span></span> <span><span class='nv'>paths</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map</a></span><span class='o'>(</span><span class='nv'>read_csv</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_c.html'>list_rbind</a></span><span class='o'>(</span>names_to <span 
class='o'>=</span> <span class='s'>"path"</span><span class='o'>)</span></span></code></pre> </div> <p>This new behaviour also affects <a href="https://purrr.tidyverse.org/reference/accumulate.html" target="_blank" rel="noopener"><code>accumulate()</code></a> and <a href="https://purrr.tidyverse.org/reference/accumulate.html" target="_blank" rel="noopener"><code>accumulate2()</code></a>, which previously had an idiosyncratic approach to simplification.</p> <h3 id="list_assign"><code>list_assign()</code> <a href="#list_assign"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>There&rsquo;s one other new function that isn&rsquo;t directly related to flattening and friends, but shares the <code>list_</code> prefix: <a href="https://purrr.tidyverse.org/reference/list_assign.html" target="_blank" rel="noopener"><code>list_assign()</code></a>. <a href="https://purrr.tidyverse.org/reference/list_assign.html" target="_blank" rel="noopener"><code>list_assign()</code></a> is similar to <a href="https://purrr.tidyverse.org/reference/list_assign.html" target="_blank" rel="noopener"><code>list_modify()</code></a> but it doesn&rsquo;t work recursively. 
Recursive modification is a mildly confusing feature of <a href="https://purrr.tidyverse.org/reference/list_assign.html" target="_blank" rel="noopener"><code>list_modify()</code></a> that&rsquo;s easy to miss in the documentation.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_assign.html'>list_modify</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>b <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 2</span></span> <span><span class='c'>#&gt; $ x: num 1</span></span> <span><span class='c'>#&gt; $ y:List of 2</span></span> <span><span class='c'>#&gt; ..$ a: num 1</span></span> <span><span class='c'>#&gt; ..$ b: num 1</span></span> <span></span></code></pre> </div> <p> <a href="https://purrr.tidyverse.org/reference/list_assign.html" target="_blank" rel="noopener"><code>list_assign()</code></a> doesn&rsquo;t recurse into sublists, making it a bit easier to reason about:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>x <span class='o'>=</span> <span 
class='m'>1</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>a <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/list_assign.html'>list_assign</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>b <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> </span> <span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; List of 2</span></span> <span><span class='c'>#&gt; $ x: num 1</span></span> <span><span class='c'>#&gt; $ y:List of 1</span></span> <span><span class='c'>#&gt; ..$ b: num 2</span></span> <span></span></code></pre> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A massive thanks to all 162 contributors who have helped make purrr 1.0.0 happen! 
<a href="https://github.com/adamroyjones" target="_blank" rel="noopener">@adamroyjones</a>, <a href="https://github.com/afoltzm" target="_blank" rel="noopener">@afoltzm</a>, <a href="https://github.com/agilebean" target="_blank" rel="noopener">@agilebean</a>, <a href="https://github.com/ahjames11" target="_blank" rel="noopener">@ahjames11</a>, <a href="https://github.com/AHoerner" target="_blank" rel="noopener">@AHoerner</a>, <a href="https://github.com/alberto-dellera" target="_blank" rel="noopener">@alberto-dellera</a>, <a href="https://github.com/alex-gable" target="_blank" rel="noopener">@alex-gable</a>, <a href="https://github.com/AliciaSchep" target="_blank" rel="noopener">@AliciaSchep</a>, <a href="https://github.com/ArtemSokolov" target="_blank" rel="noopener">@ArtemSokolov</a>, <a href="https://github.com/AshesITR" target="_blank" rel="noopener">@AshesITR</a>, <a href="https://github.com/asmlgkj" target="_blank" rel="noopener">@asmlgkj</a>, <a href="https://github.com/aubryvetepi" target="_blank" rel="noopener">@aubryvetepi</a>, <a href="https://github.com/balwierz" target="_blank" rel="noopener">@balwierz</a>, <a href="https://github.com/bastianilso" target="_blank" rel="noopener">@bastianilso</a>, <a href="https://github.com/batpigandme" target="_blank" rel="noopener">@batpigandme</a>, <a href="https://github.com/bebersb" target="_blank" rel="noopener">@bebersb</a>, <a href="https://github.com/behrman" target="_blank" rel="noopener">@behrman</a>, <a href="https://github.com/benjaminschwetz" target="_blank" rel="noopener">@benjaminschwetz</a>, <a href="https://github.com/billdenney" target="_blank" rel="noopener">@billdenney</a>, <a href="https://github.com/Breza" target="_blank" rel="noopener">@Breza</a>, <a href="https://github.com/brunj7" target="_blank" rel="noopener">@brunj7</a>, <a href="https://github.com/BrunoGrandePhD" target="_blank" rel="noopener">@BrunoGrandePhD</a>, <a href="https://github.com/CGMossa" target="_blank" 
rel="noopener">@CGMossa</a>, <a href="https://github.com/cgoo4" target="_blank" rel="noopener">@cgoo4</a>, <a href="https://github.com/chsafouane" target="_blank" rel="noopener">@chsafouane</a>, <a href="https://github.com/chumbleycode" target="_blank" rel="noopener">@chumbleycode</a>, <a href="https://github.com/ColinFay" target="_blank" rel="noopener">@ColinFay</a>, <a href="https://github.com/CorradoLanera" target="_blank" rel="noopener">@CorradoLanera</a>, <a href="https://github.com/CPRyan" target="_blank" rel="noopener">@CPRyan</a>, <a href="https://github.com/czeildi" target="_blank" rel="noopener">@czeildi</a>, <a href="https://github.com/dan-reznik" target="_blank" rel="noopener">@dan-reznik</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/datawookie" target="_blank" rel="noopener">@datawookie</a>, <a href="https://github.com/dave-lovell" target="_blank" rel="noopener">@dave-lovell</a>, <a href="https://github.com/davidsjoberg" target="_blank" rel="noopener">@davidsjoberg</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/deann88" target="_blank" rel="noopener">@deann88</a>, <a href="https://github.com/dfalbel" target="_blank" rel="noopener">@dfalbel</a>, <a href="https://github.com/dhslone" target="_blank" rel="noopener">@dhslone</a>, <a href="https://github.com/dlependorf" target="_blank" rel="noopener">@dlependorf</a>, <a href="https://github.com/dllazarov" target="_blank" rel="noopener">@dllazarov</a>, <a href="https://github.com/dpprdan" target="_blank" rel="noopener">@dpprdan</a>, <a href="https://github.com/dracodoc" target="_blank" rel="noopener">@dracodoc</a>, <a href="https://github.com/echasnovski" target="_blank" rel="noopener">@echasnovski</a>, <a href="https://github.com/edo91" target="_blank" rel="noopener">@edo91</a>, <a href="https://github.com/edoardo-oliveri-sdg" target="_blank" 
rel="noopener">@edoardo-oliveri-sdg</a>, <a href="https://github.com/erictleung" target="_blank" rel="noopener">@erictleung</a>, <a href="https://github.com/eyayaw" target="_blank" rel="noopener">@eyayaw</a>, <a href="https://github.com/felixhell2004" target="_blank" rel="noopener">@felixhell2004</a>, <a href="https://github.com/florianm" target="_blank" rel="noopener">@florianm</a>, <a href="https://github.com/florisvdh" target="_blank" rel="noopener">@florisvdh</a>, <a href="https://github.com/flying-sheep" target="_blank" rel="noopener">@flying-sheep</a>, <a href="https://github.com/fpinter" target="_blank" rel="noopener">@fpinter</a>, <a href="https://github.com/frankzhang21" target="_blank" rel="noopener">@frankzhang21</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, <a href="https://github.com/GarrettMooney" target="_blank" rel="noopener">@GarrettMooney</a>, <a href="https://github.com/gdurif" target="_blank" rel="noopener">@gdurif</a>, <a href="https://github.com/ge-li" target="_blank" rel="noopener">@ge-li</a>, <a href="https://github.com/ggrothendieck" target="_blank" rel="noopener">@ggrothendieck</a>, <a href="https://github.com/grayskripko" target="_blank" rel="noopener">@grayskripko</a>, <a href="https://github.com/gregleleu" target="_blank" rel="noopener">@gregleleu</a>, <a href="https://github.com/gregorp" target="_blank" rel="noopener">@gregorp</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hendrikvanb" target="_blank" rel="noopener">@hendrikvanb</a>, <a href="https://github.com/holgerbrandl" target="_blank" rel="noopener">@holgerbrandl</a>, <a href="https://github.com/hriebl" target="_blank" rel="noopener">@hriebl</a>, <a href="https://github.com/hsloot" target="_blank" rel="noopener">@hsloot</a>, <a href="https://github.com/huftis" target="_blank" rel="noopener">@huftis</a>, <a href="https://github.com/iago-pssjd" target="_blank" 
rel="noopener">@iago-pssjd</a>, <a href="https://github.com/iamnicogomez" target="_blank" rel="noopener">@iamnicogomez</a>, <a href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/irudnyts" target="_blank" rel="noopener">@irudnyts</a>, <a href="https://github.com/izahn" target="_blank" rel="noopener">@izahn</a>, <a href="https://github.com/jameslairdsmith" target="_blank" rel="noopener">@jameslairdsmith</a>, <a href="https://github.com/jedwards24" target="_blank" rel="noopener">@jedwards24</a>, <a href="https://github.com/jemus42" target="_blank" rel="noopener">@jemus42</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jhrcook" target="_blank" rel="noopener">@jhrcook</a>, <a href="https://github.com/jimhester" target="_blank" rel="noopener">@jimhester</a>, <a href="https://github.com/jimjam-slam" target="_blank" rel="noopener">@jimjam-slam</a>, <a href="https://github.com/jnolis" target="_blank" rel="noopener">@jnolis</a>, <a href="https://github.com/joelgombin" target="_blank" rel="noopener">@joelgombin</a>, <a href="https://github.com/jonathan-g" target="_blank" rel="noopener">@jonathan-g</a>, <a href="https://github.com/jpmarindiaz" target="_blank" rel="noopener">@jpmarindiaz</a>, <a href="https://github.com/jxu" target="_blank" rel="noopener">@jxu</a>, <a href="https://github.com/jzadra" target="_blank" rel="noopener">@jzadra</a>, <a href="https://github.com/karchjd" target="_blank" rel="noopener">@karchjd</a>, <a href="https://github.com/karjamatti" target="_blank" rel="noopener">@karjamatti</a>, <a href="https://github.com/kbzsl" target="_blank" rel="noopener">@kbzsl</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/lahvak" target="_blank" rel="noopener">@lahvak</a>, <a href="https://github.com/lambdamoses" target="_blank" rel="noopener">@lambdamoses</a>, <a 
href="https://github.com/lasuk" target="_blank" rel="noopener">@lasuk</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/lorenzwalthert" target="_blank" rel="noopener">@lorenzwalthert</a>, <a href="https://github.com/LukasWallrich" target="_blank" rel="noopener">@LukasWallrich</a>, <a href="https://github.com/LukaszDerylo" target="_blank" rel="noopener">@LukaszDerylo</a>, <a href="https://github.com/malcolmbarrett" target="_blank" rel="noopener">@malcolmbarrett</a>, <a href="https://github.com/MarceloRTonon" target="_blank" rel="noopener">@MarceloRTonon</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/maxheld83" target="_blank" rel="noopener">@maxheld83</a>, <a href="https://github.com/Maximilian-Stefan-Ernst" target="_blank" rel="noopener">@Maximilian-Stefan-Ernst</a>, <a href="https://github.com/mccroweyclinton-EPA" target="_blank" rel="noopener">@mccroweyclinton-EPA</a>, <a href="https://github.com/medewitt" target="_blank" rel="noopener">@medewitt</a>, <a href="https://github.com/meowcat" target="_blank" rel="noopener">@meowcat</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/mitchelloharawild" target="_blank" rel="noopener">@mitchelloharawild</a>, <a href="https://github.com/mkoohafkan" target="_blank" rel="noopener">@mkoohafkan</a>, <a href="https://github.com/mlane3" target="_blank" rel="noopener">@mlane3</a>, <a href="https://github.com/mmuurr" target="_blank" rel="noopener">@mmuurr</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/mpettis" target="_blank" rel="noopener">@mpettis</a>, <a href="https://github.com/nealrichardson" target="_blank" 
rel="noopener">@nealrichardson</a>, <a href="https://github.com/Nelson-Gon" target="_blank" rel="noopener">@Nelson-Gon</a>, <a href="https://github.com/neuwirthe" target="_blank" rel="noopener">@neuwirthe</a>, <a href="https://github.com/njtierney" target="_blank" rel="noopener">@njtierney</a>, <a href="https://github.com/oduilln" target="_blank" rel="noopener">@oduilln</a>, <a href="https://github.com/papageorgiou" target="_blank" rel="noopener">@papageorgiou</a>, <a href="https://github.com/pat-s" target="_blank" rel="noopener">@pat-s</a>, <a href="https://github.com/paulponcet" target="_blank" rel="noopener">@paulponcet</a>, <a href="https://github.com/petyaracz" target="_blank" rel="noopener">@petyaracz</a>, <a href="https://github.com/phargarten2" target="_blank" rel="noopener">@phargarten2</a>, <a href="https://github.com/philiporlando" target="_blank" rel="noopener">@philiporlando</a>, <a href="https://github.com/q-w-a" target="_blank" rel="noopener">@q-w-a</a>, <a href="https://github.com/QuLogic" target="_blank" rel="noopener">@QuLogic</a>, <a href="https://github.com/ramiromagno" target="_blank" rel="noopener">@ramiromagno</a>, <a href="https://github.com/rcorty" target="_blank" rel="noopener">@rcorty</a>, <a href="https://github.com/reisner" target="_blank" rel="noopener">@reisner</a>, <a href="https://github.com/Rekyt" target="_blank" rel="noopener">@Rekyt</a>, <a href="https://github.com/roboes" target="_blank" rel="noopener">@roboes</a>, <a href="https://github.com/romainfrancois" target="_blank" rel="noopener">@romainfrancois</a>, <a href="https://github.com/rorynolan" target="_blank" rel="noopener">@rorynolan</a>, <a href="https://github.com/salim-b" target="_blank" rel="noopener">@salim-b</a>, <a href="https://github.com/sar8421" target="_blank" rel="noopener">@sar8421</a>, <a href="https://github.com/ScoobyQ" target="_blank" rel="noopener">@ScoobyQ</a>, <a href="https://github.com/sda030" target="_blank" rel="noopener">@sda030</a>, <a 
href="https://github.com/sgschreiber" target="_blank" rel="noopener">@sgschreiber</a>, <a href="https://github.com/sheffe" target="_blank" rel="noopener">@sheffe</a>, <a href="https://github.com/Shians" target="_blank" rel="noopener">@Shians</a>, <a href="https://github.com/ShixiangWang" target="_blank" rel="noopener">@ShixiangWang</a>, <a href="https://github.com/shosaco" target="_blank" rel="noopener">@shosaco</a>, <a href="https://github.com/siavash-babaei" target="_blank" rel="noopener">@siavash-babaei</a>, <a href="https://github.com/stephenashton-dhsc" target="_blank" rel="noopener">@stephenashton-dhsc</a>, <a href="https://github.com/stschiff" target="_blank" rel="noopener">@stschiff</a>, <a href="https://github.com/surdina" target="_blank" rel="noopener">@surdina</a>, <a href="https://github.com/tdawry" target="_blank" rel="noopener">@tdawry</a>, <a href="https://github.com/thebioengineer" target="_blank" rel="noopener">@thebioengineer</a>, <a href="https://github.com/TimTaylor" target="_blank" rel="noopener">@TimTaylor</a>, <a href="https://github.com/TimTeaFan" target="_blank" rel="noopener">@TimTeaFan</a>, <a href="https://github.com/tomjemmett" target="_blank" rel="noopener">@tomjemmett</a>, <a href="https://github.com/torbjorn" target="_blank" rel="noopener">@torbjorn</a>, <a href="https://github.com/tvatter" target="_blank" rel="noopener">@tvatter</a>, <a href="https://github.com/TylerGrantSmith" target="_blank" rel="noopener">@TylerGrantSmith</a>, <a href="https://github.com/vorpalvorpal" target="_blank" rel="noopener">@vorpalvorpal</a>, <a href="https://github.com/vspinu" target="_blank" rel="noopener">@vspinu</a>, <a href="https://github.com/wch" target="_blank" rel="noopener">@wch</a>, <a href="https://github.com/werkstattcodes" target="_blank" rel="noopener">@werkstattcodes</a>, <a href="https://github.com/williamlai2" target="_blank" rel="noopener">@williamlai2</a>, <a href="https://github.com/yogat3ch" target="_blank" 
rel="noopener">@yogat3ch</a>, <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>, and <a href="https://github.com/zeehio" target="_blank" rel="noopener">@zeehio</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>But if they used the tidyverse coercion rules. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> tidyclust is on CRAN https://www.tidyverse.org/blog/2022/12/tidyclust-0-1-0/ Tue, 06 Dec 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/12/tidyclust-0-1-0/ <p>We&rsquo;re very pleased to announce the release of <a href="https://tidyclust.tidymodels.org/" target="_blank" rel="noopener">tidyclust</a> 0.1.0. tidyclust is the tidymodels extension for working with clustering models. 
This package wouldn&rsquo;t have been possible without the great work of <a href="https://twitter.com/KellyBodwin" target="_blank" rel="noopener">Kelly Bodwin</a>.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidyclust"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will introduce tidyclust, how to use it with the rest of tidymodels, and how we can interact and evaluate the fitted clustering models.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span> </span> <span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Attaching packages</span> ────────────────────────────────────── tidymodels 1.0.0 ──</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>broom </span> 1.0.1 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>recipes </span> 1.0.3 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dials </span> 1.1.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>rsample </span> 1.1.0 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dplyr </span> 1.0.10 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tibble </span> 3.1.8 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>ggplot2 </span> 3.4.0 <span style='color: #00BB00;'>✔</span> <span style='color: 
#0000BB;'>tidyr </span> 1.2.1 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>infer </span> 1.0.4 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tune </span> 1.0.1 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>modeldata </span> 1.0.1 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>workflows </span> 1.1.2 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>parsnip </span> 1.0.3 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>workflowsets</span> 1.0.0 </span></span> <span><span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>purrr </span> 0.3.5 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>yardstick </span> 1.1.0</span></span> <span></span><span><span class='c'>#&gt; ── <span style='font-weight: bold;'>Conflicts</span> ───────────────────────────────────────── tidymodels_conflicts() ──</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>purrr</span>::<span style='color: #00BB00;'>discard()</span> masks <span style='color: #0000BB;'>scales</span>::discard()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: #0000BB;'>stats</span>::filter()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag()</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>recipes</span>::<span style='color: #00BB00;'>step()</span> masks <span style='color: 
#0000BB;'>stats</span>::step()</span></span> <span><span class='c'>#&gt; <span style='color: #0000BB;'>•</span> Use suppressPackageStartupMessages() to eliminate package startup messages</span></span> <span></span><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/tidyclust'>tidyclust</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="specifying-clustering-models">Specifying clustering models <a href="#specifying-clustering-models"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The first thing we need to do is decide on the type of clustering model we want to fit. The pkgdown site provides a <a href="https://tidyclust.tidymodels.org/reference/index.html#specifications" target="_blank" rel="noopener">list of all clustering specifications</a> provided by tidyclust. We are slowly adding more types of models&mdash; <a href="https://github.com/tidymodels/tidyclust/issues" target="_blank" rel="noopener">suggestions in issues</a> are highly welcome!</p> <p>We will use a K-Means model for these examples using <a href="https://rdrr.io/pkg/tidyclust/man/k_means.html" target="_blank" rel="noopener"><code>k_means()</code></a> to create a specification. 
As with other packages in the tidymodels ecosystem, tidyclust tries to use informative names for functions and arguments; as such, the argument denoting the number of clusters is <code>num_clusters</code> rather than <code>k</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>kmeans_spec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/pkg/tidyclust/man/k_means.html'>k_means</a></span><span class='o'>(</span>num_clusters <span class='o'>=</span> <span class='m'>4</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"ClusterR"</span><span class='o'>)</span></span> <span><span class='nv'>kmeans_spec</span></span> <span><span class='c'>#&gt; K Means Cluster Specification (partition)</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Main Arguments:</span></span> <span><span class='c'>#&gt; num_clusters = 4</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Computational engine: ClusterR</span></span> <span></span></code></pre> </div> <p>We can use the <a href="https://parsnip.tidymodels.org/reference/set_engine.html" target="_blank" rel="noopener"><code>set_engine()</code></a>, <a href="https://parsnip.tidymodels.org/reference/set_args.html" target="_blank" rel="noopener"><code>set_mode()</code></a>, and <a href="https://parsnip.tidymodels.org/reference/set_args.html" target="_blank" rel="noopener"><code>set_args()</code></a> functions we are familiar with from parsnip. The specification itself isn&rsquo;t worth much if we don&rsquo;t apply it to some data. 
We will use the ames data set from the modeldata package.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='s'>"ames"</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span></span></code></pre> </div> <p>This data set contains a number of categorical variables that can&rsquo;t be used with a K-Means model without alteration. Some light preprocessing can be done using the recipes package.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>rec_spec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>step_dummy</span><span class='o'>(</span><span class='nf'>all_nominal_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>step_zv</span><span class='o'>(</span><span class='nf'>all_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'>step_pca</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span 
class='o'>(</span><span class='o'>)</span>, threshold <span class='o'>=</span> <span class='m'>0.8</span><span class='o'>)</span></span></code></pre> </div> <p>This recipe normalizes all of the numeric variables before applying PCA to create a more minimal set of uncorrelated features. Notice how we didn&rsquo;t specify an outcome as clustering models are unsupervised, meaning that we don&rsquo;t have outcomes.</p> <p>These two specifications can be combined in a <code>workflow()</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>kmeans_wf</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>rec_spec</span>, <span class='nv'>kmeans_spec</span><span class='o'>)</span></span></code></pre> </div> <p>This workflow can then be fit to the <code>ames</code> data set.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>kmeans_fit</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span><span class='nv'>kmeans_wf</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span></span> <span><span class='nv'>kmeans_fit</span></span> <span><span class='c'>#&gt; ══ Workflow [trained] ══════════════════════════════════════════════════════════</span></span> <span><span class='c'>#&gt; <span style='font-style: italic;'>Preprocessor:</span> Recipe</span></span> <span><span class='c'>#&gt; <span style='font-style: italic;'>Model:</span> k_means()</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; ── Preprocessor ────────────────────────────────────────────────────────────────</span></span> <span><span class='c'>#&gt; 4 Recipe Steps</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; • step_dummy()</span></span> <span><span 
class='c'>#&gt; • step_zv()</span></span> <span><span class='c'>#&gt; • step_normalize()</span></span> <span><span class='c'>#&gt; • step_pca()</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; ── Model ───────────────────────────────────────────────────────────────────────</span></span> <span><span class='c'>#&gt; KMeans Cluster</span></span> <span><span class='c'>#&gt; Call: ClusterR::KMeans_rcpp(data = data, clusters = clusters) </span></span> <span><span class='c'>#&gt; Data cols: 121 </span></span> <span><span class='c'>#&gt; Centroids: 4 </span></span> <span><span class='c'>#&gt; BSS/SS: 0.1003306 </span></span> <span><span class='c'>#&gt; SS: 646321.6 = 581475.8 (WSS) + 64845.81 (BSS)</span></span> <span></span></code></pre> </div> <p>We have arbitrarily set the number of clusters to 4 above. If we wanted to figure out what values would be &ldquo;optimal,&rdquo; we would have to fit multiple models. We can do this with <a href="https://rdrr.io/pkg/tidyclust/man/tune_cluster.html" target="_blank" rel="noopener"><code>tune_cluster()</code></a>; to make use of this function, though, we first need to use <a href="https://hardhat.tidymodels.org/reference/tune.html" target="_blank" rel="noopener"><code>tune()</code></a> to specify that <code>num_clusters</code> is the argument we want to try with multiple values.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>kmeans_spec</span> <span class='o'>&lt;-</span> <span class='nv'>kmeans_spec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_args</a></span><span class='o'>(</span>num_clusters <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span><span 
class='o'>)</span></span> <span></span> <span><span class='nv'>kmeans_wf</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='nv'>rec_spec</span>, <span class='nv'>kmeans_spec</span><span class='o'>)</span></span> <span><span class='nv'>kmeans_wf</span></span> <span><span class='c'>#&gt; ══ Workflow ════════════════════════════════════════════════════════════════════</span></span> <span><span class='c'>#&gt; <span style='font-style: italic;'>Preprocessor:</span> Recipe</span></span> <span><span class='c'>#&gt; <span style='font-style: italic;'>Model:</span> k_means()</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; ── Preprocessor ────────────────────────────────────────────────────────────────</span></span> <span><span class='c'>#&gt; 4 Recipe Steps</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; • step_dummy()</span></span> <span><span class='c'>#&gt; • step_zv()</span></span> <span><span class='c'>#&gt; • step_normalize()</span></span> <span><span class='c'>#&gt; • step_pca()</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; ── Model ───────────────────────────────────────────────────────────────────────</span></span> <span><span class='c'>#&gt; K Means Cluster Specification (partition)</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Main Arguments:</span></span> <span><span class='c'>#&gt; num_clusters = tune()</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; Computational engine: ClusterR</span></span> <span></span></code></pre> </div> <p>We can use <a href="https://rdrr.io/pkg/tidyclust/man/tune_cluster.html" target="_blank" rel="noopener"><code>tune_cluster()</code></a> in the same way we use <code>tune_grid()</code>, using bootstraps to fit multiple models for each value of <code>num_clusters</code>.</p> <div class="highlight"> 
<pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1234</span><span class='o'>)</span></span> <span><span class='nv'>boots</span> <span class='o'>&lt;-</span> <span class='nf'>bootstraps</span><span class='o'>(</span><span class='nv'>ames</span>, times <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>tune_res</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/pkg/tidyclust/man/tune_cluster.html'>tune_cluster</a></span><span class='o'>(</span></span> <span> <span class='nv'>kmeans_wf</span>,</span> <span> resamples <span class='o'>=</span> <span class='nv'>boots</span></span> <span><span class='o'>)</span></span></code></pre> </div> <p>The <a href="https://tune.tidymodels.org/reference/collect_predictions.html" target="_blank" rel="noopener">collect functions</a>, such as <code>collect_metrics()</code>, work as they would with tune output.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>collect_metrics</span><span class='o'>(</span><span class='nv'>tune_res</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 18 × 7</span></span></span> <span><span class='c'>#&gt; num_clusters .metric .estimator mean n std_err .config </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 6 sse_total standard <span style='text-decoration: underline;'>624</span>435. 10 <span style='text-decoration: underline;'>1</span>675. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 6 sse_within_total standard <span style='text-decoration: underline;'>557</span>147. 10 <span style='text-decoration: underline;'>2</span>579. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 1 sse_total standard <span style='text-decoration: underline;'>624</span>435. 10 <span style='text-decoration: underline;'>1</span>675. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 1 sse_within_total standard <span style='text-decoration: underline;'>624</span>435. 10 <span style='text-decoration: underline;'>1</span>675. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 3 sse_total standard <span style='text-decoration: underline;'>624</span>435. 10 <span style='text-decoration: underline;'>1</span>675. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 3 sse_within_total standard <span style='text-decoration: underline;'>588</span>001. 10 <span style='text-decoration: underline;'>5</span>703. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 5 sse_total standard <span style='text-decoration: underline;'>624</span>435. 10 <span style='text-decoration: underline;'>1</span>675. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 5 sse_within_total standard <span style='text-decoration: underline;'>568</span>085. 10 <span style='text-decoration: underline;'>3</span>821. 
Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 9 sse_total standard <span style='text-decoration: underline;'>624</span>435. 10 <span style='text-decoration: underline;'>1</span>675. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 9 sse_within_total standard <span style='text-decoration: underline;'>535</span>120. 10 <span style='text-decoration: underline;'>2</span>262. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>11</span> 2 sse_total standard <span style='text-decoration: underline;'>624</span>435. 10 <span style='text-decoration: underline;'>1</span>675. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>12</span> 2 sse_within_total standard <span style='text-decoration: underline;'>599</span>762. 10 <span style='text-decoration: underline;'>4</span>306. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>13</span> 8 sse_total standard <span style='text-decoration: underline;'>624</span>435. 10 <span style='text-decoration: underline;'>1</span>675. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>14</span> 8 sse_within_total standard <span style='text-decoration: underline;'>541</span>813. 10 <span style='text-decoration: underline;'>2</span>506. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>15</span> 4 sse_total standard <span style='text-decoration: underline;'>624</span>435. 10 <span style='text-decoration: underline;'>1</span>675. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>16</span> 4 sse_within_total standard <span style='text-decoration: underline;'>583</span>604. 10 <span style='text-decoration: underline;'>5</span>523. 
Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>17</span> 7 sse_total standard <span style='text-decoration: underline;'>624</span>435. 10 <span style='text-decoration: underline;'>1</span>675. Preprocessor1…</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>18</span> 7 sse_within_total standard <span style='text-decoration: underline;'>548</span>299. 10 <span style='text-decoration: underline;'>2</span>907. Preprocessor1…</span></span> <span></span></code></pre> </div> <h2 id="extraction">Extraction <a href="#extraction"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Returning to the first model we fit, tidyclust provides three main tools for interfacing with a fitted cluster model:</p> <ul> <li>extract cluster assignments</li> <li>extract centroid locations</li> <li>predict with new data</li> </ul> <p>Each of these tasks has a function associated with it. 
First, we have <a href="https://rdrr.io/pkg/tidyclust/man/extract_cluster_assignment.html" target="_blank" rel="noopener"><code>extract_cluster_assignment()</code></a>, which can be used on fitted tidyclust objects, alone or as a part of a workflow, and it returns the cluster assignment as a factor named <code>.cluster</code> in a tibble.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/pkg/tidyclust/man/extract_cluster_assignment.html'>extract_cluster_assignment</a></span><span class='o'>(</span><span class='nv'>kmeans_fit</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2,930 × 1</span></span></span> <span><span class='c'>#&gt; .cluster </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Cluster_1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Cluster_1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Cluster_1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Cluster_1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Cluster_2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Cluster_2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Cluster_2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Cluster_2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Cluster_2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Cluster_2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 2,920 more rows</span></span></span> <span></span></code></pre> </div> <p>The location of the 
clusters can be found using <a href="https://rdrr.io/pkg/tidyclust/man/extract_centroids.html" target="_blank" rel="noopener"><code>extract_centroids()</code></a> which again returns a tibble, with <code>.cluster</code> being a factor with the same levels as what we got from <a href="https://rdrr.io/pkg/tidyclust/man/extract_cluster_assignment.html" target="_blank" rel="noopener"><code>extract_cluster_assignment()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/pkg/tidyclust/man/extract_centroids.html'>extract_centroids</a></span><span class='o'>(</span><span class='nv'>kmeans_fit</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 122</span></span></span> <span><span class='c'>#&gt; .cluster PC001 PC002 PC003 PC004 PC005 PC006 PC007 PC008 PC009</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Cluster_1 -<span style='color: #BB0000;'>5.76</span> 0.713 11.9 2.80 4.09 3.44 1.26 -<span style='color: #BB0000;'>0.280</span> -<span style='color: #BB0000;'>0.486</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> Cluster_2 3.98 
-<span style='color: #BB0000;'>1.18</span> 0.126 0.718 0.150 0.055<span style='text-decoration: underline;'>4</span> -<span style='color: #BB0000;'>0.046</span><span style='color: #BB0000; text-decoration: underline;'>0</span> -<span style='color: #BB0000;'>0.346</span> 0.059<span style='text-decoration: underline;'>9</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> Cluster_3 -<span style='color: #BB0000;'>0.970</span> 2.45 -<span style='color: #BB0000;'>0.604</span> -<span style='color: #BB0000;'>0.523</span> 0.302 -<span style='color: #BB0000;'>0.298</span> -<span style='color: #BB0000;'>0.174</span> 0.507 -<span style='color: #BB0000;'>0.153</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> Cluster_4 -<span style='color: #BB0000;'>4.40</span> -<span style='color: #BB0000;'>2.30</span> -<span style='color: #BB0000;'>0.658</span> -<span style='color: #BB0000;'>0.671</span> -<span style='color: #BB0000;'>1.29</span> -<span style='color: #BB0000;'>0.007</span><span style='color: #BB0000; text-decoration: underline;'>51</span> 0.222 -<span style='color: #BB0000;'>0.250</span> 0.223 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 112 more variables: PC010 &lt;dbl&gt;, PC011 &lt;dbl&gt;, PC012 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># PC013 &lt;dbl&gt;, PC014 &lt;dbl&gt;, PC015 &lt;dbl&gt;, PC016 &lt;dbl&gt;, PC017 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># PC018 &lt;dbl&gt;, PC019 &lt;dbl&gt;, PC020 &lt;dbl&gt;, PC021 &lt;dbl&gt;, PC022 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># PC023 &lt;dbl&gt;, PC024 &lt;dbl&gt;, PC025 &lt;dbl&gt;, PC026 &lt;dbl&gt;, PC027 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># PC028 &lt;dbl&gt;, PC029 &lt;dbl&gt;, PC030 &lt;dbl&gt;, 
PC031 &lt;dbl&gt;, PC032 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># PC033 &lt;dbl&gt;, PC034 &lt;dbl&gt;, PC035 &lt;dbl&gt;, PC036 &lt;dbl&gt;, PC037 &lt;dbl&gt;,</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># PC038 &lt;dbl&gt;, PC039 &lt;dbl&gt;, PC040 &lt;dbl&gt;, PC041 &lt;dbl&gt;, PC042 &lt;dbl&gt;, …</span></span></span> <span></span></code></pre> </div> <p>Lastly, if the model has a notion that translates to &ldquo;prediction,&rdquo; then <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a> will give you those results as well. In the case of K-Means, this is being interpreted as &ldquo;which centroid is this observation closest to.&rdquo;</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>kmeans_fit</span>, new_data <span class='o'>=</span> <span class='nf'>slice_sample</span><span class='o'>(</span><span class='nv'>ames</span>, n <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 10 × 1</span></span></span> <span><span class='c'>#&gt; .pred_cluster</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Cluster_4 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Cluster_2 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Cluster_4 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Cluster_3 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Cluster_1 </span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'> 6</span> Cluster_4 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Cluster_2 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Cluster_2 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Cluster_1 </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Cluster_4</span></span> <span></span></code></pre> </div> <p>Please check the <a href="https://tidyclust.tidymodels.org/" target="_blank" rel="noopener">pkgdown site</a> for more in-depth articles. We couldn&rsquo;t be happier to have this package on CRAN and we encourage you to check it out.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all the contributors: <a href="https://github.com/aephidayatuloh" target="_blank" rel="noopener">@aephidayatuloh</a>, <a href="https://github.com/avishaitsur" target="_blank" rel="noopener">@avishaitsur</a>, <a href="https://github.com/bryanosborne" target="_blank" rel="noopener">@bryanosborne</a>, <a href="https://github.com/cgoo4" target="_blank" rel="noopener">@cgoo4</a>, <a href="https://github.com/coforfe" target="_blank" rel="noopener">@coforfe</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/JauntyJJS" target="_blank" rel="noopener">@JauntyJJS</a>, <a href="https://github.com/kbodwin" target="_blank" rel="noopener">@kbodwin</a>, <a 
href="https://github.com/malcolmbarrett" target="_blank" rel="noopener">@malcolmbarrett</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/ninohardt" target="_blank" rel="noopener">@ninohardt</a>, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, and <a href="https://github.com/tomazweiss" target="_blank" rel="noopener">@tomazweiss</a>.</p> stringr 1.5.0 https://www.tidyverse.org/blog/2022/12/stringr-1-5-0/ Mon, 05 Dec 2022 00:00:00 +0000 <p>We&rsquo;re chuffed to announce the release of <a href="https://stringr.tidyverse.org" target="_blank" rel="noopener">stringr</a> 1.5.0. 
stringr provides a cohesive set of functions designed to make working with strings as easy as possible.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"stringr"</span><span class='o'>)</span></span></code></pre> </div> <p>This blog post will give you an overview of the biggest changes (you can get a detailed list of all changes from the <a href="https://stringr.tidyverse.org/news/index.html" target="_blank" rel="noopener">release notes</a>). Firstly, we need to update you on some (small) breaking changes we&rsquo;ve made to make stringr more consistent with the rest of the tidyverse. Then, we&rsquo;ll give a quick overview of improvements to documentation and stringr&rsquo;s new license. Lastly, we&rsquo;ll finish off by diving into a few of the many small, but useful, functions that we&rsquo;ve accumulated in the three and a half years since the last release.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://stringr.tidyverse.org'>stringr</a></span><span class='o'>)</span></span></code></pre> </div> <h2 id="breaking-changes">Breaking changes <a href="#breaking-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Let&rsquo;s start with the important stuff: the breaking changes. 
We&rsquo;ve tried to keep these small and we don&rsquo;t believe they&rsquo;ll affect much code in the wild (they only affected ~20 of the ~1,600 packages that use stringr). But we believe they&rsquo;re important to make, because a consistent set of rules makes the tidyverse as a whole more predictable and easier to learn.</p> <h3 id="recycling-rules">Recycling rules <a href="#recycling-rules"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>stringr functions now consistently implement the tidyverse recycling rules<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>, which are stricter than the previous rules in two ways. 
Firstly, we no longer recycle shorter vectors whose length evenly divides the length of longer vectors:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_detect.html'>str_detect</a></span><span class='o'>(</span><span class='nv'>letters</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"x"</span>, <span class='s'>"y"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `str_detect()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't recycle `string` (size 26) to match `pattern` (size 2).</span></span> <span></span></code></pre> </div> <p>Secondly, a 0-length vector no longer implies a 0-length output. Instead, it&rsquo;s recycled using the usual rules:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_detect.html'>str_detect</a></span><span class='o'>(</span><span class='nv'>letters</span>, <span class='nf'><a href='https://rdrr.io/r/base/character.html'>character</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `str_detect()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't recycle `string` (size 26) to match `pattern` (size 0).</span></span> <span></span><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_detect.html'>str_detect</a></span><span class='o'>(</span><span class='s'>"x"</span>, <span class='nf'><a 
href='https://rdrr.io/r/base/character.html'>character</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; logical(0)</span></span> <span></span></code></pre> </div> <p>Neither of these situations occurs very commonly in data analysis, so this change primarily brings consistency with the rest of the tidyverse without affecting much existing code.</p> <p>Finally, stringr functions are generally a little stricter because we require the inputs to be vectors of some type. Again, this is unlikely to affect your data analysis code and will result in a clearer error if you accidentally pass in something weird:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_detect.html'>str_detect</a></span><span class='o'>(</span><span class='nv'>mean</span>, <span class='s'>"x"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `str_detect()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `string` must be a vector, not a function.</span></span> <span></span></code></pre> </div> <h3 id="empty-patterns">Empty patterns <a href="#empty-patterns"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>In many stringr functions, <code>&quot;&quot;</code> will match or split on every character. 
This is motivated by base R&rsquo;s <a href="https://rdrr.io/r/base/strsplit.html" target="_blank" rel="noopener"><code>strsplit()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/strsplit.html'>strsplit</a></span><span class='o'>(</span><span class='s'>"abc"</span>, <span class='s'>""</span><span class='o'>)</span><span class='o'>[[</span><span class='m'>1</span><span class='o'>]</span><span class='o'>]</span></span> <span><span class='c'>#&gt; [1] "a" "b" "c"</span></span> <span></span><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_split.html'>str_split</a></span><span class='o'>(</span><span class='s'>"abc"</span>, <span class='s'>""</span><span class='o'>)</span><span class='o'>[[</span><span class='m'>1</span><span class='o'>]</span><span class='o'>]</span></span> <span><span class='c'>#&gt; [1] "a" "b" "c"</span></span> <span></span></code></pre> </div> <p>When creating stringr (over 13 years ago!), I took this idea and ran with it, implementing similar support in every function where it might possibly work. But I missed an important problem with <a href="https://stringr.tidyverse.org/reference/str_detect.html" target="_blank" rel="noopener"><code>str_detect()</code></a>.</p> <p>What should <code>str_detect(x, &quot;&quot;)</code> return? You can argue two ways:</p> <ul> <li>To be consistent with <a href="https://stringr.tidyverse.org/reference/str_split.html" target="_blank" rel="noopener"><code>str_split()</code></a>, it should return <code>TRUE</code> whenever there are characters to match, i.e. <code>x != &quot;&quot;</code>.</li> <li>It&rsquo;s common to build up a set of possible matches by doing <code>str_flatten(matches, &quot;|&quot;)</code>. What should this match if <code>matches</code> is empty? 
Ideally it would match nothing, implying that <code>str_detect(x, &quot;&quot;)</code> should be equivalent to <code>x == &quot;&quot;</code>.</li> </ul> <p>This inconsistency potentially leads to some subtle bugs, so use of <code>&quot;&quot;</code> in <a href="https://stringr.tidyverse.org/reference/str_detect.html" target="_blank" rel="noopener"><code>str_detect()</code></a> (and a few other related functions) is now an error:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_detect.html'>str_detect</a></span><span class='o'>(</span><span class='nv'>letters</span>, <span class='s'>""</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `str_detect()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `pattern` can't be the empty string (`""`).</span></span> <span></span></code></pre> </div> <h2 id="documentation-and-licensing">Documentation and licensing <a href="#documentation-and-licensing"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Now that we&rsquo;ve got the breaking changes out of the way we can focus on the new stuff 😃. 
Most importantly, there&rsquo;s a new vignette that provides some advice if you&rsquo;re transitioning from (or to) base R&rsquo;s string functions: <a href="https://stringr.tidyverse.org/articles/from-base.html" target="_blank" rel="noopener"><code>vignette(&quot;from-base&quot;, package = &quot;stringr&quot;)</code></a>. It was written by <a href="https://sastoudt.github.io" target="_blank" rel="noopener">Sara Stoudt</a> during the 2019 Tidyverse developer day, and has finally made it to the released version!</p> <p>We&rsquo;ve also spent a bunch of time reviewing the documentation, particularly the topic titles and descriptions. They&rsquo;re now more informative and less duplicative, which should make it easier to find the function that you&rsquo;re looking for. See the complete list of functions in the <a href="https://stringr.tidyverse.org/reference/index.html" target="_blank" rel="noopener">reference index</a>.</p> <p>Finally, stringr is now officially <a href="https://www.tidyverse.org/blog/2021/12/relicensing-packages/" target="_blank" rel="noopener">re-licensed as MIT</a>.</p> <h2 id="new-features">New features <a href="#new-features"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The biggest improvement is to <a href="https://stringr.tidyverse.org/reference/str_view.html" target="_blank" rel="noopener"><code>str_view()</code></a> which has gained a bunch of new features, including using the <a href="https://cli.r-lib.org/" target="_blank" rel="noopener">cli</a> package so it can work in more places. 
We also have a grab bag of new functions that fill in small functionality gaps.</p> <h3 id="str_view"><code>str_view()</code> <a href="#str_view"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://stringr.tidyverse.org/reference/str_view.html" target="_blank" rel="noopener"><code>str_view()</code></a> uses ANSI colouring rather than an HTML widget. This means it works in more places and requires fewer dependencies. <a href="https://stringr.tidyverse.org/reference/str_view.html" target="_blank" rel="noopener"><code>str_view()</code></a> now:</p> <ul> <li> <p>Displays strings with special characters:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"\\"</span>, <span class='s'>"\"\nabcdef\n\""</span><span class='o'>)</span></span> <span><span class='nv'>x</span></span> <span><span class='c'>#&gt; [1] "\\" "\"\nabcdef\n\""</span></span> <span></span><span></span> <span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_view.html'>str_view</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[1] │</span> \</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[2] │</span> "</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>│</span> abcdef</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>│</span> 
"</span></span> <span></span></code></pre> </div> </li> <li> <p>Highlights unusual whitespace characters:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_view.html'>str_view</a></span><span class='o'>(</span><span class='s'>"\t"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[1] │</span> <span style='color: #00BBBB;'>&#123;\t&#125;</span></span></span> <span></span></code></pre> </div> </li> <li> <p>By default, only shows matching strings:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_view.html'>str_view</a></span><span class='o'>(</span><span class='nv'>fruit</span>, <span class='s'>"(.)\\1"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> [1] │</span> a<span style='color: #00BBBB;'>&lt;pp&gt;</span>le</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> [5] │</span> be<span style='color: #00BBBB;'>&lt;ll&gt;</span> pe<span style='color: #00BBBB;'>&lt;pp&gt;</span>er</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> [6] │</span> bilbe<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> [7] │</span> blackbe<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> [8] │</span> blackcu<span style='color: #00BBBB;'>&lt;rr&gt;</span>ant</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> [9] │</span> bl<span style='color: #00BBBB;'>&lt;oo&gt;</span>d orange</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[10] │</span> bluebe<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'>[11] │</span> boysenbe<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[16] │</span> che<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[17] │</span> chili pe<span style='color: #00BBBB;'>&lt;pp&gt;</span>er</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[19] │</span> cloudbe<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[21] │</span> cranbe<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[23] │</span> cu<span style='color: #00BBBB;'>&lt;rr&gt;</span>ant</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[28] │</span> e<span style='color: #00BBBB;'>&lt;gg&gt;</span>plant</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[29] │</span> elderbe<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[32] │</span> goji be<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[33] │</span> g<span style='color: #00BBBB;'>&lt;oo&gt;</span>sebe<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[38] │</span> hucklebe<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[47] │</span> lych<span style='color: #00BBBB;'>&lt;ee&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>[50] │</span> mulbe<span style='color: #00BBBB;'>&lt;rr&gt;</span>y</span></span> <span><span class='c'>#&gt; ... 
and 9 more</span></span> <span></span></code></pre> </div> <p>(This makes <a href="https://stringr.tidyverse.org/reference/str_view.html" target="_blank" rel="noopener"><code>str_view_all()</code></a> redundant and hence deprecated.)</p> </li> </ul> <h3 id="comparing-strings">Comparing strings <a href="#comparing-strings"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>There are three new functions related to comparing strings:</p> <ul> <li> <p> <a href="https://stringr.tidyverse.org/reference/str_equal.html" target="_blank" rel="noopener"><code>str_equal()</code></a> compares two character vectors using Unicode rules, optionally ignoring case:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_equal.html'>str_equal</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"A"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] FALSE</span></span> <span></span><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_equal.html'>str_equal</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"A"</span>, ignore_case <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] TRUE</span></span> <span></span></code></pre> </div> </li> <li> <p> <a href="https://stringr.tidyverse.org/reference/str_order.html" target="_blank" rel="noopener"><code>str_rank()</code></a> completes the set of order/rank/sort functions:</p> <div class="highlight"> <pre 
class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_order.html'>str_rank</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"c"</span>, <span class='s'>"b"</span>, <span class='s'>"b"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 1 4 2 2</span></span> <span></span><span><span class='c'># compare to:</span></span> <span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_order.html'>str_order</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"c"</span>, <span class='s'>"b"</span>, <span class='s'>"b"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] 1 3 4 2</span></span> <span></span></code></pre> </div> </li> <li> <p> <a href="https://stringr.tidyverse.org/reference/str_unique.html" target="_blank" rel="noopener"><code>str_unique()</code></a> returns unique values, optionally ignoring case:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_unique.html'>str_unique</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"a"</span>, <span class='s'>"A"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "a" "A"</span></span> <span></span><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_unique.html'>str_unique</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span 
class='s'>"a"</span>, <span class='s'>"a"</span>, <span class='s'>"A"</span><span class='o'>)</span>, ignore_case <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "a"</span></span> <span></span></code></pre> </div> </li> </ul> <h3 id="splitting">Splitting <a href="#splitting"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://stringr.tidyverse.org/reference/str_split.html" target="_blank" rel="noopener"><code>str_split()</code></a> gains two useful variants:</p> <ul> <li> <p> <a href="https://stringr.tidyverse.org/reference/str_split.html" target="_blank" rel="noopener"><code>str_split_1()</code></a> is tailored for the special case of splitting up a single string. 
It returns a character vector, not a list, and errors if you try and give it multiple values:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_split.html'>str_split_1</a></span><span class='o'>(</span><span class='s'>"x-y-z"</span>, <span class='s'>"-"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "x" "y" "z"</span></span> <span></span><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_split.html'>str_split_1</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"x-y"</span>, <span class='s'>"a-b-c"</span><span class='o'>)</span>, <span class='s'>"-"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `str_split_1()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `string` must be a single string, not a character vector.</span></span> <span></span></code></pre> </div> <p>It&rsquo;s a shortcut for the common pattern of <code>unlist(str_split(x, &quot; &quot;))</code>.</p> </li> <li> <p> <a href="https://stringr.tidyverse.org/reference/str_split.html" target="_blank" rel="noopener"><code>str_split_i()</code></a> extracts a single piece from the split string:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a-b-c"</span>, <span class='s'>"d-e"</span>, <span class='s'>"f-g-h-i"</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_split.html'>str_split_i</a></span><span class='o'>(</span><span 
class='nv'>x</span>, <span class='s'>"-"</span>, <span class='m'>2</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "b" "e" "g"</span></span> <span></span><span></span> <span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_split.html'>str_split_i</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='s'>"-"</span>, <span class='m'>4</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] NA NA "i"</span></span> <span></span><span></span> <span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_split.html'>str_split_i</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='s'>"-"</span>, <span class='o'>-</span><span class='m'>1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "c" "e" "i"</span></span> <span></span></code></pre> </div> </li> </ul> <h3 id="miscellaneous">Miscellaneous <a href="#miscellaneous"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><ul> <li> <p> <a href="https://stringr.tidyverse.org/reference/str_escape.html" target="_blank" rel="noopener"><code>str_escape()</code></a> escapes regular expression metacharacters, providing an alternative to <a href="https://stringr.tidyverse.org/reference/modifiers.html" target="_blank" rel="noopener"><code>fixed()</code></a> if you want to compose a pattern from user supplied strings:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_view.html'>str_view</a></span><span class='o'>(</span><span 
class='s'>"[hello]"</span>, <span class='nf'><a href='https://stringr.tidyverse.org/reference/str_escape.html'>str_escape</a></span><span class='o'>(</span><span class='s'>"[]"</span><span class='o'>)</span><span class='o'>)</span></span></code></pre> </div> </li> <li> <p> <a href="https://stringr.tidyverse.org/reference/str_extract.html" target="_blank" rel="noopener"><code>str_extract()</code></a> can now extract a capturing group instead of the complete match:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Chapter 1"</span>, <span class='s'>"Section 2.3"</span>, <span class='s'>"Chapter 3"</span>, <span class='s'>"Section 4.1.1"</span><span class='o'>)</span></span> <span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_extract.html'>str_extract</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='s'>"([A-Za-z]+) ([0-9.]+)"</span>, group <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "Chapter" "Section" "Chapter" "Section"</span></span> <span></span><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_extract.html'>str_extract</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='s'>"([A-Za-z]+) ([0-9.]+)"</span>, group <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "1" "2.3" "3" "4.1.1"</span></span> <span></span></code></pre> </div> </li> <li> <p> <a href="https://stringr.tidyverse.org/reference/str_flatten.html" target="_blank" rel="noopener"><code>str_flatten()</code></a> gains a <code>last</code> argument which is used to power the new <a href="https://stringr.tidyverse.org/reference/str_flatten.html" target="_blank" 
rel="noopener"><code>str_flatten_comma()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_flatten.html'>str_flatten_comma</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"cats"</span>, <span class='s'>"dogs"</span>, <span class='s'>"mice"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "cats, dogs, mice"</span></span> <span></span><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_flatten.html'>str_flatten_comma</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"cats"</span>, <span class='s'>"dogs"</span>, <span class='s'>"mice"</span><span class='o'>)</span>, last <span class='o'>=</span> <span class='s'>" and "</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "cats, dogs and mice"</span></span> <span></span><span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_flatten.html'>str_flatten_comma</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"cats"</span>, <span class='s'>"dogs"</span>, <span class='s'>"mice"</span><span class='o'>)</span>, last <span class='o'>=</span> <span class='s'>", and "</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "cats, dogs, and mice"</span></span> <span></span><span></span> <span><span class='c'># correctly handles the two element case with the Oxford comma</span></span> <span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_flatten.html'>str_flatten_comma</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span 
class='o'>(</span><span class='s'>"cats"</span>, <span class='s'>"dogs"</span><span class='o'>)</span>, last <span class='o'>=</span> <span class='s'>", and "</span><span class='o'>)</span></span> <span><span class='c'>#&gt; [1] "cats and dogs"</span></span> <span></span></code></pre> </div> </li> <li> <p> <a href="https://stringr.tidyverse.org/reference/str_like.html" target="_blank" rel="noopener"><code>str_like()</code></a> works like <a href="https://stringr.tidyverse.org/reference/str_detect.html" target="_blank" rel="noopener"><code>str_detect()</code></a> but uses SQL&rsquo;s LIKE syntax:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>fruit</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"apple"</span>, <span class='s'>"banana"</span>, <span class='s'>"pear"</span>, <span class='s'>"pineapple"</span><span class='o'>)</span></span> <span><span class='nv'>fruit</span><span class='o'>[</span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_like.html'>str_like</a></span><span class='o'>(</span><span class='nv'>fruit</span>, <span class='s'>"%apple"</span><span class='o'>)</span><span class='o'>]</span></span> <span><span class='c'>#&gt; [1] "apple" "pineapple"</span></span> <span></span><span><span class='nv'>fruit</span><span class='o'>[</span><span class='nf'><a href='https://stringr.tidyverse.org/reference/str_like.html'>str_like</a></span><span class='o'>(</span><span class='nv'>fruit</span>, <span class='s'>"p__r"</span><span class='o'>)</span><span class='o'>]</span></span> <span><span class='c'>#&gt; [1] "pear"</span></span> <span></span></code></pre> </div> </li> </ul> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 
0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all 114 folks who contributed to this release through pull requests and issues! <a href="https://github.com/aaronrudkin" target="_blank" rel="noopener">@aaronrudkin</a>, <a href="https://github.com/adisarid" target="_blank" rel="noopener">@adisarid</a>, <a href="https://github.com/AleSR13" target="_blank" rel="noopener">@AleSR13</a>, <a href="https://github.com/anfederico" target="_blank" rel="noopener">@anfederico</a>, <a href="https://github.com/AR1337" target="_blank" rel="noopener">@AR1337</a>, <a href="https://github.com/arisp99" target="_blank" rel="noopener">@arisp99</a>, <a href="https://github.com/avila" target="_blank" rel="noopener">@avila</a>, <a href="https://github.com/balthasars" target="_blank" rel="noopener">@balthasars</a>, <a href="https://github.com/batpigandme" target="_blank" rel="noopener">@batpigandme</a>, <a href="https://github.com/bbarros50" target="_blank" rel="noopener">@bbarros50</a>, <a href="https://github.com/bbo2adwuff" target="_blank" rel="noopener">@bbo2adwuff</a>, <a href="https://github.com/bensenmansen" target="_blank" rel="noopener">@bensenmansen</a>, <a href="https://github.com/bfgray3" target="_blank" rel="noopener">@bfgray3</a>, <a href="https://github.com/Bisaloo" target="_blank" rel="noopener">@Bisaloo</a>, <a href="https://github.com/bonmac" target="_blank" rel="noopener">@bonmac</a>, <a href="https://github.com/botan" target="_blank" rel="noopener">@botan</a>, <a href="https://github.com/bshor" target="_blank" rel="noopener">@bshor</a>, <a href="https://github.com/carlganz" target="_blank" rel="noopener">@carlganz</a>, <a href="https://github.com/chintanp" target="_blank" rel="noopener">@chintanp</a>, <a 
href="https://github.com/chrimaho" target="_blank" rel="noopener">@chrimaho</a>, <a href="https://github.com/chris2b5" target="_blank" rel="noopener">@chris2b5</a>, <a href="https://github.com/clemenshug" target="_blank" rel="noopener">@clemenshug</a>, <a href="https://github.com/courtiol" target="_blank" rel="noopener">@courtiol</a>, <a href="https://github.com/dachosen1" target="_blank" rel="noopener">@dachosen1</a>, <a href="https://github.com/dan-reznik" target="_blank" rel="noopener">@dan-reznik</a>, <a href="https://github.com/datawookie" target="_blank" rel="noopener">@datawookie</a>, <a href="https://github.com/david-romano" target="_blank" rel="noopener">@david-romano</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dbarrows" target="_blank" rel="noopener">@dbarrows</a>, <a href="https://github.com/deann88" target="_blank" rel="noopener">@deann88</a>, <a href="https://github.com/denrou" target="_blank" rel="noopener">@denrou</a>, <a href="https://github.com/deschen1" target="_blank" rel="noopener">@deschen1</a>, <a href="https://github.com/dsg38" target="_blank" rel="noopener">@dsg38</a>, <a href="https://github.com/dtburk" target="_blank" rel="noopener">@dtburk</a>, <a href="https://github.com/elbersb" target="_blank" rel="noopener">@elbersb</a>, <a href="https://github.com/geotheory" target="_blank" rel="noopener">@geotheory</a>, <a href="https://github.com/ghost" target="_blank" rel="noopener">@ghost</a>, <a href="https://github.com/GrimTrigger88" target="_blank" rel="noopener">@GrimTrigger88</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/iago-pssjd" target="_blank" rel="noopener">@iago-pssjd</a>, <a href="https://github.com/IndigoJay" target="_blank" rel="noopener">@IndigoJay</a>, <a href="https://github.com/jashapiro" target="_blank" rel="noopener">@jashapiro</a>, <a href="https://github.com/JBGruber" 
target="_blank" rel="noopener">@JBGruber</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jimhester" target="_blank" rel="noopener">@jimhester</a>, <a href="https://github.com/jjesusfilho" target="_blank" rel="noopener">@jjesusfilho</a>, <a href="https://github.com/jmbarbone" target="_blank" rel="noopener">@jmbarbone</a>, <a href="https://github.com/joethorley" target="_blank" rel="noopener">@joethorley</a>, <a href="https://github.com/jonas-hag" target="_blank" rel="noopener">@jonas-hag</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, <a href="https://github.com/joshyam-k" target="_blank" rel="noopener">@joshyam-k</a>, <a href="https://github.com/jpeacock29" target="_blank" rel="noopener">@jpeacock29</a>, <a href="https://github.com/jzadra" target="_blank" rel="noopener">@jzadra</a>, <a href="https://github.com/KasperThystrup" target="_blank" rel="noopener">@KasperThystrup</a>, <a href="https://github.com/kendonB" target="_blank" rel="noopener">@kendonB</a>, <a href="https://github.com/kieran-mace" target="_blank" rel="noopener">@kieran-mace</a>, <a href="https://github.com/kiernann" target="_blank" rel="noopener">@kiernann</a>, <a href="https://github.com/Kodiologist" target="_blank" rel="noopener">@Kodiologist</a>, <a href="https://github.com/leej3" target="_blank" rel="noopener">@leej3</a>, <a href="https://github.com/leowill01" target="_blank" rel="noopener">@leowill01</a>, <a href="https://github.com/LimaRAF" target="_blank" rel="noopener">@LimaRAF</a>, <a href="https://github.com/lmwang9527" target="_blank" rel="noopener">@lmwang9527</a>, <a href="https://github.com/Ludsfer" target="_blank" rel="noopener">@Ludsfer</a>, <a href="https://github.com/lz01" target="_blank" rel="noopener">@lz01</a>, <a href="https://github.com/Marcade80" target="_blank" rel="noopener">@Marcade80</a>, <a href="https://github.com/Mashin6" target="_blank" 
rel="noopener">@Mashin6</a>, <a href="https://github.com/MattCowgill" target="_blank" rel="noopener">@MattCowgill</a>, <a href="https://github.com/maxheld83" target="_blank" rel="noopener">@maxheld83</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/michaelweylandt" target="_blank" rel="noopener">@michaelweylandt</a>, <a href="https://github.com/mikeaalv" target="_blank" rel="noopener">@mikeaalv</a>, <a href="https://github.com/misea" target="_blank" rel="noopener">@misea</a>, <a href="https://github.com/mitchelloharawild" target="_blank" rel="noopener">@mitchelloharawild</a>, <a href="https://github.com/mkvasnicka" target="_blank" rel="noopener">@mkvasnicka</a>, <a href="https://github.com/mrcaseb" target="_blank" rel="noopener">@mrcaseb</a>, <a href="https://github.com/mtnbikerjoshua" target="_blank" rel="noopener">@mtnbikerjoshua</a>, <a href="https://github.com/mwip" target="_blank" rel="noopener">@mwip</a>, <a href="https://github.com/nachovoss" target="_blank" rel="noopener">@nachovoss</a>, <a href="https://github.com/neonira" target="_blank" rel="noopener">@neonira</a>, <a href="https://github.com/Nischal-Karki-ATW" target="_blank" rel="noopener">@Nischal-Karki-ATW</a>, <a href="https://github.com/oliverbeagley" target="_blank" rel="noopener">@oliverbeagley</a>, <a href="https://github.com/orgadish" target="_blank" rel="noopener">@orgadish</a>, <a href="https://github.com/pachadotdev" target="_blank" rel="noopener">@pachadotdev</a>, <a href="https://github.com/PathosEthosLogos" target="_blank" rel="noopener">@PathosEthosLogos</a>, <a href="https://github.com/pdelboca" target="_blank" rel="noopener">@pdelboca</a>, <a href="https://github.com/petermeissner" target="_blank" rel="noopener">@petermeissner</a>, <a href="https://github.com/phargarten2" target="_blank" rel="noopener">@phargarten2</a>, <a 
href="https://github.com/programLyrique" target="_blank" rel="noopener">@programLyrique</a>, <a href="https://github.com/psads-git" target="_blank" rel="noopener">@psads-git</a>, <a href="https://github.com/psychelzh" target="_blank" rel="noopener">@psychelzh</a>, <a href="https://github.com/PursuitOfDataScience" target="_blank" rel="noopener">@PursuitOfDataScience</a>, <a href="https://github.com/richardjtelford" target="_blank" rel="noopener">@richardjtelford</a>, <a href="https://github.com/richelbilderbeek" target="_blank" rel="noopener">@richelbilderbeek</a>, <a href="https://github.com/rjpat" target="_blank" rel="noopener">@rjpat</a>, <a href="https://github.com/romatik" target="_blank" rel="noopener">@romatik</a>, <a href="https://github.com/rressler" target="_blank" rel="noopener">@rressler</a>, <a href="https://github.com/rwbaer" target="_blank" rel="noopener">@rwbaer</a>, <a href="https://github.com/salim-b" target="_blank" rel="noopener">@salim-b</a>, <a href="https://github.com/sammo3182" target="_blank" rel="noopener">@sammo3182</a>, <a href="https://github.com/sastoudt" target="_blank" rel="noopener">@sastoudt</a>, <a href="https://github.com/SchmidtPaul" target="_blank" rel="noopener">@SchmidtPaul</a>, <a href="https://github.com/seasmith" target="_blank" rel="noopener">@seasmith</a>, <a href="https://github.com/selesnow" target="_blank" rel="noopener">@selesnow</a>, <a href="https://github.com/slee981" target="_blank" rel="noopener">@slee981</a>, <a href="https://github.com/Tal1987" target="_blank" rel="noopener">@Tal1987</a>, <a href="https://github.com/tanzatanza" target="_blank" rel="noopener">@tanzatanza</a>, <a href="https://github.com/THChan11" target="_blank" rel="noopener">@THChan11</a>, <a href="https://github.com/travis-leith" target="_blank" rel="noopener">@travis-leith</a>, <a href="https://github.com/vladtarko" target="_blank" rel="noopener">@vladtarko</a>, <a href="https://github.com/wdenton" target="_blank" 
rel="noopener">@wdenton</a>, <a href="https://github.com/wurli" target="_blank" rel="noopener">@wurli</a>, <a href="https://github.com/Yingjie4Science" target="_blank" rel="noopener">@Yingjie4Science</a>, and <a href="https://github.com/zeehio" target="_blank" rel="noopener">@zeehio</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>You might wonder why we developed our own set of recycling rules for the tidyverse instead of using the base R rules. That&rsquo;s because, unfortunately, there isn&rsquo;t a consistent set of rules used by base R, but a <a href="https://vctrs.r-lib.org/articles/type-size.html#appendix-recycling-in-base-r" target="_blank" rel="noopener">suite of variations</a>. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> Model Calibration https://www.tidyverse.org/blog/2022/11/model-calibration/ Tue, 29 Nov 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/11/model-calibration/ <p>I am very excited to introduce work currently underway on the probably package.<br> We are looking to create early awareness and receive feedback from the community. 
That is why the enhancements discussed here are not yet on CRAN.</p> <p>While this article is meant to introduce new package functionality, we also have the goal of introducing model calibration conceptually. We want to provide sufficient background for those who may not be familiar with model calibration. If you are already familiar with this technique, feel free to skip to the <a href="#example-data">Example Data</a> section to get started.</p> <p>To install the version of probably used here:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>remotes</span><span class='nf'>::</span><span class='nf'><a href='https://remotes.r-lib.org/reference/install_github.html'>install_github</a></span><span class='o'>(</span><span class='s'>"tidymodels/probably"</span><span class='o'>)</span></span></code></pre> </div> <h2 id="model-calibration">Model Calibration <a href="#model-calibration"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p><em>The goal of model calibration is to ensure that the estimated class probabilities are consistent with what would naturally occur.</em> If a model has poor calibration, we might be able to post-process the original predictions to coerce them to have better properties.</p> <p>There are two main components to model calibration:</p> <ul> <li><strong>Diagnosis</strong> - Figuring out how well the original (and re-calibrated) probabilities perform.</li> <li><strong>Remediation</strong> - Adjusting the original values to have better properties.</li> </ul> <h3 id="the-development-plan">The Development Plan <a 
href="#the-development-plan"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>As with everything in machine learning, there are several options to consider when calibrating a model. Through the new features in the tidymodels packages, we aspire to make those options as easily accessible as possible.</p> <p>Our plan is to implement model calibration in two phases: the first phase will focus on binary models, and the second phase will focus on multi-class models.</p> <p>The first batch of enhancements is now available in the development version of the probably package. The enhancements center on plotting functions meant for <strong>diagnosing</strong> the predictions&rsquo; performance. 
These are more commonly known as <strong>calibration plots</strong>.</p> <h2 id="calibration-plots">Calibration Plots <a href="#calibration-plots"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The idea behind a calibration plot is that if we group the predictions based on their probability, then we should see a percentage of events <sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> that matches that probability.</p> <p>For example, if we collect a group of predictions whose probabilities are estimated to be about 10%, then we should expect about 10% of those in the group to indeed be events. 
The plots shown below can be used as diagnostics to see if our predictions are consistent with the observed event rates.</p> <h3 id="example-data">Example Data <a href="#example-data"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>If you would like to follow along, load the tidymodels and probably packages into your R session.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/probably/'>probably</a></span><span class='o'>)</span></span></code></pre> </div> <p>The probably package comes with a few data sets. For most of the examples in this post, we will use <code>segment_logistic</code>, an example data set that contains predicted probabilities and classes from a logistic regression model for a binary outcome <code>Class</code>, taking values <code>&quot;good&quot;</code> or <code>&quot;poor&quot;</code>. 
<code>Class</code> contains the observed outcome, and <code>.pred_good</code> contains the estimated probability that the event is &ldquo;good&rdquo;.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>segment_logistic</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1,010 × 3</span></span></span> <span><span class='c'>#&gt; .pred_poor .pred_good Class</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>*</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 0.986 0.014<span style='text-decoration: underline;'>2</span> poor </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 0.897 0.103 poor </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 0.118 0.882 good </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 0.102 0.898 good </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 0.991 0.009<span style='text-decoration: underline;'>14</span> poor </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 0.633 0.367 good </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 0.770 0.230 good </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 0.008<span style='text-decoration: underline;'>42</span> 0.992 good </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 0.995 0.004<span style='text-decoration: underline;'>58</span> poor </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 0.765 0.235 poor </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 
1,000 more rows</span></span></span></code></pre> </div> <h3 id="binned-plot">Binned Plot <a href="#binned-plot"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>On smaller data sets, it is challenging to obtain an accurate <em>event rate</em> for a given probability. For example, if there are 5 predictions with about a 50% probability, and 3 of those are events, the plot would show a 60% event rate. This comparison would not be appropriate because there are not enough predictions to determine how close to 50% the model really is.</p> <p>The most common approach is to group the probabilities into bins, or buckets. Usually, the data is split into 10 discrete buckets spanning 0 to 1 (0% to 100%). The <em>event rate</em> and the <em>bin midpoint</em> are calculated for each bin.</p> <p>In the probably package, binned calibration plots can be created using <a href="https://probably.tidymodels.org/reference/cal_plot_breaks.html" target="_blank" rel="noopener"><code>cal_plot_breaks()</code></a>. It expects a data set (<code>.data</code>) and the unquoted names of the variables that contain the events (<code>truth</code>) and the probabilities (<code>estimate</code>). For the example here, we pass the <code>segment_logistic</code> data set, and use <code>Class</code> and <code>.pred_good</code> as the arguments. 
By default, this function will create a calibration plot with 10 buckets (breaks):</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>segment_logistic</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_breaks</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-4-1.png" alt="A ggplot line plot with predicted probabilities on the x axis and event rates on the y axis, both ranging from 0 to 1. A dashed line lies on the identity line y equals x, and is loosely followed by a solid line that joins a series of dots representing the midpoint for each of 10 bins. Past predicted probabilities of 0.5, the dots consistently lie below the dashed line." width="700px" style="display: block; margin: auto;" /></p> </div> <p>The calibration plot for an ideal model is essentially a straight diagonal line that starts at (0,0) and ends at (1,1). In the case of this model, we can see that the seventh point has an event rate of 49.1% despite having estimated probabilities ranging from 60% to 70%. This indicates that the model is not creating predictions in this region that are consistent with the data (i.e., the estimated probabilities are higher than the observed event rate, so it is over-predicting).</p> <p>The number of bins in <a href="https://probably.tidymodels.org/reference/cal_plot_breaks.html" target="_blank" rel="noopener"><code>cal_plot_breaks()</code></a> can be adjusted using <code>num_breaks</code>. 
Here is an example of what the plot looks like if we reduce the number of bins from 10 to 5:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>segment_logistic</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_breaks</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span>, num_breaks <span class='o'>=</span> <span class='m'>5</span> <span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-6-1.png" alt="A calibration plot like that above, but with half as many bins. In this version of the plot, the solid line is less jagged, though it still shows that dots consistently lie below the dashed line beyond a predicted probability of 0.5." width="700px" style="display: block; margin: auto;" /></p> </div> <p>The number of breaks should be chosen so that each bin contains enough data to adequately estimate the observed event rate. If your data set is small, the next version of the calibration plot might be a better solution.</p> <h3 id="windowed">Windowed <a href="#windowed"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Another approach is to use overlapping ranges, or windows. Like the previous plot, we bin the data and calculate the event rate. However, we can add more bins by allowing them to overlap. 
If the data set size is small, one strategy is to use a set of wide bins that overlap one another.</p> <p>Two arguments control the windows. The <strong>step size</strong> controls the frequency of the windows. If we set a step size of 5%, windows will be created for each 5% increment in predicted probability (5%, 10%, 15%, etc.). The second argument is the (maximum) <strong>window size</strong>. If it is set to 10% and the step size is set at 5%, then a given window will overlap halfway into both the previous window and the next one. Here is a visual representation of this specific scenario:</p> <div class="highlight"> <p><img src="figs/unnamed-chunk-7-1.png" alt="Plot illustrating the horizontal location of each step and the size of the window" width="70%" style="display: block; margin: auto;" /></p> </div> <p>In probably, the <a href="https://probably.tidymodels.org/reference/cal_plot_breaks.html" target="_blank" rel="noopener"><code>cal_plot_windowed()</code></a> function provides this functionality. The default step size is 0.05, and can be changed via the <code>step_size</code> argument. The default window size is 0.1, and can be changed via the <code>window_size</code> argument:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>segment_logistic</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_windowed</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-8-1.png" alt="Calibration plot with 21 windows, created with the cal_plot_windowed() function" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Here is an example of reducing the <code>step_size</code> from 0.05 to 0.02. 
There are more than double the windows:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>segment_logistic</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_windowed</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span>, step_size <span class='o'>=</span> <span class='m'>0.02</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-9-1.png" alt="Calibration plot with more steps than the default, created with the cal_plot_windowed() function" width="700px" style="display: block; margin: auto;" /></p> </div> <h3 id="model-based">Model-Based <a href="#model-based"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Another way to visualize the performance is to fit a classification model of the events against the estimated probabilities. This is helpful because it avoids the use of pre-determined groupings. Another difference is that we are not plotting midpoints of actual results, but rather predictions based on those results.</p> <p>The <a href="https://probably.tidymodels.org/reference/cal_plot_breaks.html" target="_blank" rel="noopener"><code>cal_plot_logistic()</code></a> provides this functionality. By default, it uses a logistic regression. There are two possible methods for fitting:</p> <ul> <li> <p><code>smooth = TRUE</code> (the default) fits a generalized additive model using splines. 
This allows for more flexible model fits.</p> </li> <li> <p><code>smooth = FALSE</code> uses an ordinary logistic regression model with linear terms for the predictor.</p> </li> </ul> <p>As an example:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>segment_logistic</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_logistic</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-10-1.png" alt="Logistic Spline calibration plot, created with the cal_plot_logistic() function" width="700px" style="display: block; margin: auto;" /></p> </div> <p>The corresponding <a href="https://rdrr.io/r/stats/glm.html" target="_blank" rel="noopener"><code>glm()</code></a> model produces:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>segment_logistic</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_logistic</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span>, smooth <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-11-1.png" alt="Ordinary logistic calibration plot, created with the cal_plot_logistic() function" width="700px" style="display: block; margin: auto;" /></p> </div> <h3 id="additional-options-and-features">Additional options and features <a href="#additional-options-and-features"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" 
xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3> <h4 id="intervals"><strong>Intervals</strong> <a href="#intervals"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>The confidence intervals are visualized using the gray ribbon. The default interval is 0.9, but can be changed using the <code>conf_level</code> argument.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>segment_logistic</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_breaks</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span>, conf_level <span class='o'>=</span> <span class='m'>0.8</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-12-1.png" alt="Calibration plot with a confidence interval set to 0.8" width="700px" style="display: block; margin: auto;" /></p> </div> <p>If desired, the intervals can be removed by setting the <code>include_ribbon</code> argument to <code>FALSE</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>segment_logistic</span> <span class='o'><a 
href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_breaks</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span>, include_ribbon <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-13-1.png" alt="Calibration plot with the confidence interval ribbon turned off" width="700px" style="display: block; margin: auto;" /></p> </div> <h4 id="rugs"><strong>Rugs</strong> <a href="#rugs"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>By default, the calibration plots include a rug layer at the top and at the bottom of the visualization. 
They are meant to give us an idea of the density of events and non-events as the probabilities progress from 0 to 1.</p> <div class="highlight"> <p><img src="figs/unnamed-chunk-14-1.png" alt="Calibration plot with arrows pointing to where the RUGS plots are placed in the graph" width="700px" style="display: block; margin: auto;" /></p> </div> <p>This layer can be removed by setting <code>include_rug</code> to <code>FALSE</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>segment_logistic</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_breaks</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span>, include_rug <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span> </span> </code></pre> <p><img src="figs/unnamed-chunk-15-1.png" alt="Calibration plot without RUGS" width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="integration-with-tune">Integration with tune <a href="#integration-with-tune"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>So far, the inputs to the functions have been data frames. In tidymodels, the tune package has methods for resampling models as well as functions for tuning hyperparameters.</p> <p>The calibration plots in the probably package also support the results of these functions (with class <code>tune_results</code>). 
The functions read the metadata from the tune object and fill in the <code>truth</code> and <code>estimate</code> arguments automatically.</p> <p>To showcase this feature, we will tune a model based on simulated data. In order for the calibration plot to work, the predictions need to be collected. This is done by setting <code>save_pred</code> to <code>TRUE</code> in <code>tune_grid()</code>'s control settings.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>111</span><span class='o'>)</span></span> <span><span class='nv'>sim_data</span> <span class='o'>&lt;-</span> <span class='nf'>sim_classification</span><span class='o'>(</span><span class='m'>500</span><span class='o'>)</span></span> <span><span class='nv'>sim_folds</span> <span class='o'>&lt;-</span> <span class='nf'>vfold_cv</span><span class='o'>(</span><span class='nv'>sim_data</span>, repeats <span class='o'>=</span> <span class='m'>3</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>rf_mod</span> <span class='o'>&lt;-</span> <span class='nf'>rand_forest</span><span class='o'>(</span>min_n <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>set_mode</span><span class='o'>(</span><span class='s'>"classification"</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>222</span><span class='o'>)</span></span> <span><span class='nv'>tuned_model</span> <span class='o'>&lt;-</span> </span> <span> <span class='nv'>rf_mod</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> 
<span> <span class='nf'>tune_grid</span><span class='o'>(</span></span> <span> <span class='nv'>class</span> <span class='o'>~</span> <span class='nv'>.</span>,</span> <span> resamples <span class='o'>=</span> <span class='nv'>sim_folds</span>,</span> <span> grid <span class='o'>=</span> <span class='m'>4</span>,</span> <span> <span class='c'># Important: `save_pred` has to be set to TRUE in order for </span></span> <span> <span class='c'># the plotting to be possible</span></span> <span> control <span class='o'>=</span> <span class='nf'>control_resamples</span><span class='o'>(</span>save_pred <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span></span> <span><span class='nv'>tuned_model</span></span> <span><span class='c'>#&gt; # Tuning results</span></span> <span><span class='c'>#&gt; # 10-fold cross-validation repeated 3 times </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 30 × 6</span></span></span> <span><span class='c'>#&gt; splits id id2 .metrics .notes .predicti…¹</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='color: #555555;'>&lt;split [450/50]&gt;</span> Repeat1 Fold01 <span style='color: #555555;'>&lt;tibble [8 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span> &lt;tibble&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='color: #555555;'>&lt;split [450/50]&gt;</span> 
Repeat1 Fold02 <span style='color: #555555;'>&lt;tibble [8 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span> &lt;tibble&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='color: #555555;'>&lt;split [450/50]&gt;</span> Repeat1 Fold03 <span style='color: #555555;'>&lt;tibble [8 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span> &lt;tibble&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='color: #555555;'>&lt;split [450/50]&gt;</span> Repeat1 Fold04 <span style='color: #555555;'>&lt;tibble [8 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span> &lt;tibble&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='color: #555555;'>&lt;split [450/50]&gt;</span> Repeat1 Fold05 <span style='color: #555555;'>&lt;tibble [8 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span> &lt;tibble&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='color: #555555;'>&lt;split [450/50]&gt;</span> Repeat1 Fold06 <span style='color: #555555;'>&lt;tibble [8 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span> &lt;tibble&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='color: #555555;'>&lt;split [450/50]&gt;</span> Repeat1 Fold07 <span style='color: #555555;'>&lt;tibble [8 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span> &lt;tibble&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='color: #555555;'>&lt;split [450/50]&gt;</span> Repeat1 Fold08 <span style='color: #555555;'>&lt;tibble [8 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span> &lt;tibble&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span 
style='color: #555555;'>&lt;split [450/50]&gt;</span> Repeat1 Fold09 <span style='color: #555555;'>&lt;tibble [8 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span> &lt;tibble&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='color: #555555;'>&lt;split [450/50]&gt;</span> Repeat1 Fold10 <span style='color: #555555;'>&lt;tibble [8 × 5]&gt;</span> <span style='color: #555555;'>&lt;tibble [0 × 3]&gt;</span> &lt;tibble&gt; </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 20 more rows, and abbreviated variable name ¹​.predictions</span></span></span></code></pre> </div> <p>The plotting functions will automatically collect the predictions. Each of the pre-processing groups will be plotted individually in its own facet.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>tuned_model</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_logistic</a></span><span class='o'>(</span><span class='o'>)</span> </span> </code></pre> <p><img src="figs/unnamed-chunk-17-1.png" alt="Multiple calibration plots presented in a grid" width="700px" style="display: block; margin: auto;" /></p> </div> <p>A panel is produced for each value of <code>min_n</code>, coded with an automatically generated configuration name. 
This makes sure to use the out-of-sample data to make the plot (instead of just re-predicting the training set).</p> <h2 id="preparing-for-the-next-stage">Preparing for the next stage <a href="#preparing-for-the-next-stage"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As mentioned at the outset of this post, the goal is to also provide a way to calibrate the model, and to apply the calibration to future predictions. We have made sure that the plotting functions are already able to accept multiple probability sets.</p> <p>In this post, we will showcase that functionality by &ldquo;manually&rdquo; creating a quick calibration model and comparing its output to the original probabilities. We will need both of them in the same data frame, as well as a variable distinguishing the original probabilities from the calibrated probabilities. 
In this case we will create a variable called <code>source</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>model</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/stats/glm.html'>glm</a></span><span class='o'>(</span><span class='nv'>Class</span> <span class='o'>~</span> <span class='nv'>.pred_good</span>, <span class='nv'>segment_logistic</span>, family <span class='o'>=</span> <span class='s'>"binomial"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>preds</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>model</span>, <span class='nv'>segment_logistic</span>, type <span class='o'>=</span> <span class='s'>"response"</span><span class='o'>)</span></span> <span> </span> <span><span class='nv'>combined</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/bind.html'>bind_rows</a></span><span class='o'>(</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span><span class='nv'>segment_logistic</span>, source <span class='o'>=</span> <span class='s'>"original"</span><span class='o'>)</span>, </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span><span class='nv'>segment_logistic</span>, .pred_good <span class='o'>=</span> <span class='m'>1</span> <span class='o'>-</span> <span class='nv'>preds</span>, source <span class='o'>=</span> <span class='s'>"glm"</span><span class='o'>)</span></span> <span> <span class='o'>)</span></span> <span></span> <span><span class='nv'>combined</span> </span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2,020 × 4</span></span></span> <span><span class='c'>#&gt; .pred_poor 
.pred_good Class source </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> 0.986 0.014<span style='text-decoration: underline;'>2</span> poor original</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> 0.897 0.103 poor original</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> 0.118 0.882 good original</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> 0.102 0.898 good original</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> 0.991 0.009<span style='text-decoration: underline;'>14</span> poor original</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> 0.633 0.367 good original</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> 0.770 0.230 good original</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> 0.008<span style='text-decoration: underline;'>42</span> 0.992 good original</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> 0.995 0.004<span style='text-decoration: underline;'>58</span> poor original</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> 0.765 0.235 poor original</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 2,010 more rows</span></span></span></code></pre> </div> <p>The new plot functions support dplyr groupings. 
So, to overlay the two groups, we just need to pass <code>source</code> to <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>combined</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>source</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span></span> <span> <span class='nf'><a href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_breaks</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-19-1.png" alt="Calibration plot with two overlaying probability trends, one is the original and the second is the model" width="700px" style="display: block; margin: auto;" /></p> </div> <p>If we would like to plot them side by side, we can add <a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html" target="_blank" rel="noopener"><code>facet_wrap()</code></a> as an additional step of the plot:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>combined</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>source</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> </span> <span> <span class='nf'><a 
href='https://probably.tidymodels.org/reference/cal_plot_breaks.html'>cal_plot_breaks</a></span><span class='o'>(</span><span class='nv'>Class</span>, <span class='nv'>.pred_good</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/facet_wrap.html'>facet_wrap</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>source</span><span class='o'>)</span> <span class='o'>+</span></span> <span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>legend.position <span class='o'>=</span> <span class='s'>"none"</span><span class='o'>)</span></span> </code></pre> <p><img src="figs/unnamed-chunk-20-1.png" alt="Calibration plot with two side-by-side probability trends" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Our goal in the future is to provide calibration functions that create the models, and provide an easy way to visualize them.</p> <h2 id="conclusion">Conclusion <a href="#conclusion"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As mentioned at the top of this post, we welcome your feedback as you try out these features and read about our plans for the future. 
If you wish to send us your thoughts, feel free to open an issue in probably&rsquo;s GitHub repo here: <a href="https://github.com/tidymodels/probably/issues">https://github.com/tidymodels/probably/issues</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>We can think of an <strong>event</strong> as the outcome that is being tracked by the probability. For example, if a model predicts &ldquo;heads&rdquo; or &ldquo;tails&rdquo; and we want to calibrate the probability for &ldquo;tails&rdquo;, then the <strong>event</strong> occurs when the column containing the outcome has the value &ldquo;tails&rdquo;. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> dplyr 1.1.0 is coming soon https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/ Mon, 28 Nov 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/ <p> <a href="https://dplyr.tidyverse.org/dev/" target="_blank" rel="noopener">dplyr</a> 1.1.0 is coming soon! 
We haven&rsquo;t started the official release process yet (where we inform maintainers), but that will start in the next few weeks, and then dplyr 1.1.0 is likely to be submitted to CRAN in late January 2023.</p> <p>This is an exciting release for dplyr, incorporating a number of features that have been in flight for years, including:</p> <ul> <li> <p>An inline alternative to <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> that implements temporary grouping</p> </li> <li> <p>New join types, such as non-equi joins</p> </li> <li> <p> <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> improvements with character vectors</p> </li> <li> <p> <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a>, a generalization of <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a></p> </li> </ul> <p>This pre-release blog post will discuss these new features in more detail. By releasing this post before 1.1.0 is sent to CRAN, we&rsquo;re hoping to get your feedback to catch any potential problems that we&rsquo;ve missed! If you do find a bug, or have general feedback about the new features, we welcome discussion on the <a href="https://github.com/tidyverse/dplyr/issues" target="_blank" rel="noopener">dplyr issues page</a>.</p> <p>You can see a full list of changes in the <a href="https://dplyr.tidyverse.org/dev/news/index.html" target="_blank" rel="noopener">release notes</a>. 
There are many additional improvements that couldn&rsquo;t fit in a single blog post!</p> <p>dplyr 1.1.0 is not on CRAN yet, but you can install the development version from GitHub with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='http://pak.r-lib.org/reference/pak.html'>pak</a></span><span class='o'>(</span><span class='s'>"tidyverse/dplyr"</span><span class='o'>)</span></span></code></pre> </div> <p>The development version is mostly stable, but is still subject to minor changes before the official release. We don&rsquo;t encourage relying on it for production usage, but we would love for you to try out these new features.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></span> <span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://clock.r-lib.org'>clock</a></span><span class='o'>)</span></span> <span><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>12345</span><span class='o'>)</span></span></code></pre> </div> <h2 id="temporary-grouping-with-by">Temporary grouping with <code>.by</code> <a href="#temporary-grouping-with-by"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> 
</h2><p>Verbs that work &ldquo;by group,&rdquo; such as <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a>, <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a>, <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a>, and <a href="https://dplyr.tidyverse.org/reference/slice.html" target="_blank" rel="noopener"><code>slice()</code></a>, have gained an experimental new argument, <code>.by</code>, which allows for inline and temporary grouping. Grouping radically affects the computation of the dplyr verb you use it with, and one of the goals of <code>.by</code> is to allow you to place that grouping specification alongside the code that actually uses it. As an added benefit, with <code>.by</code> you no longer need to remember to <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>ungroup()</code></a> after <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a>, and <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> won&rsquo;t ever message you about how it&rsquo;s handling the groups!</p> <p>This feature was inspired by <a href="https://cran.r-project.org/package=data.table" target="_blank" rel="noopener">data.table</a>, which has always used per-operation grouping.</p> <p>We&rsquo;ll explore <code>.by</code> with this <code>expenses</code> dataset, containing various <code>cost</code>s tracked across <code>id</code> and <code>region</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>expenses</span> <span class='o'>&lt;-</span> <span class='nf'><a 
href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>1</span>, <span class='m'>3</span>, <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>3</span><span class='o'>)</span>,</span> <span> region <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"A"</span>, <span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"B"</span>, <span class='s'>"A"</span>, <span class='s'>"A"</span><span class='o'>)</span>,</span> <span> cost <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>25</span>, <span class='m'>20</span>, <span class='m'>19</span>, <span class='m'>12</span>, <span class='m'>9</span>, <span class='m'>6</span>, <span class='m'>6</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span><span class='nv'>expenses</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 7 × 3</span></span></span> <span><span class='c'>#&gt; id region cost</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 A 25</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 A 20</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 A 19</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 3 B 12</span></span> 
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 1 B 9</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 2 A 6</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>7</span> 3 A 6</span></span> <span></span></code></pre> </div> <p>If I were to ask you to compute the average <code>cost</code> per <code>region</code>, you&rsquo;d probably write something like:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>expenses</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>region</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>cost <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>cost</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; region cost</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A 15.2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> B 10.5</span></span> <span></span></code></pre> </div> <p>With <code>.by</code>, you can now write:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>expenses</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a 
href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>cost <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>cost</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nv'>region</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span></span> <span><span class='c'>#&gt; region cost</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A 15.2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> B 10.5</span></span> <span></span></code></pre> </div> <p>These two particular results look the same, but the behavior of <code>.by</code> diverges from <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> when multiple group columns are involved:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>expenses</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>id</span>, <span class='nv'>region</span><span class='o'>)</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>cost <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>cost</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; `summarise()` has grouped output by 
'id'. You can override using the `.groups`</span></span> <span><span class='c'>#&gt; argument.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 3</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># Groups: id [3]</span></span></span> <span><span class='c'>#&gt; id region cost</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 A 22</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 B 9</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 A 13</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 3 A 6</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 B 12</span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>expenses</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>cost <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>cost</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>id</span>, <span class='nv'>region</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 3</span></span></span> <span><span class='c'>#&gt; id region cost</span></span> <span><span class='c'>#&gt; <span 
style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 A 22</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 A 13</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 B 12</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 1 B 9</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 3 A 6</span></span> <span></span></code></pre> </div> <p>Usage of <code>.by</code> always results in an ungrouped data frame, regardless of the number of group columns involved.</p> <p>You might also recognize that these results aren&rsquo;t returned in exactly the same order. <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> always sorts the grouping keys in ascending order, but <code>.by</code> retains the original ordering found in the data. 
If you need ordered summaries with <code>.by</code>, we recommend calling <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> explicitly before or after summarizing.</p> <p>While here we&rsquo;ve focused on using <code>.by</code> with <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a>, it also works with other verbs, like <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> and <a href="https://dplyr.tidyverse.org/reference/slice.html" target="_blank" rel="noopener"><code>slice()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>expenses</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>mean <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='o'>(</span><span class='nv'>cost</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nv'>region</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 7 × 4</span></span></span> <span><span class='c'>#&gt; id region cost mean</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 A 25 15.2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 A 20 15.2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 A 19 
15.2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 3 B 12 10.5</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 1 B 9 10.5</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>6</span> 2 A 6 15.2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>7</span> 3 A 6 15.2</span></span> <span></span><span></span> <span><span class='nv'>expenses</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/slice.html'>slice</a></span><span class='o'>(</span><span class='m'>2</span>, .by <span class='o'>=</span> <span class='nv'>region</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 3</span></span></span> <span><span class='c'>#&gt; id region cost</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 2 A 20</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 B 9</span></span> <span></span></code></pre> </div> <p> <a href="https://dplyr.tidyverse.org/reference/group_by.html" target="_blank" rel="noopener"><code>group_by()</code></a> won&rsquo;t ever disappear, but we are having a lot of fun writing new code with <code>.by</code>, and we think you will too.</p> <h2 id="join-improvements">Join improvements <a href="#join-improvements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 
1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>All of the join functions in dplyr, such as <a href="https://dplyr.tidyverse.org/reference/mutate-joins.html" target="_blank" rel="noopener"><code>left_join()</code></a>, now accept a flexible join specification created through the new <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a> helper. <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a> allows you to specify your join conditions as expressions rather than as named character vectors.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>x_id</span> <span class='o'>==</span> <span class='nv'>y_id</span>, <span class='nv'>region</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Join By:</span></span> <span><span class='c'>#&gt; - x_id == y_id</span></span> <span><span class='c'>#&gt; - region</span></span> <span></span></code></pre> </div> <p>This join specification matches <code>x_id</code> in the left-hand data frame with <code>y_id</code> in the right-hand one, and also matches between a commonly named <code>region</code> column, computing the following equi-join:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df1</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x_id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>2</span><span class='o'>)</span>, region <span class='o'>=</span> <span 
class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"A"</span><span class='o'>)</span>, x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>5</span>, <span class='m'>10</span>, <span class='m'>4</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>df2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>y_id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>2</span>, <span class='m'>1</span>, <span class='m'>2</span><span class='o'>)</span>, region <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"A"</span>, <span class='s'>"C"</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>12</span>, <span class='m'>8</span>, <span class='m'>7</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>df1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; x_id region x</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 A 5</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 B 10</span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'>3</span> 2 A 4</span></span> <span></span><span></span> <span><span class='nv'>df2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; y_id region y</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 2 A 12</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 A 8</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 C 7</span></span> <span></span><span></span> <span><span class='nv'>df1</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>df2</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>x_id</span> <span class='o'>==</span> <span class='nv'>y_id</span>, <span class='nv'>region</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 4</span></span></span> <span><span class='c'>#&gt; x_id region x y</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 A 5 8</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 B 10 <span 
style='color: #BB0000;'>NA</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 A 4 12</span></span> <span></span></code></pre> </div> <h3 id="non-equi-joins">Non-equi joins <a href="#non-equi-joins"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Allowing expressions in <a href="https://dplyr.tidyverse.org/reference/join_by.html" target="_blank" rel="noopener"><code>join_by()</code></a> opens up a whole new world of joins in dplyr known as <em>non-equi joins</em>. As the name somewhat implies, these are joins that involve binary conditions other than equality. There are 4 particularly useful types of non-equi joins:</p> <ul> <li> <p><strong>Cross joins</strong> match every pair of rows and were already supported in dplyr.</p> </li> <li> <p><strong>Inequality joins</strong> match using <code>&gt;</code>, <code>&gt;=</code>, <code>&lt;</code>, or <code>&lt;=</code> instead of <code>==</code>.</p> </li> <li> <p><strong>Rolling joins</strong> are based on inequality joins, but only find the closest match.</p> </li> <li> <p><strong>Overlap joins</strong> are also based on inequality joins, but are specialized for working with ranges.</p> </li> </ul> <p>Non-equi joins were requested back in 2016, and were the most requested dplyr feature at the time they were finally implemented, with over <a href="https://github.com/tidyverse/dplyr/issues/2240" target="_blank" rel="noopener">147 thumbs up</a>! 
data.table has had support for non-equi joins for many years, and their implementation greatly inspired the one used in dplyr.</p> <p>To demonstrate the different types of non-equi joins, imagine that you are in charge of the party planning committee for your office. Unfortunately, you only get to have one party per quarter, but it is your job to ensure that every employee is assigned to a single party. Upper management has provided the following 4 party dates:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>parties</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> q <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>4</span>,</span> <span> party <span class='o'>=</span> <span class='nf'><a href='https://clock.r-lib.org/reference/date_parse.html'>date_parse</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"2022-01-10"</span>, <span class='s'>"2022-04-04"</span>, <span class='s'>"2022-07-11"</span>, <span class='s'>"2022-10-03"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>parties</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 2</span></span></span> <span><span class='c'>#&gt; q party </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 2022-01-10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 2022-04-04</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 
3 2022-07-11</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 4 2022-10-03</span></span> <span></span></code></pre> </div> <p>With this set of employees:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>employees</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> name <span class='o'>=</span> <span class='nf'>wakefield</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/wakefield/man/name.html'>name</a></span><span class='o'>(</span><span class='m'>100</span><span class='o'>)</span>,</span> <span> birthday <span class='o'>=</span> <span class='nf'><a href='https://clock.r-lib.org/reference/date_parse.html'>date_parse</a></span><span class='o'>(</span><span class='s'>"2022-01-01"</span><span class='o'>)</span> <span class='o'>+</span> <span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/sample.html'>sample</a></span><span class='o'>(</span><span class='m'>365</span>, <span class='m'>100</span>, replace <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span> <span class='o'>-</span> <span class='m'>1</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>employees</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 100 × 2</span></span></span> <span><span class='c'>#&gt; name birthday </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;variable&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Seager 2022-08-26</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Nathion 2022-10-04</span></span> <span><span 
class='c'>#&gt; <span style='color: #555555;'> 3</span> Sametra 2022-06-13</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Netty 2022-05-12</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Yalissa 2022-05-28</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Mirai 2022-08-20</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Toyoko 2022-08-23</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Earlene 2022-04-21</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Abbegayle 2022-01-27</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Valyssa 2022-03-06</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 90 more rows</span></span></span> <span></span></code></pre> </div> <p>One way to start approaching this problem is to look for the party that happened directly before each birthday. 
You can do this with an inequality join:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>employees</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>parties</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>birthday</span> <span class='o'>&gt;=</span> <span class='nv'>party</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 251 × 4</span></span></span> <span><span class='c'>#&gt; name birthday q party </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;variable&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Seager 2022-08-26 1 2022-01-10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Seager 2022-08-26 2 2022-04-04</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Seager 2022-08-26 3 2022-07-11</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Nathion 2022-10-04 1 2022-01-10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Nathion 2022-10-04 2 2022-04-04</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Nathion 2022-10-04 3 2022-07-11</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Nathion 2022-10-04 4 2022-10-03</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Sametra 
2022-06-13 1 2022-01-10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Sametra 2022-06-13 2 2022-04-04</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Netty 2022-05-12 1 2022-01-10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 241 more rows</span></span></span> <span></span></code></pre> </div> <p>This looks like a good start, but we&rsquo;ve assigned people with birthdays later in the year to multiple parties. We can restrict this to only the party that is <em>closest</em> to the employee&rsquo;s birthday by using a rolling join. Rolling joins are activated by wrapping an inequality in <code>closest()</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>closest</span> <span class='o'>&lt;-</span> <span class='nv'>employees</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>parties</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nf'>closest</span><span class='o'>(</span><span class='nv'>birthday</span> <span class='o'>&gt;=</span> <span class='nv'>party</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>closest</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 100 × 4</span></span></span> <span><span class='c'>#&gt; name birthday q party </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;variable&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> 
</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> Seager 2022-08-26 3 2022-07-11</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Nathion 2022-10-04 4 2022-10-03</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Sametra 2022-06-13 2 2022-04-04</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Netty 2022-05-12 2 2022-04-04</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Yalissa 2022-05-28 2 2022-04-04</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Mirai 2022-08-20 3 2022-07-11</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Toyoko 2022-08-23 3 2022-07-11</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Earlene 2022-04-21 2 2022-04-04</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Abbegayle 2022-01-27 1 2022-01-10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Valyssa 2022-03-06 1 2022-01-10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 90 more rows</span></span></span> <span></span></code></pre> </div> <p>This is close to what we want, but isn&rsquo;t <em>quite</em> right. 
It turns out that poor Della hasn&rsquo;t been assigned to a party.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>closest</span>, <span class='nf'><a href='https://rdrr.io/r/base/NA.html'>is.na</a></span><span class='o'>(</span><span class='nv'>party</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 4</span></span></span> <span><span class='c'>#&gt; name birthday q party </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;variable&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> Della 2022-01-06 <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span></span></span> <span></span></code></pre> </div> <p>This is because their birthday occurred before the first party date, <code>2022-01-10</code>, so there wasn&rsquo;t any &ldquo;previous party&rdquo; to match them to. 
It&rsquo;s a little easier to fix this if we are explicit about the quarter start/end dates that form the ranges to look for matches in:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Some helpers from &#123;clock&#125;</span></span> <span><span class='nv'>quarter_start</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://clock.r-lib.org/reference/as_year_quarter_day.html'>as_year_quarter_day</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span></span> <span> <span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://clock.r-lib.org/reference/calendar-boundary.html'>calendar_start</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='s'>"quarter"</span><span class='o'>)</span></span> <span> <span class='nf'><a href='https://clock.r-lib.org/reference/as_date.html'>as_date</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span> <span><span class='nv'>quarter_end</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>&#123;</span></span> <span> <span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://clock.r-lib.org/reference/as_year_quarter_day.html'>as_year_quarter_day</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span></span> <span> <span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://clock.r-lib.org/reference/calendar-boundary.html'>calendar_end</a></span><span class='o'>(</span><span class='nv'>x</span>, <span 
class='s'>"quarter"</span><span class='o'>)</span></span> <span> <span class='nf'><a href='https://rdrr.io/r/base/as.Date.html'>as.Date</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span></span> <span><span class='o'>&#125;</span></span> <span></span> <span><span class='nv'>parties</span> <span class='o'>&lt;-</span> <span class='nv'>parties</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>start <span class='o'>=</span> <span class='nf'>quarter_start</span><span class='o'>(</span><span class='nv'>party</span><span class='o'>)</span>, end <span class='o'>=</span> <span class='nf'>quarter_end</span><span class='o'>(</span><span class='nv'>party</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>parties</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 4</span></span></span> <span><span class='c'>#&gt; q party start end </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 2022-01-10 2022-01-01 2022-03-31</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 2022-04-04 2022-04-01 2022-06-30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 3 2022-07-11 2022-07-01 2022-09-30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 4 2022-10-03 2022-10-01 2022-12-31</span></span> <span></span></code></pre> </div> <p>Now that we have 4 distinct <em>ranges</em> of dates to work with, we&rsquo;ll use an overlap 
join to figure out which range each birthday fell <a href="https://dplyr.tidyverse.org/reference/between.html" target="_blank" rel="noopener"><code>between()</code></a>. Since we know that each birthday should be matched to exactly one party, we&rsquo;ll also take this chance to set <code>multiple</code>, a new argument to the join functions that allows you to optionally <code>&quot;error&quot;</code> if a birthday is matched to multiple parties.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>employees</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span></span> <span> <span class='nv'>parties</span>, </span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/between.html'>between</a></span><span class='o'>(</span><span class='nv'>birthday</span>, <span class='nv'>start</span>, <span class='nv'>end</span><span class='o'>)</span><span class='o'>)</span>,</span> <span> multiple <span class='o'>=</span> <span class='s'>"error"</span></span> <span> <span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 100 × 6</span></span></span> <span><span class='c'>#&gt; name birthday q party start end </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;variable&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> </span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'> 1</span> Seager 2022-08-26 3 2022-07-11 2022-07-01 2022-09-30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> Nathion 2022-10-04 4 2022-10-03 2022-10-01 2022-12-31</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> Sametra 2022-06-13 2 2022-04-04 2022-04-01 2022-06-30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> Netty 2022-05-12 2 2022-04-04 2022-04-01 2022-06-30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> Yalissa 2022-05-28 2 2022-04-04 2022-04-01 2022-06-30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> Mirai 2022-08-20 3 2022-07-11 2022-07-01 2022-09-30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> Toyoko 2022-08-23 3 2022-07-11 2022-07-01 2022-09-30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> Earlene 2022-04-21 2 2022-04-04 2022-04-01 2022-06-30</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> Abbegayle 2022-01-27 1 2022-01-10 2022-01-01 2022-03-31</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> Valyssa 2022-03-06 1 2022-01-10 2022-01-01 2022-03-31</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 90 more rows</span></span></span> <span></span></code></pre> </div> <p>We consider <code>multiple</code> to be an important &ldquo;quality control&rdquo; argument to help you enforce constraints on the join procedure.</p> <h3 id="multiple-matches">Multiple matches <a href="#multiple-matches"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 
13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Speaking of <code>multiple</code>, we&rsquo;ve also given this argument an important default. When doing data analysis with equi-joins, it is often surprising when a join returns more rows than were present in the left-hand side table.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df1</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 3</span></span></span> <span><span class='c'>#&gt; x_id region x</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 A 5</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 B 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 A 4</span></span> <span></span><span></span> <span><span class='nv'>df2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>y_id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>1</span>, <span class='m'>2</span><span class='o'>)</span>, region <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"A"</span>, <span class='s'>"A"</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span 
class='o'>(</span><span class='m'>9</span>, <span class='m'>10</span>, <span class='m'>12</span>, <span class='m'>4</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>df2</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span></span> <span><span class='c'>#&gt; y_id region y</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 A 9</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 2 B 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 A 12</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 A 4</span></span> <span></span><span></span> <span><span class='nv'>df1</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate-joins.html'>left_join</a></span><span class='o'>(</span><span class='nv'>df2</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/join_by.html'>join_by</a></span><span class='o'>(</span><span class='nv'>x_id</span> <span class='o'>==</span> <span class='nv'>y_id</span>, <span class='nv'>region</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning in left_join(df1, df2, join_by(x_id == y_id, region)): Each row in `x` is expected to match at most 1 row in `y`.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Row 1 of `x` matches multiple rows.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> If multiple matches are expected, set `multiple = "all"` to silence this</span></span> <span><span class='c'>#&gt; 
warning.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 4</span></span></span> <span><span class='c'>#&gt; x_id region x y</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 A 5 9</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 A 5 12</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 B 10 10</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 A 4 4</span></span> <span></span></code></pre> </div> <p>In this case, row 1 of <code>df1</code> matched both rows <code>1</code> and <code>3</code> of <code>df2</code>, so the output has 4 rows rather than <code>df1</code>'s 3. While this is standard SQL behavior, community feedback has shown that many people don&rsquo;t expect this, and a number of people were horrified to learn that this was even possible! 
Because of this, we&rsquo;ve made this case a warning by default, which you can silence with <code>multiple = &quot;all&quot;</code>.</p> <h2 id="arrange-improvements-with-character-vectors"><code>arrange()</code> improvements with character vectors <a href="#arrange-improvements-with-character-vectors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> now uses a new custom backend for generating the ordering. This generally improves performance, but it is especially apparent with character vectors.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># 10,000 random strings, sampled up to 1,000,000 rows</span></span> <span><span class='nv'>dictionary</span> <span class='o'>&lt;-</span> <span class='nf'>stringi</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/stringi/man/stri_rand_strings.html'>stri_rand_strings</a></span><span class='o'>(</span><span class='m'>10000</span>, length <span class='o'>=</span> <span class='m'>10</span>, pattern <span class='o'>=</span> <span class='s'>"[a-z]"</span><span class='o'>)</span></span> <span><span class='nv'>str</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sample.html'>sample</a></span><span class='o'>(</span><span class='nv'>dictionary</span>, size <span 
class='o'>=</span> <span class='m'>1e6</span>, replace <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>str</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1,000,000 × 1</span></span></span> <span><span class='c'>#&gt; x </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 1</span> btjgpowbav</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 2</span> jrddujrxwt</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 3</span> ofgkybvsoo</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 4</span> dzyxfvwktu</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 5</span> qobgfmkgof</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 6</span> rmzjvtnpbf</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 7</span> jxrqgxouqg</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 8</span> empcmhnlqq</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'> 9</span> nwfgauiurp</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>10</span> hdswclaxys</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># … with 999,990 more rows</span></span></span> <span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># dplyr 1.0.10</span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>str</span>, <span 
class='nv'>x</span><span class='o'>)</span>, iterations <span class='o'>=</span> <span class='m'>100</span><span class='o'>)</span></span> <span><span class='c'>#&gt; # A tibble: 1 × 6</span></span> <span><span class='c'>#&gt; expression min median `itr/sec` mem_alloc `gc/sec`</span></span> <span><span class='c'>#&gt; &lt;bch:expr&gt; &lt;bch:tm&gt; &lt;bch:tm&gt; &lt;dbl&gt; &lt;bch:byt&gt; &lt;dbl&gt;</span></span> <span><span class='c'>#&gt; 1 arrange(str, x) 4.38s 4.89s 0.204 12.7MB 0.148</span></span> <span></span> <span><span class='c'># dplyr 1.1.0</span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>str</span>, <span class='nv'>x</span><span class='o'>)</span>, iterations <span class='o'>=</span> <span class='m'>100</span><span class='o'>)</span></span> <span><span class='c'>#&gt; # A tibble: 1 × 6</span></span> <span><span class='c'>#&gt; expression min median `itr/sec` mem_alloc `gc/sec`</span></span> <span><span class='c'>#&gt; &lt;bch:expr&gt; &lt;bch:tm&gt; &lt;bch:tm&gt; &lt;dbl&gt; &lt;bch:byt&gt; &lt;dbl&gt;</span></span> <span><span class='c'>#&gt; 1 arrange(str, x) 42.3ms 46.6ms 20.8 22.4MB 46.0</span></span></code></pre> </div> <p>For those keeping score, that is a 100x improvement! Now, I&rsquo;ll be honest, I&rsquo;m being a bit tricky here. The new backend for <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> comes with a meaningful change in behavior - it now sorts character strings in the C locale by default, rather than in the much slower system locale (American English, for me). 
We made this change for two main reasons:</p> <ul> <li> <p>Much faster performance by default, because it can use {vctrs} radix sort (inspired by data.table)</p> </li> <li> <p>Improved reproducibility across R sessions, where different computers might use different system locales</p> </li> </ul> <p>For English users, we expect this change to have fairly minimal impact. The largest difference in ordering between the C and American English locales has to do with capitalization. In the C locale, uppercase letters are always placed before <em>any</em> lowercase letters. In the American English locale, uppercase letters are placed directly after their lowercase equivalent.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"B"</span>, <span class='s'>"A"</span>, <span class='s'>"b"</span><span class='o'>)</span><span class='o'>)</span></span> <span></span> <span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>df</span>, <span class='nv'>x</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 1</span></span></span> <span><span class='c'>#&gt; x </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> B </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> a </span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'>4</span> b</span></span> <span></span></code></pre> </div> <p>If you do need to order with a specific locale, you can specify the new <code>.locale</code> argument, which takes a locale identifier string, just like <a href="https://stringr.tidyverse.org/reference/str_order.html" target="_blank" rel="noopener"><code>stringr::str_sort()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>df</span>, <span class='nv'>x</span>, .locale <span class='o'>=</span> <span class='s'>"en"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 1</span></span></span> <span><span class='c'>#&gt; x </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> a </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> A </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> b </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> B</span></span> <span></span></code></pre> </div> <p>To use this optional <code>.locale</code> feature, you must have the stringi package installed, but you likely already do because it is installed with the tidyverse by default.</p> <p>It is also worth noting that using <code>.locale</code> is still much faster than relying on the system locale.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># Compare with ~5 seconds above with dplyr 1.0.10</span></span> <span></span> <span><span class='nf'>bench</span><span class='nf'>::</span><span class='nf'><a href='http://bench.r-lib.org/reference/mark.html'>mark</a></span><span 
class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>str</span>, <span class='nv'>x</span>, .locale <span class='o'>=</span> <span class='s'>"en"</span><span class='o'>)</span>, iterations <span class='o'>=</span> <span class='m'>100</span><span class='o'>)</span></span> <span><span class='c'>#&gt; # A tibble: 1 × 6</span></span> <span><span class='c'>#&gt; expression min median `itr/sec` mem_alloc</span></span> <span><span class='c'>#&gt; &lt;bch:expr&gt; &lt;bch:tm&gt; &lt;bch:&gt; &lt;dbl&gt; &lt;bch:byt&gt;</span></span> <span><span class='c'>#&gt; 1 arrange(str, x, .locale = "en") 377ms 430ms 2.21 27.9MB</span></span> <span><span class='c'>#&gt; # … with 1 more variable: `gc/sec` &lt;dbl&gt;</span></span></code></pre> </div> <p>For non-English Latin script languages, such as Spanish, you may see more of a change, as characters such as <code>ñ</code> are ordered after <code>z</code> in the C locale, rather than directly after <code>n</code> as they are in Spanish:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"\u00F1"</span>, <span class='s'>"n"</span>, <span class='s'>"z"</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>df</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 1</span></span></span> <span><span class='c'>#&gt; x </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> ñ </span></span> <span><span class='c'>#&gt; <span 
style='color: #555555;'>2</span> n </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> z</span></span> <span></span><span></span> <span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>df</span>, <span class='nv'>x</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 1</span></span></span> <span><span class='c'>#&gt; x </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> n </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> z </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> ñ</span></span> <span></span><span></span> <span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/arrange.html'>arrange</a></span><span class='o'>(</span><span class='nv'>df</span>, <span class='nv'>x</span>, .locale <span class='o'>=</span> <span class='s'>"es"</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 1</span></span></span> <span><span class='c'>#&gt; x </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> n </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> ñ </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> z</span></span> <span></span></code></pre> </div> <p>We are optimistic that this change is an overall net positive. 
We anticipate that many users use <a href="https://dplyr.tidyverse.org/reference/arrange.html" target="_blank" rel="noopener"><code>arrange()</code></a> to simply group similar-looking observations together, and we expect that the main places you&rsquo;ll need to care about localized ordering are the few places where you are generating human-readable output, such as a table or a chart, at which point you might consider using <code>.locale</code>.</p> <p>If you are having trouble converting an existing script over to the new behavior, you can set the temporary global option <code>options(dplyr.legacy_locale = TRUE)</code>, which will revert to the pre-1.1.0 behavior of using the system locale. We expect to remove this option in a future release.</p> <p>To learn more low-level details about this change, you can read our <a href="https://github.com/tidyverse/tidyups/blob/main/003-dplyr-radix-ordering.md" target="_blank" rel="noopener">tidyup</a>.</p> <h2 id="reframe-a-generalization-of-summarise"><code>reframe()</code>, a generalization of <code>summarise()</code> <a href="#reframe-a-generalization-of-summarise"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In dplyr 1.0.0, we introduced a powerful new feature: <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> could return per-group results of any length, rather than just length 1. 
For example:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>table</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"a"</span>, <span class='s'>"b"</span>, <span class='s'>"d"</span>, <span class='s'>"f"</span><span class='o'>)</span></span> <span></span> <span><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span></span> <span> g <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>1</span>, <span class='m'>1</span>, <span class='m'>2</span>, <span class='m'>2</span>, <span class='m'>2</span>, <span class='m'>2</span><span class='o'>)</span>,</span> <span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"e"</span>, <span class='s'>"a"</span>, <span class='s'>"b"</span>, <span class='s'>"c"</span>, <span class='s'>"f"</span>, <span class='s'>"d"</span>, <span class='s'>"a"</span><span class='o'>)</span></span> <span><span class='o'>)</span></span> <span></span> <span><span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://generics.r-lib.org/reference/setops.html'>intersect</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>table</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nv'>g</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 2</span></span></span> <span><span class='c'>#&gt; g x 
</span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 a </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 b </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 f </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 d </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 a</span></span> <span></span></code></pre> </div> <p>While extremely powerful, community feedback has raised the valid concern that allowing <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> to return any number of rows per group:</p> <ul> <li> <p>Increases the chance for accidental bugs</p> </li> <li> <p>Is against the spirit of a &ldquo;summary,&rdquo; which implies 1 row per group</p> </li> <li> <p>Makes translation to dbplyr very difficult</p> </li> </ul> <p>We agree! 
In response to this, we&rsquo;ve decided to walk back that change to <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a>, which will now throw a warning when either 0 or &gt;1 rows are returned per group:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://generics.r-lib.org/reference/setops.html'>intersect</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>table</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nv'>g</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in</span></span> <span><span class='c'>#&gt; dplyr 1.1.0.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use `reframe()` instead.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> When switching from `summarise()` to `reframe()`, remember that `reframe()`</span></span> <span><span class='c'>#&gt; always returns an ungrouped data frame and adjust accordingly.</span></span> <span></span><span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 2</span></span></span> <span><span class='c'>#&gt; g x </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 a </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 b </span></span> <span><span class='c'>#&gt; <span style='color: 
#555555;'>3</span> 2 f </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 d </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 a</span></span> <span></span></code></pre> </div> <p>That said, we still believe that this is a powerful tool, so we&rsquo;ve moved these features to a new verb, <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a>. Think of <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a> as a generic tool for &ldquo;doing something to each group,&rdquo; with no restrictions on the number of rows returned per group.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>df</span> <span class='o'>|&gt;</span></span> <span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/reframe.html'>reframe</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://generics.r-lib.org/reference/setops.html'>intersect</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>table</span><span class='o'>)</span>, .by <span class='o'>=</span> <span class='nv'>g</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 2</span></span></span> <span><span class='c'>#&gt; g x </span></span> <span><span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 a </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 b </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>3</span> 2 f </span></span> <span><span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 d </span></span> 
<span><span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 a</span></span> <span></span></code></pre> </div> <p>One big difference between <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> and <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a> is that <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a> always returns an ungrouped data frame, even if the input was a grouped data frame with multiple group columns. This simplifies <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a> immensely, as it doesn&rsquo;t need to inherit the <code>.groups</code> argument of <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a>, and never emits any messages.</p> <p>We expect that you&rsquo;ll continue to use <a href="https://dplyr.tidyverse.org/reference/summarise.html" target="_blank" rel="noopener"><code>summarise()</code></a> much more often than <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a>, but if you ever find yourself applying a function to each group that returns an arbitrary number of rows, <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a> should be your go-to tool!</p> <p> <a href="https://dplyr.tidyverse.org/reference/reframe.html" target="_blank" rel="noopener"><code>reframe()</code></a> is one of the places we could use your feedback! 
We aren&rsquo;t completely confident about this function name yet, so if you have any feedback about it or suggestions for an alternative, please leave a comment on this <a href="https://github.com/tidyverse/dplyr/issues/6565" target="_blank" rel="noopener">issue</a>.</p> ggplot2 3.4.0 https://www.tidyverse.org/blog/2022/11/ggplot2-3-4-0/ Mon, 07 Nov 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/11/ggplot2-3-4-0/ <p>We&rsquo;re so happy to announce the release of <a href="https://ggplot2.tidyverse.org" target="_blank" rel="noopener">ggplot2</a> 3.4.0 on CRAN. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details. The new version can be installed from CRAN using <code>install.packages(&quot;ggplot2&quot;)</code>.</p> <p>This release is not full of exciting new features. Instead we have focused on the internals, tightening up the API, and improving the messaging, especially when it comes to errors and warnings.
While the release also contains a few new features, these other aspects are the stars of this release.</p> <p>You can see a full list of changes in the <a href="https://ggplot2.tidyverse.org/news/index.html" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://patchwork.data-imaginist.com'>patchwork</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></code></pre> </div> <h2 id="hello-linewidth">Hello <code>linewidth</code> <a href="#hello-linewidth"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Arguably the biggest user-visible change in this release is the introduction of a new fundamental aesthetic. From this release on, <code>linewidth</code> will take over sizing of the width of lines, something that was earlier handled by <code>size</code>. The reason for this change is that prior to this release <code>size</code> was used for two related, but different, properties: the size of points (and glyphs) and the width of lines.
Since one is area-based and the other is length-based, they fundamentally need different scaling, and the default size scale has always catered to area sizing, using a square root transform. This conflation has also made it hard for composite geoms like <a href="https://ggplot2.tidyverse.org/reference/geom_linerange.html" target="_blank" rel="noopener"><code>geom_pointrange()</code></a> to control the line width and point size separately.</p> <p>There is not much to discuss when it comes to how to use this &ldquo;feature&rdquo;, as it is a matter of replacing <code>size</code> with <code>linewidth</code> whenever you target stroke sizing:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>Day</span>, <span class='nv'>Temp</span>, linewidth <span class='o'>=</span> <span class='nv'>Month</span>, group <span class='o'>=</span> <span class='nv'>Month</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_linewidth.html'>scale_linewidth</a></span><span class='o'>(</span>range <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.5</span>, <span class='m'>3</span><span class='o'>)</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-1-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Now, changing such a fundamental thing when a package is as old and widely
used as ggplot2 is no small undertaking, and I wish it had been done earlier, but better late than never. We have gone to great lengths to ensure that old code continues to work. For the most part, using <code>size</code> will continue to behave like before:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>Day</span>, <span class='nv'>Temp</span>, size <span class='o'>=</span> <span class='nv'>Month</span>, group <span class='o'>=</span> <span class='nv'>Month</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_size.html'>scale_size</a></span><span class='o'>(</span>range <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.5</span>, <span class='m'>3</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use `linewidth` instead.</span> </code></pre> <p><img src="figs/unnamed-chunk-2-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>As you can see, you get the expected plot but also get a deprecation warning asking you to update your code.
Comparing the two legends, we can also see the discrepancy in scaling that we discussed above, with <code>linewidth</code> showing a much more even progression.</p> <p>All of this should work with all the geoms provided by ggplot2 (and we have described <a href="https://www.tidyverse.org/blog/2022/08/ggplot2-3-4-0-size-to-linewidth/" target="_blank" rel="noopener">a clear upgrade path for extension developers to adopt this</a>), except for a few instances where <code>size</code> remains a valid aesthetic for the geom. In these cases you will not get a deprecation warning, and the look of your output may change when running old code. The two geoms this concerns are <a href="https://ggplot2.tidyverse.org/reference/geom_linerange.html" target="_blank" rel="noopener"><code>geom_pointrange()</code></a> and <a href="https://ggplot2.tidyverse.org/reference/ggsf.html" target="_blank" rel="noopener"><code>geom_sf()</code></a>, which both continue to use <code>size</code> to scale points. Comparing the output from e.g.
<a href="https://ggplot2.tidyverse.org/reference/geom_linerange.html" target="_blank" rel="noopener"><code>geom_pointrange()</code></a>, we can see how using <code>size</code> now only targets the point and not the line:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_linerange.html'>geom_pointrange</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>Month</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nv'>Temp</span><span class='o'>)</span>, stat <span class='o'>=</span> <span class='s'>"summary"</span>, size <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span> <span class='c'>#&gt; No summary function supplied, defaulting to `mean_se()`</span> </code></pre> <p><img src="figs/unnamed-chunk-3-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>We recognize that introducing silent visual changes like this is not optimal, but we weighed both sides and decided that it was better in the long run to rip the band-aid off and commit fully to the <code>linewidth</code> change in one release.</p> <p>The switch to <code>linewidth</code> goes beyond aesthetics and should target every part of the API that has used <code>size</code> to target line width.
This is mostly present in theming, where <a href="https://ggplot2.tidyverse.org/reference/element.html" target="_blank" rel="noopener"><code>element_rect()</code></a> and <a href="https://ggplot2.tidyverse.org/reference/element.html" target="_blank" rel="noopener"><code>element_line()</code></a> now use <code>linewidth</code> as an argument instead of <code>size</code>. As above, a deprecation warning will inform you of this change:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>mpg</span>, y <span class='o'>=</span> <span class='nv'>disp</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>panel.grid <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_line</a></span><span class='o'>(</span>linewidth <span class='o'>=</span> <span class='m'>0.2</span><span class='o'>)</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-4-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>We have done our best to ensure that it is easy for our extension developers to follow the path laid out by ggplot2 when it comes to embracing the new aesthetic, but you will probably experience a period of discrepancy between some of your favorite extensions and ggplot2.
I have full confidence that our amazing extension developers will adapt quickly, so that period will probably be short.</p> <h3 id="on-the-topic-of-line-width">On the topic of line width <a href="#on-the-topic-of-line-width"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>We have made a few other internal changes when it comes to line widths. The biggest of these is perhaps a new default for polygon line width in <a href="https://ggplot2.tidyverse.org/reference/ggsf.html" target="_blank" rel="noopener"><code>geom_sf()</code></a>. The change came about because we had already introduced visual changes to old code with the <code>linewidth</code> aesthetic, and feedback from the spatial community showed that <code>size</code> was most often used to thin the polygon borders.
The new default is 0.2 (down from 0.5) and hopefully strikes a nice balance:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>nc</span> <span class='o'>&lt;-</span> <span class='nf'>sf</span><span class='nf'>::</span><span class='nf'><a href='https://r-spatial.github.io/sf/reference/st_read.html'>st_read</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/system.file.html'>system.file</a></span><span class='o'>(</span><span class='s'>"shape/nc.shp"</span>, package <span class='o'>=</span> <span class='s'>"sf"</span><span class='o'>)</span>, quiet <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span> <span class='nv'>p1</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>nc</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggsf.html'>geom_sf</a></span><span class='o'>(</span>linewidth <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='s'>"Old default"</span><span class='o'>)</span> <span class='nv'>p2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>nc</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggsf.html'>geom_sf</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='s'>"New default"</span><span class='o'>)</span> <span 
class='nv'>p1</span><span class='o'>/</span><span class='nv'>p2</span> </code></pre> <p><img src="figs/unnamed-chunk-5-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>More minor is a small fix we did to <a href="https://ggplot2.tidyverse.org/reference/guide_colourbar.html" target="_blank" rel="noopener"><code>guide_colorbar()</code></a> where it was brought to our attention that the <code>ticks.linewidth</code> and <code>frame.linewidth</code> weren&rsquo;t given in the same unit as every other line width in ggplot2. This has been corrected and the default has been adjusted to retain the same look but if you have given these specifically in your code you are likely to notice a visual change.</p> <h2 id="other-breaking-changes">Other breaking changes <a href="#other-breaking-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In the grab-bag of breaking changes we have now formally deprecated <a href="https://ggplot2.tidyverse.org/reference/qplot.html" target="_blank" rel="noopener"><code>qplot()</code></a>. It will continue to work as always but will be a bit noisy about it. Don&rsquo;t expect the function to disappear, but the deprecation signals that we don&rsquo;t intend to do further work on <a href="https://ggplot2.tidyverse.org/reference/qplot.html" target="_blank" rel="noopener"><code>qplot()</code></a> to keep it current with new features etc. 
In the same vein, <a href="https://ggplot2.tidyverse.org/reference/aes_eval.html" target="_blank" rel="noopener"><code>stat()</code></a> and <code>..var..</code> for marking aesthetics from stats are also formally deprecated in favor of <a href="https://ggplot2.tidyverse.org/reference/aes_eval.html" target="_blank" rel="noopener"><code>after_stat()</code></a>. Again, the result is that using these old APIs will be noisy but still work.</p> <p>On the topic of <a href="https://ggplot2.tidyverse.org/reference/aes_eval.html" target="_blank" rel="noopener"><code>after_stat()</code></a>, the values and computations inside of it now use the un-transformed variables rather than the transformed ones. This is a bit esoteric and only applies to aesthetics that have had a scale transformation applied to them, so you may never notice.</p> <p>Lastly, we have made a switch to using <a href="https://rlang.r-lib.org/reference/hash.html" target="_blank" rel="noopener"><code>rlang::hash()</code></a> instead of <a href="https://rdrr.io/pkg/digest/man/digest.html" target="_blank" rel="noopener"><code>digest::digest()</code></a> which may result in the automatic ordering of legends changing, again a pretty minor change. 
If you care about the ordering of the legends you can always take control of it using the <code>order</code> argument inside the different <code>guide_*()</code> constructors.</p> <h2 id="better-errors">Better errors <a href="#better-errors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>One of the most substantial changes in usability in this release is a complete rewrite of the errors and warnings. This goes deeper than changing wordings as the messaging is now based on the signal handling in the <a href="https://cli.r-lib.org" target="_blank" rel="noopener">cli</a> package that provides rich text formatting and better ways to guide the user to a resolution. 
Consider the following easy-to-make mistake of using the pipe instead of <code>+</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> |&gt; <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>mpg</span>, <span class='nv'>disp</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `geom_point()`:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `mapping` must be created by `aes()`</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Did you use `%&gt;%` or `|&gt;` instead of `+`?</span></code></pre> </div> <p>As can be seen, the error now clearly states where it is happening, then tells you what is wrong, and lastly gives you a hint at what might be the solution.</p> <p>However, this is not all. One of the biggest issues with error reporting in ggplot2 is that most code is evaluated during rendering, not when the API calls are made. Because of this it has been difficult to link a user error in a geom specification to the actual error message that arises. This could send the user on a treasure hunt to identify what to change in order to fix the code.
With the changes in 3.4.0 we are now much better at directing the user to the right place in their code when errors happen during rendering:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>huron</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span>year <span class='o'>=</span> <span class='m'>1875</span><span class='o'>:</span><span class='m'>1972</span>, level <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/vector.html'>as.vector</a></span><span class='o'>(</span><span class='nv'>LakeHuron</span><span class='o'>)</span><span class='o'>)</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>huron</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>year</span>, <span class='nv'>level</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_ribbon.html'>geom_ribbon</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>year</span>, xmin <span class='o'>=</span> <span class='nv'>level</span> <span class='o'>-</span> <span class='m'>5</span>, xmax <span class='o'>=</span> <span class='nv'>level</span> <span class='o'>+</span> <span class='m'>5</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `geom_ribbon()`:</span></span> <span
class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Problem while setting up geom.</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Error occurred in the 2nd layer.</span> <span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in `compute_geom_1()` at </span><a href='file:///Users/thomas/Dropbox/GitHub/ggplot2/R/ggproto.r'><span style='font-weight: bold;'>ggplot2/R/ggproto.r:182:16</span></a><span style='font-weight: bold;'>:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `geom_ribbon()` requires the following missing aesthetics: <span style='color: #00BB00;'>ymin</span> and</span> <span class='c'>#&gt; <span style='color: #00BB00;'>ymax</span> <span style='font-weight: bold;'>or</span> <span style='color: #00BB00;'>y</span></span></code></pre> </div> <p>We can see that the error message correctly identifies the geom responsible for the layer, communicates which part of the rendering the error occurred in, and points to the index of the layer in case multiple layers from the same geom have been used. Lastly, it shows the original error, which can help you solve the issue.</p> <p>Hopefully these changes go a long way toward making ggplot2 even more welcoming to new and seasoned users alike.
However, this effort is never done, and we continue to appreciate issues in the GitHub repository pointing out unhelpful errors or warnings that arise so that we may improve them further.</p> <h2 id="vctrs-inside">vctrs inside <a href="#vctrs-inside"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The last part of the large housekeeping changes in this release is that ggplot2 finally embraces <a href="https://vctrs.r-lib.org" target="_blank" rel="noopener">vctrs</a> and uses its functions internally, primarily for binding data together. Apart from a nice bump in rendering speed, it also means that we now better support data types built upon vctrs and subscribe to the more well-defined coercion rules that it provides. The last point is a double-edged sword though, as your code may contain a diverse mix of data types in different layers that worked before but doesn&rsquo;t align with the strictness of vctrs. While we have gone to lengths to ensure that your code still works, you will begin to see deprecation notices if you e.g.
facet on a variable that is incompatible across layers:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>labels</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span> label <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste</a></span><span class='o'>(</span><span class='s'>"gear"</span>, <span class='m'>3</span><span class='o'>:</span><span class='m'>5</span><span class='o'>)</span>, gear <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/character.html'>as.character</a></span><span class='o'>(</span><span class='m'>3</span><span class='o'>:</span><span class='m'>5</span><span class='o'>)</span>, x <span class='o'>=</span> <span class='m'>100</span>, y <span class='o'>=</span> <span class='m'>11</span> <span class='o'>)</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>disp</span>, <span class='nv'>mpg</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_text.html'>geom_text</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span>, label <span class='o'>=</span> <span class='nv'>label</span><span class='o'>)</span>, <span class='nv'>labels</span>, hjust <span class='o'>=</span> <span
class='s'>"left"</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/facet_wrap.html'>facet_wrap</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>gear</span><span class='o'>)</span> <span class='c'>#&gt; Warning: Combining variables of class &lt;numeric&gt; and &lt;character&gt; was deprecated in</span> <span class='c'>#&gt; ggplot2 3.4.0.</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please ensure your variables are compatible before plotting (location:</span> <span class='c'>#&gt; `combine_vars()`)</span> </code></pre> <p><img src="figs/unnamed-chunk-8-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>While this may seem like an unnecessary annoyance, we hope that you&rsquo;ll learn to appreciate that this strictness can save you from silent bugs where you end up combining variables that are basically incompatible.</p> <h2 id="new-features">New features <a href="#new-features"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>While most of the focus in this release has been on internal housekeeping, a few new features have also crept in, courtesy of our amazing contributors from the community:</p> <h3 id="stacking-non-aligned-data">Stacking non-aligned data <a href="#stacking-non-aligned-data"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 
5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://ggplot2.tidyverse.org/reference/position_stack.html" target="_blank" rel="noopener"><code>position_stack()</code></a> has always required that groups share a common x-value to be stacked. The nature of most time series data etc. makes it so that this is often the case, but not always. We have now introduced a <a href="https://ggplot2.tidyverse.org/reference/geom_ribbon.html" target="_blank" rel="noopener"><code>stat_align()</code></a> that takes care of interpolating y-values in each group at every unique x-value in the data so that they can be stacked. This stat is now the default for <a href="https://ggplot2.tidyverse.org/reference/geom_ribbon.html" target="_blank" rel="noopener"><code>geom_area()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'>tibble</span><span class='nf'>::</span><span class='nf'><a href='https://tibble.tidyverse.org/reference/tribble.html'>tribble</a></span><span class='o'>(</span> <span class='o'>~</span><span class='nv'>g</span>, <span class='o'>~</span><span class='nv'>x</span>, <span class='o'>~</span><span class='nv'>y</span>, <span class='s'>"a"</span>, <span class='m'>1</span>, <span class='m'>2</span>, <span class='s'>"a"</span>, <span class='m'>3</span>, <span class='m'>5</span>, <span class='s'>"a"</span>, <span class='m'>5</span>, <span class='m'>1</span>, <span class='s'>"b"</span>, <span class='m'>2</span>, <span class='m'>0</span>, <span class='s'>"b"</span>, <span class='m'>4</span>, <span class='m'>6</span>, <span class='s'>"b"</span>, <span class='m'>6</span>, <span class='m'>7</span> <span class='o'>)</span> <span class='nv'>p1</span> <span class='o'>&lt;-</span> <span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>df</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span>, fill <span class='o'>=</span> <span class='nv'>g</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_ribbon.html'>geom_area</a></span><span class='o'>(</span>stat <span class='o'>=</span> <span class='s'>"identity"</span>, alpha <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='s'>"stat_identity()"</span><span class='o'>)</span> <span class='nv'>p2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>df</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span>, fill <span class='o'>=</span> <span class='nv'>g</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_ribbon.html'>geom_area</a></span><span class='o'>(</span>alpha <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>ggtitle</a></span><span class='o'>(</span><span class='s'>"stat_align()"</span><span class='o'>)</span> <span class='o'>(</span><span class='nv'>p1</span> <span class='o'>|</span> <span class='nv'>p2</span><span class='o'>)</span> <span class='o'>&amp;</span> <span 
class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>legend.position <span class='o'>=</span> <span class='s'>"none"</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-9-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <h3 id="bounded-density-estimation">Bounded density estimation <a href="#bounded-density-estimation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://ggplot2.tidyverse.org/reference/geom_density.html" target="_blank" rel="noopener"><code>geom_density()</code></a> has gained a <code>bounds</code> argument allowing you to perform density estimation with bound correction. 
This can lead to much better edge estimates when bounds are known for a sample:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>data</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Exponential.html'>rexp</a></span><span class='o'>(</span><span class='m'>100</span><span class='o'>)</span><span class='o'>)</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>data</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_density.html'>geom_density</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"unbounded"</span><span class='o'>)</span>, key_glyph <span class='o'>=</span> <span class='s'>"path"</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_density.html'>geom_density</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"bounded"</span><span class='o'>)</span>, bounds <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='kc'>Inf</span><span class='o'>)</span>, key_glyph <span class='o'>=</span> <span class='s'>"path"</span><span class='o'>)</span> <span 
class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_function.html'>stat_function</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>colour <span class='o'>=</span> <span class='s'>"true distribution"</span><span class='o'>)</span>, fun <span class='o'>=</span> <span class='nv'>dexp</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_manual.html'>scale_colour_manual</a></span><span class='o'>(</span> name <span class='o'>=</span> <span class='kc'>NULL</span>, values <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"black"</span>, <span class='s'>"firebrick"</span>, <span class='s'>"forestgreen"</span><span class='o'>)</span>, breaks <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"true distribution"</span>, <span class='s'>"unbounded"</span>, <span class='s'>"bounded"</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-10-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <h3 id="no-clipping-in-facet-strips">No clipping in facet strips <a href="#no-clipping-in-facet-strips"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>It is now possible to turn clipping in the facet strips off. 
For the most part the default works fine but in certain situations you&rsquo;d like the strip text or the border to be seen in full. The new feature is a theme setting:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>p</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>diamonds</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>y <span class='o'>=</span> <span class='nv'>color</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/facet_wrap.html'>facet_wrap</a></span><span class='o'>(</span><span class='o'>~</span> <span class='nv'>cut</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggtheme.html'>theme_minimal</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span> strip.background <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_rect</a></span><span class='o'>(</span><span class='s'>"grey90"</span>, colour <span class='o'>=</span> <span class='s'>"grey90"</span>, linewidth <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span>, axis.line.y <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/element.html'>element_line</a></span><span class='o'>(</span>linewidth <span class='o'>=</span> <span class='m'>1</span><span 
class='o'>)</span> <span class='o'>)</span> <span class='nv'>p</span> </code></pre> <p><img src="figs/unnamed-chunk-11-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>In the (a bit contrived) theme above we see a jarring step between the strip background and the axis line because the border of the strip is clipped to the extent of the strip. We can fix this by turning off clipping:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>p</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/theme.html'>theme</a></span><span class='o'>(</span>strip.clip <span class='o'>=</span> <span class='s'>"off"</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-12-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <h3 id="justification-in-geom_bargeom_col">Justification in <code>geom_bar()</code>/<code>geom_col()</code> <a href="#justification-in-geom_bargeom_col"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>You can now specify how the bars in <a href="https://ggplot2.tidyverse.org/reference/geom_bar.html" target="_blank" rel="noopener"><code>geom_bar()</code></a> should be justified with respect to the position on the axis they are tied to:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>mtcars_centered</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span><span 
class='nv'>mtcars</span>, justification <span class='o'>=</span> <span class='s'>"centered"</span><span class='o'>)</span> <span class='nv'>mtcars_left</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span><span class='nv'>mtcars</span>, justification <span class='o'>=</span> <span class='s'>"left aligned"</span><span class='o'>)</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span>mapping <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>gear</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span>data <span class='o'>=</span> <span class='nv'>mtcars_centered</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_bar.html'>geom_bar</a></span><span class='o'>(</span>data <span class='o'>=</span> <span class='nv'>mtcars_left</span>, just <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/facet_wrap.html'>facet_wrap</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>justification</span>, ncol <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-13-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>It goes without saying that you should only do this for good reasons because it goes against how people in general expect bar plots to behave, but for certain layout needs it can be a boon.</p> <h2 
id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As always, this release would not have been possible without contributions from our amazing community. A huge thanks goes out to everyone who has helped make ggplot2 3.4.0 a reality:</p> <p> <a href="https://github.com/92amartins" target="_blank" rel="noopener">@92amartins</a>, <a href="https://github.com/acircleda" target="_blank" rel="noopener">@acircleda</a>, <a href="https://github.com/AlgaeKat" target="_blank" rel="noopener">@AlgaeKat</a>, <a href="https://github.com/andreaskuepfer" target="_blank" rel="noopener">@andreaskuepfer</a>, <a href="https://github.com/angleik" target="_blank" rel="noopener">@angleik</a>, <a href="https://github.com/aphalo" target="_blank" rel="noopener">@aphalo</a>, <a href="https://github.com/artuurC" target="_blank" rel="noopener">@artuurC</a>, <a href="https://github.com/asolisc" target="_blank" rel="noopener">@asolisc</a>, <a href="https://github.com/baderstine" target="_blank" rel="noopener">@baderstine</a>, <a href="https://github.com/basille" target="_blank" rel="noopener">@basille</a>, <a href="https://github.com/bergsmat" target="_blank" rel="noopener">@bergsmat</a>, <a href="https://github.com/bersbersbers" target="_blank" rel="noopener">@bersbersbers</a>, <a href="https://github.com/billdenney" target="_blank" rel="noopener">@billdenney</a>, <a href="https://github.com/brianmsm" target="_blank" rel="noopener">@brianmsm</a>, <a href="https://github.com/brunomioto" target="_blank" rel="noopener">@brunomioto</a>, <a 
href="https://github.com/bwiernik" target="_blank" rel="noopener">@bwiernik</a>, <a href="https://github.com/capnrefsmmat" target="_blank" rel="noopener">@capnrefsmmat</a>, <a href="https://github.com/clauswilke" target="_blank" rel="noopener">@clauswilke</a>, <a href="https://github.com/cmartin" target="_blank" rel="noopener">@cmartin</a>, <a href="https://github.com/ConchuirohAodha" target="_blank" rel="noopener">@ConchuirohAodha</a>, <a href="https://github.com/corybrunson" target="_blank" rel="noopener">@corybrunson</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/DarioS" target="_blank" rel="noopener">@DarioS</a>, <a href="https://github.com/Darxor" target="_blank" rel="noopener">@Darxor</a>, <a href="https://github.com/davidchall" target="_blank" rel="noopener">@davidchall</a>, <a href="https://github.com/davidhodge931" target="_blank" rel="noopener">@davidhodge931</a>, <a href="https://github.com/dhrhzz" target="_blank" rel="noopener">@dhrhzz</a>, <a href="https://github.com/DiegoJArg" target="_blank" rel="noopener">@DiegoJArg</a>, <a href="https://github.com/DISOhda" target="_blank" rel="noopener">@DISOhda</a>, <a href="https://github.com/drtoche" target="_blank" rel="noopener">@drtoche</a>, <a href="https://github.com/Enterprise-J" target="_blank" rel="noopener">@Enterprise-J</a>, <a href="https://github.com/ewallace" target="_blank" rel="noopener">@ewallace</a>, <a href="https://github.com/gbrlrgrs" target="_blank" rel="noopener">@gbrlrgrs</a>, <a href="https://github.com/ggrothendieck" target="_blank" rel="noopener">@ggrothendieck</a>, <a href="https://github.com/GregorDall" target="_blank" rel="noopener">@GregorDall</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/henningpohl" target="_blank" rel="noopener">@henningpohl</a>, <a href="https://github.com/Hugh-Mungo" target="_blank" rel="noopener">@Hugh-Mungo</a>, <a 
href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/JacobElder" target="_blank" rel="noopener">@JacobElder</a>, <a href="https://github.com/jarauh" target="_blank" rel="noopener">@jarauh</a>, <a href="https://github.com/javlon" target="_blank" rel="noopener">@javlon</a>, <a href="https://github.com/jdonland" target="_blank" rel="noopener">@jdonland</a>, <a href="https://github.com/jessexknight" target="_blank" rel="noopener">@jessexknight</a>, <a href="https://github.com/jfunction" target="_blank" rel="noopener">@jfunction</a>, <a href="https://github.com/JobNmadu" target="_blank" rel="noopener">@JobNmadu</a>, <a href="https://github.com/JoFAM" target="_blank" rel="noopener">@JoFAM</a>, <a href="https://github.com/jooyoungseo" target="_blank" rel="noopener">@jooyoungseo</a>, <a href="https://github.com/jpquast" target="_blank" rel="noopener">@jpquast</a>, <a href="https://github.com/jtlandis" target="_blank" rel="noopener">@jtlandis</a>, <a href="https://github.com/junjunlab" target="_blank" rel="noopener">@junjunlab</a>, <a href="https://github.com/jwhendy" target="_blank" rel="noopener">@jwhendy</a>, <a href="https://github.com/kapsner" target="_blank" rel="noopener">@kapsner</a>, <a href="https://github.com/KasperThystrup" target="_blank" rel="noopener">@KasperThystrup</a>, <a href="https://github.com/kongdd" target="_blank" rel="noopener">@kongdd</a>, <a href="https://github.com/LarryVincent" target="_blank" rel="noopener">@LarryVincent</a>, <a href="https://github.com/leonjessen" target="_blank" rel="noopener">@leonjessen</a>, <a href="https://github.com/Lisamrshhsr" target="_blank" rel="noopener">@Lisamrshhsr</a>, <a href="https://github.com/llrs" target="_blank" rel="noopener">@llrs</a>, <a href="https://github.com/LuisLauM" target="_blank" rel="noopener">@LuisLauM</a>, <a href="https://github.com/lynn242" target="_blank" rel="noopener">@lynn242</a>, <a href="https://github.com/makrez" 
target="_blank" rel="noopener">@makrez</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/michaelgrund" target="_blank" rel="noopener">@michaelgrund</a>, <a href="https://github.com/mikeroswell" target="_blank" rel="noopener">@mikeroswell</a>, <a href="https://github.com/mjsmith037" target="_blank" rel="noopener">@mjsmith037</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/mvanaman" target="_blank" rel="noopener">@mvanaman</a>, <a href="https://github.com/netique" target="_blank" rel="noopener">@netique</a>, <a href="https://github.com/nfancy" target="_blank" rel="noopener">@nfancy</a>, <a href="https://github.com/ngreifer" target="_blank" rel="noopener">@ngreifer</a>, <a href="https://github.com/nkehrein" target="_blank" rel="noopener">@nkehrein</a>, <a href="https://github.com/olobiolo" target="_blank" rel="noopener">@olobiolo</a>, <a href="https://github.com/orgadish" target="_blank" rel="noopener">@orgadish</a>, <a href="https://github.com/pachadotdev" target="_blank" rel="noopener">@pachadotdev</a>, <a href="https://github.com/padpadpadpad" target="_blank" rel="noopener">@padpadpadpad</a>, <a href="https://github.com/paupaiz" target="_blank" rel="noopener">@paupaiz</a>, <a href="https://github.com/ProfessorPeregrine" target="_blank" rel="noopener">@ProfessorPeregrine</a>, <a href="https://github.com/PursuitOfDataScience" target="_blank" rel="noopener">@PursuitOfDataScience</a>, <a href="https://github.com/rikudoukarthik" target="_blank" rel="noopener">@rikudoukarthik</a>, <a href="https://github.com/rjake" target="_blank" rel="noopener">@rjake</a>, <a href="https://github.com/rressler" target="_blank" rel="noopener">@rressler</a>, <a href="https://github.com/SarenT" target="_blank" rel="noopener">@SarenT</a>, <a href="https://github.com/Sebas256" target="_blank" rel="noopener">@Sebas256</a>, <a 
href="https://github.com/shenzhenzth" target="_blank" rel="noopener">@shenzhenzth</a>, <a href="https://github.com/skyroam" target="_blank" rel="noopener">@skyroam</a>, <a href="https://github.com/stargorg" target="_blank" rel="noopener">@stargorg</a>, <a href="https://github.com/stefanoborini" target="_blank" rel="noopener">@stefanoborini</a>, <a href="https://github.com/steveharoz" target="_blank" rel="noopener">@steveharoz</a>, <a href="https://github.com/stragu" target="_blank" rel="noopener">@stragu</a>, <a href="https://github.com/szimmer" target="_blank" rel="noopener">@szimmer</a>, <a href="https://github.com/tamas-ferenci" target="_blank" rel="noopener">@tamas-ferenci</a>, <a href="https://github.com/teunbrand" target="_blank" rel="noopener">@teunbrand</a>, <a href="https://github.com/tfjaeger" target="_blank" rel="noopener">@tfjaeger</a>, <a href="https://github.com/thomasp85" target="_blank" rel="noopener">@thomasp85</a>, <a href="https://github.com/thoolihan" target="_blank" rel="noopener">@thoolihan</a>, <a href="https://github.com/tjebo" target="_blank" rel="noopener">@tjebo</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/trevorld" target="_blank" rel="noopener">@trevorld</a>, <a href="https://github.com/tungttnguyen" target="_blank" rel="noopener">@tungttnguyen</a>, <a href="https://github.com/twest820" target="_blank" rel="noopener">@twest820</a>, <a href="https://github.com/waynerroper" target="_blank" rel="noopener">@waynerroper</a>, <a href="https://github.com/willgearty" target="_blank" rel="noopener">@willgearty</a>, <a href="https://github.com/wmacnair" target="_blank" rel="noopener">@wmacnair</a>, <a href="https://github.com/wurli" target="_blank" rel="noopener">@wurli</a>, <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>, and <a href="https://github.com/zeehio" target="_blank" rel="noopener">@zeehio</a>.</p> Q3 2022 tidymodels 
digest https://www.tidyverse.org/blog/2022/10/tidymodels-2022-q3/ Wed, 19 Oct 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/10/tidymodels-2022-q3/ <p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <p>Since the beginning of 2021, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a> here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a> to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like these from the past month or so:</p> <ul> <li> <a href="https://www.tidyverse.org/blog/2022/10/parsnip-checking-1-0-2/" target="_blank" rel="noopener">Improvements to model specification checking in tidymodels</a></li> <li> <a href="https://www.tidyverse.org/blog/2022/09/brulee-0-2-0/" target="_blank" rel="noopener">brulee 0.2.0</a></li> <li> <a href="https://www.tidyverse.org/blog/2022/09/bundle-0-1-0/" target="_blank" rel="noopener">Announcing bundle</a></li> <li> <a href="https://www.tidyverse.org/blog/2022/08/censored-0-1-0/" target="_blank" rel="noopener">censored 0.1.0</a></li> <li> <a href="https://www.tidyverse.org/blog/2022/08/rsample-1-1-0/" target="_blank" rel="noopener">rsample 1.1.0</a></li> </ul> <p>Since <a href="https://www.tidyverse.org/blog/2022/07/tidymodels-2022-q2/" target="_blank" rel="noopener">our last roundup post</a>, there have been CRAN releases of 22 tidymodels packages. 
Here are links to their NEWS files:</p> <div class="highlight"> <ul> <li>agua <a href="https://agua.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.1.0)</a></li> <li>applicable <a href="https://github.com/tidymodels/applicable/blob/develop/NEWS.md" target="_blank" rel="noopener">(0.1.0)</a></li> <li>bonsai <a href="https://bonsai.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.2.0)</a></li> <li>broom <a href="https://broom.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>brulee <a href="https://brulee.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.2.0)</a></li> <li>butcher <a href="https://butcher.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.3.0)</a></li> <li>censored <a href="https://censored.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.1.1)</a></li> <li>corrr <a href="https://corrr.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.4.4)</a></li> <li>finetune <a href="https://finetune.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>modeldata <a href="https://modeldata.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>modeldb <a href="https://modeldb.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.2.3)</a></li> <li>parsnip <a href="https://parsnip.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.2)</a></li> <li>plsmod <a href="https://plsmod.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.0)</a></li> <li>poissonreg <a href="https://poissonreg.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>probably <a href="https://probably.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.1.0)</a></li> <li>recipes <a href="https://recipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.2)</a></li> <li>rsample <a 
href="https://rsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a></li> <li>spatialsample <a href="https://spatialsample.tidymodels.org/news/index.html" target="_blank" rel="noopener">(0.2.1)</a></li> <li>textrecipes <a href="https://textrecipes.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>tune <a href="https://tune.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.0.1)</a></li> <li>workflows <a href="https://workflows.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a></li> <li>yardstick <a href="https://yardstick.tidymodels.org/news/index.html" target="_blank" rel="noopener">(1.1.0)</a></li> </ul> </div> <p>We&rsquo;ll highlight two specific upgrades: one for agua and another for recipes.</p> <h2 id="a-big-upgrade-for-agua">A big upgrade for agua <a href="#a-big-upgrade-for-agua"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>With version 3.38.0.1 of the <a href="https://cran.r-project.org/package=h2o" target="_blank" rel="noopener">h2o</a> package, agua can now tune h2o models as if they were any other type of model engine.</p> <p> <a href="https://h2o.ai" target="_blank" rel="noopener">h2o</a> has an excellent server-based computational engine for fitting a variety of different machine learning and statistical models. The h2o server can run locally or on some external high performance computing server. 
The downside is that it is light on tools for feature engineering and interactive data analysis.</p> <p>Using h2o with tidymodels enables users to leverage the benefits of packages like recipes along with fast, server-based parallel processing.</p> <p>While the syntax for model fitting and tuning is the same as for any other non-h2o model, there are different ways to parallelize the work:</p> <ul> <li> <p>The h2o server has the ability to internally parallelize individual model computations. For example, when fitting trees, the search for the best split can be done using multiple threads. The number of threads that each model should use is set with <a href="https://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.init.html" target="_blank" rel="noopener">h2o.init(nthreads)</a>. The default (<code>-1</code>) is to use all CPUs on the host.</p> </li> <li> <p>When using grid search, <a href="https://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.grid.html" target="_blank" rel="noopener">h2o.grid(parallelism)</a> determines how many models the h2o server should process at the same time. The default (<code>1</code>) constrains the server to run the models sequentially.</p> </li> <li> <p>R has external parallelization tools (such as the foreach and future packages) that can start new R processes to simultaneously do work. This would run many models in parallel. For h2o, this determines how many models the agua package could send to the server at once. This does not appear to be constrained by the <code>parallelism</code> argument to <code>h2o.grid()</code>.</p> </li> </ul> <p>With h2o and tidymodels, you should probably <strong>use h2o&rsquo;s parallelization</strong>. Using multiple approaches <em>can</em> work but only for some technologies. 
It&rsquo;s still <a href="https://github.com/topepo/agua-h2o-benchmark" target="_blank" rel="noopener">pretty complicated</a> but we are working on un-complicating it.</p> <p>To set up h2o parallelization, there is a new control argument called <code>backend_options</code>. If you were doing a grid search, you first define how many models the h2o server should process in parallel:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span> <span class="nf">library</span><span class="p">(</span><span class="n">agua</span><span class="p">)</span> <span class="nf">library</span><span class="p">(</span><span class="n">finetune</span><span class="p">)</span> <span class="n">h2o_thread_spec</span> <span class="o">&lt;-</span> <span class="nf">agua_backend_options</span><span class="p">(</span><span class="n">parallelism</span> <span class="o">=</span> <span class="m">10</span><span class="p">)</span> </code></pre></div><p>Then, pass the output to any of the existing control functions:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">grid_ctrl</span> <span class="o">&lt;-</span> <span class="nf">control_grid</span><span class="p">(</span><span class="n">backend_options</span> <span class="o">=</span> <span class="n">h2o_thread_spec</span><span class="p">)</span> </code></pre></div><p>Now h2o can process 10 models in parallel.</p> <p>Here is an example using a simulated data set with a numeric outcome:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span 
class='o'>(</span><span class='nv'><a href='https://agua.tidymodels.org/'>agua</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/finetune'>finetune</a></span><span class='o'>)</span> <span class='c'># Simulate the data</span> <span class='nv'>n_train</span> <span class='o'>&lt;-</span> <span class='m'>200</span> <span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>6147</span><span class='o'>)</span> <span class='nv'>sim_dat</span> <span class='o'>&lt;-</span> <span class='nf'>sim_regression</span><span class='o'>(</span><span class='nv'>n_train</span><span class='o'>)</span> <span class='c'># Resample using 10-fold cross-validation</span> <span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>91</span><span class='o'>)</span> <span class='nv'>sim_rs</span> <span class='o'>&lt;-</span> <span class='nf'>vfold_cv</span><span class='o'>(</span><span class='nv'>sim_dat</span><span class='o'>)</span> </code></pre> </div> <p>We&rsquo;ll use grid search to tune a boosted tree:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>boost_spec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/boost_tree.html'>boost_tree</a></span><span class='o'>(</span> trees <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span>, min_n <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span>, tree_depth <span class='o'>=</span> <span class='nf'><a 
href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span>, learn_rate <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span>, loss_reduction <span class='o'>=</span> <span class='nf'><a href='https://hardhat.tidymodels.org/reference/tune.html'>tune</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"h2o"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"regression"</span><span class='o'>)</span> </code></pre> </div> <p>Now, let&rsquo;s parallel process our computations.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># Start the h2o server</span> <span class='nf'>h2o</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/pkg/h2o/man/h2o.init.html'>h2o.init</a></span><span class='o'>(</span><span class='o'>)</span> <span class='c'># Multi-thread the model fits</span> <span class='nv'>h2o_thread_spec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://agua.tidymodels.org/reference/h2o_tune.html'>agua_backend_options</a></span><span class='o'>(</span>parallelism <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span> <span class='nv'>grid_ctrl</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tune.tidymodels.org/reference/control_grid.html'>control_grid</a></span><span 
class='o'>(</span>backend_options <span class='o'>=</span> <span class='nv'>h2o_thread_spec</span><span class='o'>)</span> </code></pre> </div> <p>We&rsquo;ll evaluate a very small grid at first:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>7616</span><span class='o'>)</span> <span class='nv'>grid_res</span> <span class='o'>&lt;-</span> <span class='nv'>boost_spec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tune.tidymodels.org/reference/tune_grid.html'>tune_grid</a></span><span class='o'>(</span><span class='nv'>outcome</span> <span class='o'>~</span> <span class='nv'>.</span>, resamples <span class='o'>=</span> <span class='nv'>sim_rs</span>, grid <span class='o'>=</span> <span class='m'>10</span>, control <span class='o'>=</span> <span class='nv'>grid_ctrl</span><span class='o'>)</span> </code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tune.tidymodels.org/reference/show_best.html'>show_best</a></span><span class='o'>(</span><span class='nv'>grid_res</span>, metric <span class='o'>=</span> <span class='s'>"rmse"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>select</span><span class='o'>(</span><span class='o'>-</span><span class='nv'>.config</span>, <span class='o'>-</span><span class='nv'>.metric</span>, <span class='o'>-</span><span class='nv'>.estimator</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 8</span></span> <span class='c'>#&gt; trees min_n tree_depth learn_rate loss_reduction mean n std_err</span> <span class='c'>#&gt; <span style='color: #555555; font-style: 
italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='text-decoration: underline;'>1</span>954 17 13 0.031<span style='text-decoration: underline;'>8</span> 6.08<span style='color: #555555;'>e</span><span style='color: #BB0000;'>-8</span> 13.1 10 0.828</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> 184 25 4 0.000<span style='text-decoration: underline;'>000</span>001<span style='text-decoration: underline;'>64</span> 6.56<span style='color: #555555;'>e</span><span style='color: #BB0000;'>-1</span> 15.7 10 1.03 </span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='text-decoration: underline;'>1</span>068 10 8 0.000<span style='text-decoration: underline;'>040</span>9 1.19<span style='color: #555555;'>e</span>+1 17.4 10 1.08 </span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> <span style='text-decoration: underline;'>1</span>500 37 10 0.000<span style='text-decoration: underline;'>010</span>8 9.97<span style='color: #555555;'>e</span><span style='color: #BB0000;'>-9</span> 18.3 10 1.03 </span> <span class='c'>#&gt; <span style='color: #555555;'>5</span> 985 18 7 0.000<span style='text-decoration: underline;'>000</span>045<span style='text-decoration: underline;'>4</span> 1.84<span style='color: #555555;'>e</span><span style='color: #BB0000;'>-3</span> 18.4 10 1.04</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span 
class='o'>(</span><span class='nv'>grid_res</span>, metric <span class='o'>=</span> <span class='s'>"rmse"</span><span class='o'>)</span> </code></pre> <p><img src="figs/grid-plot-1.svg" width="90%" style="display: block; margin: auto;" /></p> </div> <p>It was a small grid and most of the configurations were not especially good. We can further optimize the results by applying simulated annealing search to the best grid results.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>sa_ctrl</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://finetune.tidymodels.org/reference/control_sim_anneal.html'>control_sim_anneal</a></span><span class='o'>(</span>backend_options <span class='o'>=</span> <span class='nv'>h2o_thread_spec</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>4</span><span class='o'>)</span> <span class='nv'>sa_res</span> <span class='o'>&lt;-</span> <span class='nv'>boost_spec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://finetune.tidymodels.org/reference/tune_sim_anneal.html'>tune_sim_anneal</a></span><span class='o'>(</span> <span class='nv'>outcome</span> <span class='o'>~</span> <span class='nv'>.</span>, resamples <span class='o'>=</span> <span class='nv'>sim_rs</span>, initial <span class='o'>=</span> <span class='nv'>grid_res</span>, iter <span class='o'>=</span> <span class='m'>25</span>, control <span class='o'>=</span> <span class='nv'>sa_ctrl</span> <span class='o'>)</span> <span class='c'>#&gt; <span style='color: #000000;'>Optimizing rmse</span></span> <span class='c'>#&gt; <span style='color: #000000;'>Initial best: 13.06400</span></span> <span class='c'>#&gt; <span style='color: #000000;'> 1 </span><span style='color: #00BB00;'>♥ new best </span><span style='color: #000000;'> 
rmse=12.688 (+/-0.7899)</span></span> <span class='c'>#&gt; <span style='color: #000000;'> 2 </span><span style='color: #BBBB00;'>◯ accept suboptimal </span><span style='color: #000000;'> rmse=12.849 (+/-0.8304)</span></span> <span class='c'>#&gt; <span style='color: #000000;'> 3 </span><span style='color: #BBBB00;'>◯ accept suboptimal </span><span style='color: #000000;'> rmse=13.129 (+/-0.8266)</span></span> <span class='c'>#&gt; <span style='color: #000000;'> 4 </span><span style='color: #BBBB00;'>◯ accept suboptimal </span><span style='color: #000000;'> rmse=13.678 (+/-0.9544)</span></span> <span class='c'>#&gt; <span style='color: #000000;'> 5 </span><span style='color: #00BB00;'>+ better suboptimal </span><span style='color: #000000;'> rmse=13.433 (+/-0.792)</span></span> <span class='c'>#&gt; <span style='color: #000000;'> 6 </span><span style='color: #00BB00;'>+ better suboptimal </span><span style='color: #000000;'> rmse=12.99 (+/-0.9031)</span></span> <span class='c'>#&gt; <span style='color: #000000;'> 7 </span><span style='color: #BB0000;'>─ discard suboptimal</span><span style='color: #000000;'> rmse=16.531 (+/-1.027)</span></span> <span class='c'>#&gt; <span style='color: #000000;'> 8 </span><span style='color: #BB0000;'>─ discard suboptimal</span><span style='color: #000000;'> rmse=13.522 (+/-0.9802)</span></span> <span class='c'>#&gt; <span style='color: #000000;'> 9 </span><span style='color: #BB0000;'>✖ restart from best </span><span style='color: #000000;'> rmse=13.097 (+/-0.8109)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>10 </span><span style='color: #00BB00;'>♥ new best </span><span style='color: #000000;'> rmse=12.66 (+/-0.8028)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>11 </span><span style='color: #BBBB00;'>◯ accept suboptimal </span><span style='color: #000000;'> rmse=13.116 (+/-0.8135)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>12 </span><span style='color: #00BB00;'>+ 
better suboptimal </span><span style='color: #000000;'> rmse=12.714 (+/-0.7747)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>13 </span><span style='color: #BB0000;'>─ discard suboptimal</span><span style='color: #000000;'> rmse=13.074 (+/-0.6598)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>14 </span><span style='color: #BB0000;'>─ discard suboptimal</span><span style='color: #000000;'> rmse=14.489 (+/-1.028)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>15 </span><span style='color: #BBBB00;'>◯ accept suboptimal </span><span style='color: #000000;'> rmse=12.715 (+/-0.8043)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>16 </span><span style='color: #BB0000;'>─ discard suboptimal</span><span style='color: #000000;'> rmse=13.788 (+/-1.027)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>17 </span><span style='color: #BB0000;'>─ discard suboptimal</span><span style='color: #000000;'> rmse=13.057 (+/-0.7716)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>18 </span><span style='color: #BB0000;'>✖ restart from best </span><span style='color: #000000;'> rmse=13.064 (+/-0.7095)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>19 </span><span style='color: #00BB00;'>♥ new best </span><span style='color: #000000;'> rmse=12.645 (+/-0.7706)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>20 </span><span style='color: #BBBB00;'>◯ accept suboptimal </span><span style='color: #000000;'> rmse=12.7 (+/-0.821)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>21 </span><span style='color: #BB0000;'>─ discard suboptimal</span><span style='color: #000000;'> rmse=13.018 (+/-0.8047)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>22 </span><span style='color: #BB0000;'>─ discard suboptimal</span><span style='color: #000000;'> rmse=14.812 (+/-1.017)</span></span> <span class='c'>#&gt; <span 
style='color: #000000;'>23 </span><span style='color: #BB0000;'>─ discard suboptimal</span><span style='color: #000000;'> rmse=13.098 (+/-0.921)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>24 </span><span style='color: #BBBB00;'>◯ accept suboptimal </span><span style='color: #000000;'> rmse=12.708 (+/-0.7538)</span></span> <span class='c'>#&gt; <span style='color: #000000;'>25 </span><span style='color: #BBBB00;'>◯ accept suboptimal </span><span style='color: #000000;'> rmse=13.054 (+/-0.9046)</span></span></code></pre> </div> <p>Again, h2o is doing all of the computational work for fitting models and tidymodels is proposing new parameter configurations.</p> <p>One other nice feature of the new agua release is the h2o engine for the <a href="https://parsnip.tidymodels.org/reference/auto_ml.html" target="_blank" rel="noopener"><code>auto_ml()</code></a> model. This builds a stacked ensemble on a set of different models (not unlike our <a href="https://stacks.tidymodels.org" target="_blank" rel="noopener">stacks</a> package but with far less code).</p> <p>There is a great worked example <a href="https://agua.tidymodels.org/articles/auto_ml.html" target="_blank" rel="noopener">on the agua website</a> so make sure to check this out!</p> <h2 id="more-spline-recipe-steps">More spline recipe steps <a href="#more-spline-recipe-steps"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Spline techniques allow linear models to produce nonlinear model curves. 
These are called <a href="https://bookdown.org/max/FES/numeric-one-to-many.html#numeric-basis-functions" target="_blank" rel="noopener">basis expansion methods</a> since they take a single numeric predictor and make additional nonlinear feature columns.</p> <p>If you have ever used <code>geom_smooth()</code>, you have probably used a spline function.</p> <p>The recipes package now has an expanded set of spline functions (with a common naming convention):</p> <ul> <li> <a href="https://recipes.tidymodels.org/dev/reference/step_spline_b.html" target="_blank" rel="noopener"><code>step_spline_b()</code></a></li> <li> <a href="https://recipes.tidymodels.org/dev/reference/step_spline_convex.html" target="_blank" rel="noopener"><code>step_spline_convex()</code></a></li> <li> <a href="https://recipes.tidymodels.org/dev/reference/step_spline_monotone.html" target="_blank" rel="noopener"><code>step_spline_monotone()</code></a></li> <li> <a href="https://recipes.tidymodels.org/dev/reference/step_spline_natural.html" target="_blank" rel="noopener"><code>step_spline_natural()</code></a></li> <li> <a href="https://recipes.tidymodels.org/dev/reference/step_spline_nonnegative.html" target="_blank" rel="noopener"><code>step_spline_nonnegative()</code></a></li> </ul> <p>There is also another step to make polynomial functions: <a href="https://recipes.tidymodels.org/dev/reference/step_poly_bernstein.html" target="_blank" rel="noopener"><code>step_poly_bernstein()</code></a>.</p> <p>These functions take different approaches to creating the new set of features. 
Take a look at the references to see the technical details.</p> <p>Here is a simple example using the <a href="https://www.tmwr.org/ames.html" target="_blank" rel="noopener">Ames data</a> where we model the sale price as a nonlinear function of the longitude using a convex basis function:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>ames</span><span class='o'>)</span> <span class='nv'>ames</span><span class='o'>$</span><span class='nv'>Sale_Price</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/Log.html'>log10</a></span><span class='o'>(</span><span class='nv'>ames</span><span class='o'>$</span><span class='nv'>Sale_Price</span><span class='o'>)</span> <span class='nv'>spline_rec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>Sale_Price</span> <span class='o'>~</span> <span class='nv'>Longitude</span>, data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>step_spline_convex</span><span class='o'>(</span><span class='nv'>Longitude</span>, deg_free <span class='o'>=</span> <span class='m'>25</span><span class='o'>)</span> <span class='nv'>spline_fit</span> <span class='o'>&lt;-</span> <span class='nv'>spline_rec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>workflow</span><span class='o'>(</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/linear_reg.html'>linear_reg</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a 
href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span>data <span class='o'>=</span> <span class='nv'>ames</span><span class='o'>)</span> <span class='nv'>spline_fit</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://generics.r-lib.org/reference/augment.html'>augment</a></span><span class='o'>(</span><span class='nv'>ames</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>ggplot</span><span class='o'>(</span><span class='nf'>aes</span><span class='o'>(</span><span class='nv'>Longitude</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'>geom_point</span><span class='o'>(</span><span class='nf'>aes</span><span class='o'>(</span>y <span class='o'>=</span> <span class='nv'>Sale_Price</span><span class='o'>)</span>, alpha <span class='o'>=</span> <span class='m'>1</span> <span class='o'>/</span> <span class='m'>3</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'>geom_line</span><span class='o'>(</span><span class='nf'>aes</span><span class='o'>(</span>y <span class='o'>=</span> <span class='nv'>.pred</span><span class='o'>)</span>, col <span class='o'>=</span> <span class='s'>"red"</span>, lwd <span class='o'>=</span> <span class='m'>1.5</span><span class='o'>)</span> </code></pre> <p><img src="figs/Longitude-1.svg" width="70%" style="display: block; margin: auto;" /></p> </div> <p>Not too bad but the model clearly over-fits on the extreme right tail of the predictor distribution.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 
5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>It&rsquo;s important that we thank everyone in the community that contributed to tidymodels:</p> <div class="highlight"> <ul> <li>agua: <a href="https://github.com/coforfe" target="_blank" rel="noopener">@coforfe</a>, <a href="https://github.com/gouthaman87" target="_blank" rel="noopener">@gouthaman87</a>, <a href="https://github.com/jeliason" target="_blank" rel="noopener">@jeliason</a>, <a href="https://github.com/qiushiyan" target="_blank" rel="noopener">@qiushiyan</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>applicable: <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>bonsai: <a href="https://github.com/barrettlayman" target="_blank" rel="noopener">@barrettlayman</a>, <a href="https://github.com/DesmondChoy" target="_blank" rel="noopener">@DesmondChoy</a>, <a href="https://github.com/dfsnow" target="_blank" rel="noopener">@dfsnow</a>, <a href="https://github.com/dpprdan" target="_blank" rel="noopener">@dpprdan</a>, <a href="https://github.com/jameslamb" target="_blank" rel="noopener">@jameslamb</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>broom: <a href="https://github.com/AaronRendahl" target="_blank" rel="noopener">@AaronRendahl</a>, <a href="https://github.com/AmeliaMN" target="_blank" rel="noopener">@AmeliaMN</a>, <a href="https://github.com/bbolker" target="_blank" rel="noopener">@bbolker</a>, <a href="https://github.com/capnrefsmmat" target="_blank" rel="noopener">@capnrefsmmat</a>, <a href="https://github.com/corybrunson" target="_blank" rel="noopener">@corybrunson</a>, <a href="https://github.com/ddsjoberg" target="_blank" rel="noopener">@ddsjoberg</a>, <a href="https://github.com/friendly" target="_blank" 
rel="noopener">@friendly</a>, <a href="https://github.com/johanneskoch94" target="_blank" rel="noopener">@johanneskoch94</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>brulee: <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>butcher: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>censored: <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">@bcjaeger</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/therneau" target="_blank" rel="noopener">@therneau</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>corrr: <a href="https://github.com/jagodap" target="_blank" rel="noopener">@jagodap</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>finetune: <a href="https://github.com/DMozzanica" target="_blank" rel="noopener">@DMozzanica</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/qiushiyan" target="_blank" rel="noopener">@qiushiyan</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>modeldata: <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>modeldb: <a 
href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>parsnip: <a href="https://github.com/barnabywalker" target="_blank" rel="noopener">@barnabywalker</a>, <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">@bcjaeger</a>, <a href="https://github.com/ben-e" target="_blank" rel="noopener">@ben-e</a>, <a href="https://github.com/CarmenCiardiello" target="_blank" rel="noopener">@CarmenCiardiello</a>, <a href="https://github.com/daniel-althoff" target="_blank" rel="noopener">@daniel-althoff</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/gustavomodelli" target="_blank" rel="noopener">@gustavomodelli</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/john-b-edwards" target="_blank" rel="noopener">@john-b-edwards</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mdneuzerling" target="_blank" rel="noopener">@mdneuzerling</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/mrkaye97" target="_blank" rel="noopener">@mrkaye97</a>, <a href="https://github.com/qiushiyan" target="_blank" rel="noopener">@qiushiyan</a>, <a href="https://github.com/siegfried" target="_blank" rel="noopener">@siegfried</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/TylerGrantSmith" target="_blank" rel="noopener">@TylerGrantSmith</a>, and <a href="https://github.com/zhaoliang0302" target="_blank" rel="noopener">@zhaoliang0302</a>.</li> <li>plsmod: <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> 
<li>poissonreg: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</li> <li>probably: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>.</li> <li>recipes: <a href="https://github.com/abichat" target="_blank" rel="noopener">@abichat</a>, <a href="https://github.com/adisarid" target="_blank" rel="noopener">@adisarid</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/JamesHWade" target="_blank" rel="noopener">@JamesHWade</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/luisDVA" target="_blank" rel="noopener">@luisDVA</a>, <a href="https://github.com/naveranoc" target="_blank" rel="noopener">@naveranoc</a>, <a href="https://github.com/nhward" target="_blank" rel="noopener">@nhward</a>, <a href="https://github.com/RMHogervorst" target="_blank" rel="noopener">@RMHogervorst</a>, <a href="https://github.com/ruddnr" target="_blank" rel="noopener">@ruddnr</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/zhaoliang0302" target="_blank" rel="noopener">@zhaoliang0302</a>.</li> <li>rsample: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mikemahoney218" target="_blank" 
rel="noopener">@mikemahoney218</a>, and <a href="https://github.com/tjmahr" target="_blank" rel="noopener">@tjmahr</a>.</li> <li>spatialsample: <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>.</li> <li>textrecipes: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, and <a href="https://github.com/PursuitOfDataScience" target="_blank" rel="noopener">@PursuitOfDataScience</a>.</li> <li>tune: <a href="https://github.com/Athospd" target="_blank" rel="noopener">@Athospd</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/frankhezemans" target="_blank" rel="noopener">@frankhezemans</a>, <a href="https://github.com/kevin199011" target="_blank" rel="noopener">@kevin199011</a>, <a href="https://github.com/misken" target="_blank" rel="noopener">@misken</a>, <a href="https://github.com/PursuitOfDataScience" target="_blank" rel="noopener">@PursuitOfDataScience</a>, <a href="https://github.com/qiushiyan" target="_blank" rel="noopener">@qiushiyan</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> <li>workflows: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/JosiahParry" target="_blank" rel="noopener">@JosiahParry</a>, <a href="https://github.com/lrossouw" target="_blank" rel="noopener">@lrossouw</a>, <a href="https://github.com/msberends" target="_blank" rel="noopener">@msberends</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" 
rel="noopener">@topepo</a>.</li> <li>yardstick: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/deschen1" target="_blank" rel="noopener">@deschen1</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/leonhGeis" target="_blank" rel="noopener">@leonhGeis</a>, <a href="https://github.com/PriitPaluoja" target="_blank" rel="noopener">@PriitPaluoja</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</li> </ul> </div> tidyselect 1.2.0 https://www.tidyverse.org/blog/2022/10/tidyselect-1-2-0/ Tue, 18 Oct 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/10/tidyselect-1-2-0/ <p> <a href="https://tidyselect.r-lib.org/" target="_blank" rel="noopener">tidyselect</a> 1.2.0 hit CRAN last week and includes a few updates to the syntax of selections in tidyverse functions like <code>dplyr::select(...)</code> and <code>tidyr::pivot_longer(cols = )</code>.</p> <p>tidyselect is a low-level package that provides the backend for selection contexts in tidyverse functions. 
A selection context is an argument like <code>cols</code> in <a href="https://tidyr.tidyverse.org/reference/pivot_longer.html" target="_blank" rel="noopener"><code>pivot_longer()</code></a> or a set of arguments like <code>...</code> in <a href="https://dplyr.tidyverse.org/reference/select.html" target="_blank" rel="noopener"><code>select()</code></a> <sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. In these special contexts, you can use a domain-specific language that helps you create a selection of columns. For example, you can select multiple columns with <a href="https://rdrr.io/r/base/c.html" target="_blank" rel="noopener"><code>c()</code></a>, a range of columns with <code>:</code>, and complex matches with selection helpers such as <a href="https://tidyselect.r-lib.org/reference/starts_with.html" target="_blank" rel="noopener"><code>starts_with()</code></a>. Under the hood, this selection syntax is interpreted and processed by the tidyselect package.</p> <p>In this post, we&rsquo;ll cover the most important <a href="https://lifecycle.r-lib.org/articles/stages.html" target="_blank" rel="noopener">lifecycle changes</a> in the selection syntax that tidyverse users (package developers in particular) should know about. You can see a full list of changes in the <a href="https://tidyselect.r-lib.org/news/index.html#tidyselect-120" target="_blank" rel="noopener">release notes</a>. 
We&rsquo;ll start with a quick recap of what it means in practice for a feature to be deprecated or soft-deprecated.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyverse.tidyverse.org'>tidyverse</a></span><span class='o'>)</span></span></code></pre> </div> <p>Note: With this release of tidyselect, some error messages will be suboptimal until dplyr 1.1.0 is released (planned in late October). We recommend waiting until then before updating tidyselect (though it&rsquo;s not a big deal if you have already updated).</p> <h2 id="about-soft-deprecation">About soft-deprecation <a href="#about-soft-deprecation"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Deprecation of features in tidyverse packages is handled by the lifecycle package. See <a href="https://www.tidyverse.org/blog/2021/02/lifecycle-1-0-0/">https://www.tidyverse.org/blog/2021/02/lifecycle-1-0-0/</a> for an introduction.</p> <p>The main feature of lifecycle is to distinguish between two stages of deprecation and two usage modes, direct and indirect.</p> <ul> <li> <p>For script users, <strong>direct usage</strong> is when you use a deprecated feature from the global environment. If the deprecated feature was used inside a package function that you are calling, it is considered <strong>indirect usage</strong>.</p> </li> <li> <p>For package developers, the distinction between direct and indirect usages is made by testthat in unit tests. 
If a function in your package calls the feature, it is considered direct usage. If that&rsquo;s a function in another package that you are calling, it&rsquo;s indirect usage.</p> </li> </ul> <p>To sum up, direct usage is when your own code uses the deprecated feature, and indirect usage is when someone else&rsquo;s code uses it. This distinction matters because it determines how verbose (and thus how annoying) the deprecation warnings are.</p> <ul> <li> <p>For <strong>soft-deprecation</strong>, indirect usage is always silent because we only want to alert people who are actually able to fix the problem.</p> <p>Direct usage only generates one warning every 8 hours to avoid being too annoying during this transition period, so that you can continue to work with existing code, ignore the warnings, and update to the new patterns on your own time.</p> </li> <li> <p>For <strong>deprecation</strong>, it&rsquo;s now really time to update the code. Direct usage gives a warning every time so that deprecated features can no longer be ignored.</p> <p>Indirect usage will now also warn, but only once every 8 hours since indirect users can&rsquo;t fix the problem themselves. The warning message automatically picks up the package URL where the usage was detected so that you can easily report the deprecation to the relevant maintainers.</p> </li> </ul> <p>lifecycle warnings are set up to helpfully inform you about upcoming changes while being as discreet as possible. 
All of the features deprecated in tidyselect in this blog post are in the <strong>soft-deprecation</strong> stage, and will remain this way for at least one year.</p> <h2 id="supplying-character-vectors-of-column-names-outside-of-all_of-and-any_of">Supplying character vectors of column names outside of <code>all_of()</code> and <code>any_of()</code> <a href="#supplying-character-vectors-of-column-names-outside-of-all_of-and-any_of"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>To specify a column selection using a character vector of names, you normally use <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>all_of()</code></a> or <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>any_of()</code></a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>vars</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"cyl"</span>, <span class='s'>"am"</span><span class='o'>)</span></span> <span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/all_of.html'>all_of</a></span><span class='o'>(</span><span class='nv'>vars</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a 
href='https://pillar.r-lib.org/reference/glimpse.html'>glimpse</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 32</span></span> <span><span class='c'>#&gt; Columns: 2</span></span> <span><span class='c'>#&gt; $ cyl <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4…</span></span> <span><span class='c'>#&gt; $ am <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1…</span></span></code></pre> </div> <p> <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>all_of()</code></a> is adamant that it <em>must</em> select all of the requested columns:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/all_of.html'>all_of</a></span><span class='o'>(</span><span class='nv'>letters</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in `select()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> In argument: `all_of(letters)`.</span></span> <span><span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in `all_of()`:</span></span></span> <span><span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Can't subset elements that don't exist.</span></span> <span><span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> Elements `a`, `b`, `c`, `d`, `e`, etc. 
don't exist.</span></span></code></pre> </div> <p> <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>any_of()</code></a> is more lenient and ignores any names that are not present in the data frame. In this case, it ends up selecting nothing:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/all_of.html'>any_of</a></span><span class='o'>(</span><span class='nv'>letters</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='c'>#&gt; data frame with 0 columns and 32 rows</span></span></code></pre> </div> <p>Another feature of <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>all_of()</code></a> and <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>any_of()</code></a> is that they remove all ambiguity between variables in your environment like <code>vars</code> or <code>letters</code> (env-variables) and variables inside the data frame like <code>cyl</code> or <code>am</code> (data-variables). 
Let&rsquo;s add <code>vars</code> in the data frame to see what happens:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>my_data</span> <span class='o'>&lt;-</span> <span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>vars <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/context.html'>n</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span></span> <span><span class='nv'>my_data</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/all_of.html'>all_of</a></span><span class='o'>(</span><span class='nv'>vars</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://pillar.r-lib.org/reference/glimpse.html'>glimpse</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 32</span></span> <span><span class='c'>#&gt; Columns: 2</span></span> <span><span class='c'>#&gt; $ cyl <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4…</span></span> <span><span class='c'>#&gt; $ am <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1…</span></span></code></pre> </div> <p>Because <code>vars</code> was supplied to <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>all_of()</code></a>, <a href="https://dplyr.tidyverse.org/reference/select.html" target="_blank" rel="noopener"><code>select()</code></a> will never confuse it with 
<code>my_data$vars</code>. In technical terms, there is no <strong>data-masking</strong> within selection helpers like <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>all_of()</code></a>, <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>any_of()</code></a>, or even <a href="https://tidyselect.r-lib.org/reference/starts_with.html" target="_blank" rel="noopener"><code>starts_with()</code></a>. It is safe to supply env-variables to these functions without worrying about data-masking ambiguity.</p> <p>This is not the case, however, if you supply a character vector outside of <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>all_of()</code></a>. Here the data frame column <code>vars</code> takes precedence over the env-variable of the same name:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>my_data</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>vars</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://pillar.r-lib.org/reference/glimpse.html'>glimpse</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 32</span></span> <span><span class='c'>#&gt; Columns: 1</span></span> <span><span class='c'>#&gt; $ vars <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…</span></span></code></pre> </div> <p>This is why we have decided to deprecate supplying character vectors directly in favour of using <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>all_of()</code></a> and <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>any_of()</code></a>. 
You will now get a soft-deprecation warning recommending that you use <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>all_of()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>vars</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://pillar.r-lib.org/reference/glimpse.html'>glimpse</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use `all_of()` or `any_of()` instead.</span></span> <span><span class='c'>#&gt; # Was:</span></span> <span><span class='c'>#&gt; data %&gt;% select(vars)</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; # Now:</span></span> <span><span class='c'>#&gt; data %&gt;% select(all_of(vars))</span></span> <span><span class='c'>#&gt; </span></span> <span><span class='c'>#&gt; See &lt;https://tidyselect.r-lib.org/reference/faq-external-vector.html&gt;.</span></span><span><span class='c'>#&gt; Rows: 32</span></span> <span><span class='c'>#&gt; Columns: 2</span></span> <span><span class='c'>#&gt; $ cyl <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4…</span></span> <span><span class='c'>#&gt; $ am <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1…</span></span></code></pre> </div> <h2 id="using-data-inside-selections">Using <code>.data</code> inside selections <a href="#using-data-inside-selections"> <svg 
class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The <code>.data</code> pronoun is a convenient way of programming with data-masking functions like <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> and <a href="https://dplyr.tidyverse.org/reference/filter.html" target="_blank" rel="noopener"><code>filter()</code></a>. It has two main functions:</p> <ol> <li> <p>Retrieve a data frame column from a name stored in a variable with <code>[[</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>var</span> <span class='o'>&lt;-</span> <span class='s'>"am"</span></span> <span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/transmute.html'>transmute</a></span><span class='o'>(</span>am <span class='o'>=</span> <span class='nv'>.data</span><span class='o'>[[</span><span class='nv'>var</span><span class='o'>]</span><span class='o'>]</span> <span class='o'>*</span> <span class='m'>10</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://pillar.r-lib.org/reference/glimpse.html'>glimpse</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 32</span></span> <span><span class='c'>#&gt; Columns: 1</span></span> <span><span class='c'>#&gt; $ am <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 10, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10…</span></span></code></pre> </div> </li> <li> <p>For package 
developers, <code>.data</code> helps silence R CMD check notes about unknown variables. When the static analysis checker of R encounters an expression like <code>mtcars |&gt; mutate(am * 2)</code>, it has no way of knowing that <code>am</code> is a data frame variable. Since it doesn&rsquo;t see any variable <code>am</code> in your environment, it emits a note about a potential typo in the code.</p> <p>The <code>.data$col</code> pattern is used to work around this issue: <code>mtcars |&gt; mutate(.data$am * 2)</code> doesn&rsquo;t produce any notes.</p> </li> </ol> <p>Whereas <code>.data</code> is very useful in data-masking functions, its usage in selections is much more limited. As we have seen in the previous section, retrieving a variable from a character vector should be done with <a href="https://tidyselect.r-lib.org/reference/all_of.html" target="_blank" rel="noopener"><code>all_of()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>var</span> <span class='o'>&lt;-</span> <span class='s'>"am"</span></span> <span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nf'><a href='https://tidyselect.r-lib.org/reference/all_of.html'>all_of</a></span><span class='o'>(</span><span class='nv'>var</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://pillar.r-lib.org/reference/glimpse.html'>glimpse</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 32</span></span> <span><span class='c'>#&gt; Columns: 1</span></span> <span><span class='c'>#&gt; $ am <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,…</span></span></code></pre> </div> <p>And to avoid the R CMD check 
note about unknown variables, it is much cleaner to wrap the column name in quotes:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='s'>"am"</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://pillar.r-lib.org/reference/glimpse.html'>glimpse</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Rows: 32</span></span> <span><span class='c'>#&gt; Columns: 1</span></span> <span><span class='c'>#&gt; $ am <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,…</span></span></code></pre> </div> <p>Allowing the <code>.data</code> pronoun in selection contexts also makes the distinction between tidy-selections and data-masking blurrier. 
And so we have decided to deprecate it in selections:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>var</span> <span class='o'>&lt;-</span> <span class='s'>"am"</span></span> <span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>.data</span><span class='o'>[[</span><span class='nv'>var</span><span class='o'>]</span><span class='o'>]</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://rdrr.io/r/base/invisible.html'>invisible</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use `all_of(var)` (or `any_of(var)`) instead of `.data[[var]]`</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nv'>mtcars</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/select.html'>select</a></span><span class='o'>(</span><span class='nv'>.data</span><span class='o'>$</span><span class='nv'>am</span><span class='o'>)</span> <span class='o'>|&gt;</span> <span class='nf'><a href='https://rdrr.io/r/base/invisible.html'>invisible</a></span><span class='o'>(</span><span class='o'>)</span></span> <span><span class='c'>#&gt; Warning: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.</span></span> <span><span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use `"am"` instead of `.data$am`</span></span></code></pre> </div> <p>This deprecation does not affect the use of <code>.data</code> in data-masking contexts.</p> <h2 id="acknowledgements">Acknowledgements <a 
href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Many thanks to all contributors (issues and PRs) to this release!</p> <p> <a href="https://github.com/alexpghayes" target="_blank" rel="noopener">@alexpghayes</a>, <a href="https://github.com/angela-li" target="_blank" rel="noopener">@angela-li</a>, <a href="https://github.com/apreshill" target="_blank" rel="noopener">@apreshill</a>, <a href="https://github.com/arneschillert" target="_blank" rel="noopener">@arneschillert</a>, <a href="https://github.com/batpigandme" target="_blank" rel="noopener">@batpigandme</a>, <a href="https://github.com/behrman" target="_blank" rel="noopener">@behrman</a>, <a href="https://github.com/bensoltoff" target="_blank" rel="noopener">@bensoltoff</a>, <a href="https://github.com/braceandbracket" target="_blank" rel="noopener">@braceandbracket</a>, <a href="https://github.com/brshallo" target="_blank" rel="noopener">@brshallo</a>, <a href="https://github.com/bwalsh5" target="_blank" rel="noopener">@bwalsh5</a>, <a href="https://github.com/carneybill" target="_blank" rel="noopener">@carneybill</a>, <a href="https://github.com/ChrisDunleavy" target="_blank" rel="noopener">@ChrisDunleavy</a>, <a href="https://github.com/ColinFay" target="_blank" rel="noopener">@ColinFay</a>, <a href="https://github.com/courtiol" target="_blank" rel="noopener">@courtiol</a>, <a href="https://github.com/csgillespie" target="_blank" rel="noopener">@csgillespie</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dgrtwo" target="_blank" 
rel="noopener">@dgrtwo</a>, <a href="https://github.com/DivadNojnarg" target="_blank" rel="noopener">@DivadNojnarg</a>, <a href="https://github.com/dpprdan" target="_blank" rel="noopener">@dpprdan</a>, <a href="https://github.com/dpseidel" target="_blank" rel="noopener">@dpseidel</a>, <a href="https://github.com/drmowinckels" target="_blank" rel="noopener">@drmowinckels</a>, <a href="https://github.com/dylan-cooper" target="_blank" rel="noopener">@dylan-cooper</a>, <a href="https://github.com/EconomiCurtis" target="_blank" rel="noopener">@EconomiCurtis</a>, <a href="https://github.com/edgararuiz-zz" target="_blank" rel="noopener">@edgararuiz-zz</a>, <a href="https://github.com/EdwinTh" target="_blank" rel="noopener">@EdwinTh</a>, <a href="https://github.com/elben10" target="_blank" rel="noopener">@elben10</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/espinielli" target="_blank" rel="noopener">@espinielli</a>, <a href="https://github.com/fenguoerbian" target="_blank" rel="noopener">@fenguoerbian</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, <a href="https://github.com/giocomai" target="_blank" rel="noopener">@giocomai</a>, <a href="https://github.com/gregrs-uk" target="_blank" rel="noopener">@gregrs-uk</a>, <a href="https://github.com/gregswinehart" target="_blank" rel="noopener">@gregswinehart</a>, <a href="https://github.com/gvelasq" target="_blank" rel="noopener">@gvelasq</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/hplieninger" target="_blank" rel="noopener">@hplieninger</a>, <a href="https://github.com/ismayc" target="_blank" rel="noopener">@ismayc</a>, <a href="https://github.com/jameslairdsmith" target="_blank" rel="noopener">@jameslairdsmith</a>, <a 
href="https://github.com/jayhesselberth" target="_blank" rel="noopener">@jayhesselberth</a>, <a href="https://github.com/jemus42" target="_blank" rel="noopener">@jemus42</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jimhester" target="_blank" rel="noopener">@jimhester</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/justmytwospence" target="_blank" rel="noopener">@justmytwospence</a>, <a href="https://github.com/karawoo" target="_blank" rel="noopener">@karawoo</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/leafyoung" target="_blank" rel="noopener">@leafyoung</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/lorenzwalthert" target="_blank" rel="noopener">@lorenzwalthert</a>, <a href="https://github.com/LucyMcGowan" target="_blank" rel="noopener">@LucyMcGowan</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/markdly" target="_blank" rel="noopener">@markdly</a>, <a href="https://github.com/martin-ueding" target="_blank" rel="noopener">@martin-ueding</a>, <a href="https://github.com/maurolepore" target="_blank" rel="noopener">@maurolepore</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/mitchelloharawild" target="_blank" rel="noopener">@mitchelloharawild</a>, <a href="https://github.com/pkq" target="_blank" rel="noopener">@pkq</a>, <a href="https://github.com/PursuitOfDataScience" target="_blank" rel="noopener">@PursuitOfDataScience</a>, <a 
href="https://github.com/rgerecke" target="_blank" rel="noopener">@rgerecke</a>, <a href="https://github.com/richierocks" target="_blank" rel="noopener">@richierocks</a>, <a href="https://github.com/Robinlovelace" target="_blank" rel="noopener">@Robinlovelace</a>, <a href="https://github.com/robinsones" target="_blank" rel="noopener">@robinsones</a>, <a href="https://github.com/romainfrancois" target="_blank" rel="noopener">@romainfrancois</a>, <a href="https://github.com/rosseji" target="_blank" rel="noopener">@rosseji</a>, <a href="https://github.com/rudeboybert" target="_blank" rel="noopener">@rudeboybert</a>, <a href="https://github.com/saghirb" target="_blank" rel="noopener">@saghirb</a>, <a href="https://github.com/sbearrows" target="_blank" rel="noopener">@sbearrows</a>, <a href="https://github.com/sharlagelfand" target="_blank" rel="noopener">@sharlagelfand</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/stedy" target="_blank" rel="noopener">@stedy</a>, <a href="https://github.com/stephlocke" target="_blank" rel="noopener">@stephlocke</a>, <a href="https://github.com/stragu" target="_blank" rel="noopener">@stragu</a>, <a href="https://github.com/sysilviakim" target="_blank" rel="noopener">@sysilviakim</a>, <a href="https://github.com/thisisdaryn" target="_blank" rel="noopener">@thisisdaryn</a>, <a href="https://github.com/thomasp85" target="_blank" rel="noopener">@thomasp85</a>, <a href="https://github.com/thuettel" target="_blank" rel="noopener">@thuettel</a>, <a href="https://github.com/tmstauss" target="_blank" rel="noopener">@tmstauss</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/tracykteal" target="_blank" rel="noopener">@tracykteal</a>, <a href="https://github.com/tyluRp" target="_blank" rel="noopener">@tyluRp</a>, <a href="https://github.com/vspinu" target="_blank" rel="noopener">@vspinu</a>, <a 
href="https://github.com/warint" target="_blank" rel="noopener">@warint</a>, <a href="https://github.com/wibeasley" target="_blank" rel="noopener">@wibeasley</a>, <a href="https://github.com/yitao-li" target="_blank" rel="noopener">@yitao-li</a>, and <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>If you are wondering whether a particular argument supports selections, look in the function documentation. Arguments tagged with <code>&lt;tidy-select&gt;</code> implement the selection dialect. By contrast, arguments tagged with <code>&lt;data-masking&gt;</code> only allow you to refer to data frame columns directly. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> Improvements to model specification checking in tidymodels https://www.tidyverse.org/blog/2022/10/parsnip-checking-1-0-2/ Mon, 03 Oct 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/10/parsnip-checking-1-0-2/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re stoked to announce the new release of <a href="https://parsnip.tidymodels.org/" target="_blank" rel="noopener">parsnip</a> v1.0.2 on CRAN! 
parsnip provides a tidy, unified interface to various statistical and machine learning models. This release includes improvements to errors and warnings that proliferate throughout the tidymodels ecosystem. These changes are meant to better anticipate common mistakes and nudge users informatively when defining model specifications. parsnip v1.0.2 includes a number of other changes that you can read about in the <a href="https://parsnip.tidymodels.org/news/index.html#parsnip-102" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="parsnip-and-its-extension-packages">parsnip and its extension packages <a href="#parsnip-and-its-extension-packages"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;ll load parsnip, along with other core packages in tidymodels, using the tidymodels meta-package:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span> </code></pre></div><p>parsnip provides a unified interface to machine learning models, supporting a wide array of modeling approaches implemented across numerous R packages. 
For instance, the code to specify a linear regression model using the <code>glmnet</code> package:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">linear_reg</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;glmnet&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;regression&#34;</span><span class="p">)</span> <span class="c1">#&gt; Linear Regression Model Specification (regression)</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Computational engine: glmnet</span> </code></pre></div><p>&hellip;is quite similar to that needed to specify a boosted tree regression model using <code>xgboost</code>:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">boost_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;xgboost&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;regression&#34;</span><span class="p">)</span> <span class="c1">#&gt; Boosted Tree Model Specification (regression)</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Computational engine: xgboost</span> </code></pre></div><p>We refer to these objects as <em>model specifications</em>. 
They have three main components:</p> <ul> <li>The <strong>model type</strong>: In this case, a linear regression or boosted tree.</li> <li>The <strong>mode</strong>: The learning task, such as regression or classification.</li> <li>The <strong>engine</strong>: The implementation for the given model type and mode, usually an R package.</li> </ul> <p>This conceptual split of the model specification allows for parsnip&rsquo;s consistent syntax - and it makes it extensible. Anyone (including you!) can write a parsnip <em>extension package</em> that tightly integrates with other tidymodels packages out-of-the-box. We maintain a few of these packages ourselves, such as:</p> <ul> <li> <a href="https://github.com/tidymodels/agua" target="_blank" rel="noopener">agua</a>: models from the H2O modeling ecosystem</li> <li> <a href="https://github.com/tidymodels/baguette" target="_blank" rel="noopener">baguette</a>: bootstrap aggregating ensemble models</li> <li> <a href="https://github.com/tidymodels/censored" target="_blank" rel="noopener">censored</a>: censored regression and survival analysis</li> </ul> <p>Similarly, community members outside of the tidymodels team have written parsnip extension packages, such as:</p> <ul> <li> <a href="https://github.com/business-science/modeltime" target="_blank" rel="noopener">modeltime</a>: time series forecasting</li> <li> <a href="https://github.com/hsbadr/additive" target="_blank" rel="noopener">additive</a>: generalized additive models</li> </ul> <p>Much of our work on improving errors and warnings in this release has focused on parsnip&rsquo;s integration with its extensions.</p> <h2 id="improvements-to-errors-and-warnings">Improvements to errors and warnings <a href="#improvements-to-errors-and-warnings"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 
3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Two &ldquo;big ideas&rdquo; have helped us focus our efforts related to improving errors and messages in the ecosystem.</p> <ul> <li>The same kind of mistake should raise the same prompt.</li> <li>Don&rsquo;t tell the user they did something they didn&rsquo;t do.</li> </ul> <p>We&rsquo;ll address both in the sections below!</p> <h3 id="the-same-kind-of-mistake-should-raise-the-same-prompt">The same kind of mistake should raise the same prompt <a href="#the-same-kind-of-mistake-should-raise-the-same-prompt"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The first problem we sought to address with these changes is that, in some cases, the same conceptual mistake could lead to different kinds of errors from parsnip and the packages that depend on it.</p> <p>A common mistake that users (and we, as developers) make when defining model specifications is forgetting to load the needed extension package for a given model specification.</p> <p>For example, parsnip supports bagged decision tree models via the <code>bag_tree()</code> model type, though requires extension packages for actual implementations of the model. 
The censored package implements the <code>censored regression</code> mode for bagged decision trees via <code>rpart</code>, and the baguette package implements a few additional engines for <code>regression</code> and <code>classification</code> with this model type.</p> <p>In parsnip v1.0.1, if we specified a <code>bag_tree()</code> model without loading any extension packages, we&rsquo;d see:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">bt</span> <span class="o">&lt;-</span> <span class="nf">bag_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;rpart&#34;</span><span class="p">)</span> <span class="n">bt</span> <span class="c1">#&gt; parsnip could not locate an implementation for `bag_tree` model specifications</span> <span class="c1">#&gt; using the `rpart` engine.</span> <span class="c1">#&gt;</span> <span class="c1">#&gt; Bagged Decision Tree Model Specification (unknown)</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Main Arguments:</span> <span class="c1">#&gt; cost_complexity = 0</span> <span class="c1">#&gt; min_n = 2</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Computational engine: rpart</span> </code></pre></div><p>After seeing this prompt, we may not remember which extension package was the one that implemented this specification. 
A reasonable guess might be the censored package:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">censored</span><span class="p">)</span> <span class="c1">#&gt; Loading required package: survival</span> </code></pre></div><p>Then, trying again:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">bag_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;rpart&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;regression&#34;</span><span class="p">)</span> <span class="c1">#&gt; Error in `stop_incompatible_mode()`:</span> <span class="c1">#&gt; ! Available modes for engine rpart are: &#39;unknown&#39;, &#39;censored regression&#39;</span> </code></pre></div><p>The censored package clearly wasn&rsquo;t the right one to load. Strangely, though, a side effect of loading it was that the prompt then became more cryptic, and it was converted from a message to an error. Perhaps even more strangely, if we instead supply an engine that only has an implementation in baguette and not censored, we see a different error:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">bag_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;C5.0&#34;</span><span class="p">)</span> <span class="c1">#&gt; Error in `check_spec_mode_engine_val()`:</span> <span class="c1">#&gt; ! Engine &#39;C5.0&#39; is not supported for `bag_tree()`. 
See `show_engines(&#39;bag_tree&#39;)`.</span> </code></pre></div><p>Not only is this error different from the one above, but it seems to suggest that there is literally no <code>C5.0</code> implementation anywhere.</p> <p>Returning to our <code>bt</code> object, suppose we moved forward with defining tuning parameters, and want to define the grid to optimize over:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">bt</span> <span class="o">&lt;-</span> <span class="n">bt</span> <span class="o">%&gt;%</span> <span class="nf">update</span><span class="p">(</span><span class="n">cost_complexity</span> <span class="o">=</span> <span class="nf">tune</span><span class="p">())</span> <span class="nf">extract_parameter_set_dials</span><span class="p">(</span><span class="n">bt</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">grid_random</span><span class="p">(</span><span class="n">size</span> <span class="o">=</span> <span class="m">3</span><span class="p">)</span> <span class="c1">#&gt; Error in `grid_random()`:</span> <span class="c1">#&gt; ! 
At least one parameter object is required.</span> </code></pre></div><p>So far in this section, we&rsquo;ve made the same mistake&mdash;failing to load the needed parsnip extension package&mdash;four times, and received four different prompts.</p> <p>The good news is that, in each of the above cases, the newest version of parsnip always supplies a message, <em>and</em> it&rsquo;s the same kind of message, <em>and</em> it&rsquo;s much more helpful.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">parsnip</span><span class="p">)</span> <span class="nf">bag_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;rpart&#34;</span><span class="p">)</span> <span class="c1">#&gt; ! parsnip could not locate an implementation for `bag_tree` model</span> <span class="c1">#&gt; specifications using the `rpart` engine.</span> <span class="c1">#&gt; ℹ The parsnip extension packages censored and baguette implement support for</span> <span class="c1">#&gt; this specification.</span> <span class="c1">#&gt; ℹ Please install (if needed) and load to continue.</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Bagged Decision Tree Model Specification (unknown mode)</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Main Arguments:</span> <span class="c1">#&gt; cost_complexity = 0</span> <span class="c1">#&gt; min_n = 2</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Computational engine: rpart</span> </code></pre></div><p>Note how the above message now suggests the two possible parsnip extensions that could provide support for this model specification.</p> <p>We could load censored, and then this specification is possible; censored implements a <code>censored regression</code> mode for bagged trees:</p> <div class="highlight"><pre class="chroma"><code 
class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">censored</span><span class="p">)</span> <span class="c1">#&gt; Loading required package: survival</span> <span class="nf">bag_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;rpart&#34;</span><span class="p">)</span> <span class="c1">#&gt; Bagged Decision Tree Model Specification (unknown mode)</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Main Arguments:</span> <span class="c1">#&gt; cost_complexity = 0</span> <span class="c1">#&gt; min_n = 2</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Computational engine: rpart</span> </code></pre></div><p>The censored package, however, doesn&rsquo;t implement a <code>regression</code> mode for bagged trees. Thus, if we set the mode to <code>regression</code> but fail to load the package that provides support for that mode, parsnip will again prompt us to load the correct package:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">bag_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;rpart&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;regression&#34;</span><span class="p">)</span> <span class="c1">#&gt; ! 
parsnip could not locate an implementation for `bag_tree` regression model</span> <span class="c1">#&gt; specifications using the `rpart` engine.</span> <span class="c1">#&gt; ℹ The parsnip extension package baguette implements support for this</span> <span class="c1">#&gt; specification.</span> <span class="c1">#&gt; ℹ Please install (if needed) and load to continue.</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Bagged Decision Tree Model Specification (regression)</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Main Arguments:</span> <span class="c1">#&gt; cost_complexity = 0</span> <span class="c1">#&gt; min_n = 2</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Computational engine: rpart</span> </code></pre></div><p>That side-effect of loading censored is no longer the case for the <code>C5.0</code> engine, as well:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">bag_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;C5.0&#34;</span><span class="p">)</span> <span class="c1">#&gt; ! 
parsnip could not locate an implementation for `bag_tree` model</span> <span class="c1">#&gt; specifications using the `C5.0` engine.</span> <span class="c1">#&gt; ℹ The parsnip extension package baguette implements support for this</span> <span class="c1">#&gt; specification.</span> <span class="c1">#&gt; ℹ Please install (if needed) and load to continue.</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Bagged Decision Tree Model Specification (unknown mode)</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Main Arguments:</span> <span class="c1">#&gt; cost_complexity = 0</span> <span class="c1">#&gt; min_n = 2</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Computational engine: C5.0</span> </code></pre></div><p>Finally, if we try to extract information about tuning parameters for a model that&rsquo;s not well-specified with parsnip v1.0.2, the message about missing extensions is elevated to an error:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">bt</span> <span class="o">&lt;-</span> <span class="n">bt</span> <span class="o">%&gt;%</span> <span class="nf">update</span><span class="p">(</span><span class="n">cost_complexity</span> <span class="o">=</span> <span class="nf">tune</span><span class="p">())</span> <span class="nf">extract_parameter_set_dials</span><span class="p">(</span><span class="n">bt</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">grid_random</span><span class="p">(</span><span class="n">size</span> <span class="o">=</span> <span class="m">3</span><span class="p">)</span> <span class="c1">#&gt; Error:</span> <span class="c1">#&gt; ! 
parsnip could not locate an implementation for `bag_tree` regression</span> <span class="c1">#&gt; model specifications using the `rpart` engine.</span> <span class="c1">#&gt; ℹ The parsnip extension package baguette implements support for this</span> <span class="c1">#&gt; specification.</span> <span class="c1">#&gt; ℹ Please install (if needed) and load to continue.</span> </code></pre></div><p>Given parsnip&rsquo;s infrastructure, the technical conditions that raise these four prompts are quite different, but <em>the technical reasons don&rsquo;t matter</em>; the mistake being made is the same, and that&rsquo;s what ought to determine the prompt raised.</p> <h3 id="dont-tell-the-user-they-did-something-they-didnt-do">Don&rsquo;t tell the user they did something they didn&rsquo;t do <a href="#dont-tell-the-user-they-did-something-they-didnt-do"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Another consideration that helped us frame these changes is that we feel error messages shouldn&rsquo;t reference operations that users don&rsquo;t need to know about. 
We&rsquo;ll return to the example of forgetting to load extension packages to elaborate on what we mean here.</p> <p>With parsnip v1.0.1, if we just load the package and initialize a <code>bag_tree()</code> model, we see:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">parsnip</span><span class="p">)</span> <span class="nf">bag_tree</span><span class="p">()</span> <span class="c1">#&gt; parsnip could not locate an implementation for `bag_tree` model specifications</span> <span class="c1">#&gt; using the `rpart` engine.</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Bagged Decision Tree Model Specification (unknown)</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Main Arguments:</span> <span class="c1">#&gt; cost_complexity = 0</span> <span class="c1">#&gt; min_n = 2</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Computational engine: rpart</span> </code></pre></div><p>Note the ending of the message: &ldquo;&hellip;using the <code>rpart</code> engine.&rdquo; We didn&rsquo;t specify that we wanted to use <code>rpart</code> as an engine, yet that seems to be what went wrong!</p> <p>Readers who have fitted bagged decision tree models with parsnip before may realize that <code>rpart</code> is the default engine for these models. This shouldn&rsquo;t be requisite knowledge to interpret this message, though, and is not helpful in addressing the issue. With v1.0.2, we only mention the information that users give to us when constructing that message, and tell them exactly which packages they might need to install/load:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">parsnip</span><span class="p">)</span> <span class="nf">bag_tree</span><span class="p">()</span> <span class="c1">#&gt; ! 
parsnip could not locate an implementation for `bag_tree` model</span> <span class="c1">#&gt; specifications.</span> <span class="c1">#&gt; ℹ The parsnip extension packages censored and baguette implement support for</span> <span class="c1">#&gt; this specification.</span> <span class="c1">#&gt; ℹ Please install (if needed) and load to continue.</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Bagged Decision Tree Model Specification (unknown mode)</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Main Arguments:</span> <span class="c1">#&gt; cost_complexity = 0</span> <span class="c1">#&gt; min_n = 2</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; Computational engine: rpart</span> </code></pre></div><p>We hinted at another example of this guideline in the previous section; parsnip shouldn&rsquo;t refer to internal functions when it raises error messages. Above, with parsnip v1.0.1, we saw:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">censored</span><span class="p">)</span> <span class="c1">#&gt; Loading required package: survival</span> <span class="nf">bag_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;rpart&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;regression&#34;</span><span class="p">)</span> <span class="c1">#&gt; Error in `stop_incompatible_mode()`:</span> <span class="c1">#&gt; ! Available modes for engine rpart are: &#39;unknown&#39;, &#39;censored regression&#39;</span> </code></pre></div><p>The error points out a function called <code>stop_incompatible_mode()</code>, which is a function used internally by parsnip to check modes. 
There&rsquo;s a different function, <code>check_spec_mode_engine_val()</code>, that will flag super silly modes:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">parsnip</span><span class="p">)</span> <span class="nf">bag_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;rpart&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;beep bop boop&#34;</span><span class="p">)</span> <span class="c1">#&gt; Error in `check_spec_mode_engine_val()`:</span> <span class="c1">#&gt; ! &#39;beep bop boop&#39; is not a known mode for model `bag_tree()`.</span> </code></pre></div><p>The important part, though, is that <em>the technical reasons don&rsquo;t matter</em>. Users don&rsquo;t know&mdash;and don&rsquo;t need to know&mdash;what <code>stop_incompatible_mode()</code> or <code>check_spec_mode_engine_val()</code> do.</p> <p>In parsnip v1.0.2, we now point users to the function they actually called that eventually gave rise to the error:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">bag_tree</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;rpart&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;beep bop boop&#34;</span><span class="p">)</span> <span class="c1">#&gt; Error in `set_mode()`:</span> <span class="c1">#&gt; ! 
&#39;beep bop boop&#39; is not a known mode for model `bag_tree()`.</span> </code></pre></div><p>We hope these changes improve folks&rsquo; experience when modeling with parsnip in the future!</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><!-- This post has highlighted upcoming improvements to model specification checking in parsnip. For those who'd like to learn more, I've written a [companion article](https://simonpcouch.com/blog) on my blog that delves further into the tooling we use to check model specifications. --> <p>Thanks to the folks who have contributed to this release of parsnip via GitHub: <a href="https://github.com/gustavomodelli" target="_blank" rel="noopener">@gustavomodelli</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/mrkaye97" target="_blank" rel="noopener">@mrkaye97</a>, <a href="https://github.com/siegfried" target="_blank" rel="noopener">@siegfried</a>.</p> <p>Contributions from many others, in the form of StackOverflow and RStudio Community posts, have been greatly helpful in our work on these improvements.</p> Playing on the same team as your dependency https://www.tidyverse.org/blog/2022/09/playing-on-the-same-team-as-your-dependecy/ Thu, 29 Sep 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/09/playing-on-the-same-team-as-your-dependecy/ <!-- TODO: * [ ] Look over / edit the post's title in the yaml * [ ] Edit (or delete) the description; note this appears in the Twitter card * [ ] Pick category and tags (see 
existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [ ] Find photo & update yaml metadata * [ ] Create `thumbnail-sq.jpg`; height and width should be equal * [ ] Create `thumbnail-wd.jpg`; width should be >5x height * [ ] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [ ] Add intro sentence, e.g. the standard tagline for the package * [ ] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>Developing packages for R is a matter of standing on the shoulders of others. Very seldom do packages exist in a vacuum &mdash; on the contrary, we often rely on dependencies to avoid duplication of code or lean into the work done by experts within an adjacent field.</p> <p>It can easily feel like a one-way relationship to take on a dependency on another package. You are responsible for keeping your package working and the developer of the dependency can ignore whatever goes on in your package. Code flows only from the dependency to your package. This is not true, though. By taking on a dependency you enter into a mutual relationship with it. The dependency implicitly promises not to change its interface without providing an upgrade path to you. You, on the other hand, promise to rely only on the public interface of the package. This blog post goes into detail as to what your promise entails.</p> <h3 id="why-does-this-matter">Why does this matter? 
<a href="#why-does-this-matter"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>As a developer, you may be surprised to learn that the dependency&rsquo;s promise is enforced by CRAN. When a new version is submitted for release, the package goes through a battery of tests, including a reverse dependency check where all packages on CRAN that depend on the submitted package are checked against the new version. If any regressions have occurred, they are flagged. The CRAN repository policy states:</p> <blockquote> <p>If an update will change the package&rsquo;s API and hence affect packages depending on it, it is expected that you will contact the maintainers of affected packages and suggest changes, and give them time (at least 2 weeks, ideally more) to prepare updates before submitting your updated package.</p> </blockquote> <p>This is good in general &mdash; it <em>is</em> important that a package maintains a stable interface across versions &mdash; but can become a huge obstacle to updates if the packages that depend on you are reaching behind the curtain and making assumptions you never promised to adhere to.</p> <h2 id="whats-in-an-api">What&rsquo;s in an API? 
<a href="#whats-in-an-api"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>For better and worse, R as a language is extremely liberal with what you can access as a user. There is practically no data or function you can&rsquo;t access and modify, which makes the concept of APIs a question of conventions. Those conventions are quite well defined when it comes to functions in packages, but much less so for everything else. We will discuss functions first, and then proceed into the more gray areas of objects and data.</p> <h3 id="exported-functions">Exported functions <a href="#exported-functions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>When creating a package, you are required to provide a NAMESPACE file which states the functions you import <em>into</em> your package for use, and the functions you export <em>out of</em> your package for others to use. The NAMESPACE file demarcates in very clear terms the functional interface of a package, but is still based on mutual trust. While you cannot import functions from a package that have not been exported, there is nothing in the R language that prevents you from using them by accessing them directly. 
Below, we will talk about several ways of doing this and why each of them has issues:</p> <h4 id="using-">Using <code>:::</code> <a href="#using-"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>R provides two operators for accessing objects in a package namespace: <code>::</code> allows you to fetch exported objects and functions, while <code>:::</code> allows you to access <em>any</em> object and function (both public and internal). Thus, you could for instance gain access to the internal <code>camelize()</code> function in ggplot2 to convert geom function names into ggproto object names like so:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'>ggplot2</span><span class='nf'>:::</span><span class='nf'>camelize</span><span class='o'>(</span><span class='s'>'geom_point'</span>, first <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span> <span class='c'>#&gt; [1] "GeomPoint"</span></code></pre> </div> <p><em>But</em> since there is no need for <code>:::</code> except for reaching beyond the package interface, its use is actively checked and packages using it are rejected from CRAN.</p> <h4 id="using-utilsgetfromnamespace">Using <code>utils::getFromNamespace()</code> <a href="#using-utilsgetfromnamespace"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 
13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>To circumvent the detection of <code>:::</code>, we sometimes see code like the following in packages:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>camelize</span> <span class='o'>&lt;-</span> <span class='nf'>utils</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/utils/getFromNamespace.html'>getFromNamespace</a></span><span class='o'>(</span><span class='s'>"camelize"</span>, <span class='s'>"ggplot2"</span><span class='o'>)</span></code></pre> </div> <p>There are two huge issues with this approach. The first is that you have now sneakily accessed something that was never meant for public consumption (this is a general theme). The second is that you are grabbing a function from another package <em>at build time</em>. This means that the <code>camelize()</code> function living in your package is the one from the ggplot2 version available when your package was built on CRAN. Why is that a problem? Consider again <code>camelize()</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'>ggplot2</span><span class='nf'>:::</span><span class='nv'>camelize</span> <span class='c'>#&gt; function(x, first = FALSE) &#123;</span> <span class='c'>#&gt; x &lt;- gsub("_(.)", "\\U\\1", x, perl = TRUE)</span> <span class='c'>#&gt; if (first) x &lt;- firstUpper(x)</span> <span class='c'>#&gt; x</span> <span class='c'>#&gt; &#125;</span> <span class='c'>#&gt; &lt;bytecode: 0x106658a98&gt;</span> <span class='c'>#&gt; &lt;environment: namespace:ggplot2&gt;</span></code></pre> </div> <p>We can see that it contains a call to <code>firstUpper()</code>, which is another internal function. 
As ggplot2 developers, we might decide one day that this factorization of code is too granular, and inline the code of <code>firstUpper()</code> into <code>camelize()</code>, allowing us to remove <code>firstUpper()</code> altogether.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># New version of camelize</span> <span class='nv'>camelize</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>first</span> <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/grep.html'>gsub</a></span><span class='o'>(</span><span class='s'>"_(.)"</span>, <span class='s'>"\\U\\1"</span>, <span class='nv'>x</span>, perl <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span> <span class='kr'>if</span> <span class='o'>(</span><span class='nv'>first</span><span class='o'>)</span> <span class='nv'>x</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/paste.html'>paste0</a></span><span class='o'>(</span><span class='nf'>to_upper_ascii</span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/substr.html'>substring</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='m'>1</span>, <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span>, <span class='nf'><a href='https://rdrr.io/r/base/substr.html'>substring</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='m'>2</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>x</span> <span class='o'>&#125;</span></code></pre> </div> <p>All of that would be perfectly fine for us to do. 
After all, we are not changing the public interface of ggplot2, we aren&rsquo;t even changing how <code>camelize()</code> works. But, in packages that have fetched <code>camelize()</code> at build time, the function would be unchanged, still calling <code>firstUpper()</code> which now no longer exists. As you might imagine, this can lead to some very hard to debug errors for you, your users, and the maintainer of the dependency.</p> <h4 id="use-asnamespace-inside-a-function">Use <code>asNamespace()</code> inside a function <a href="#use-asnamespace-inside-a-function"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>This rule can be extended beyond its use to access unexported functions: <strong>Never assign a function from another package to a variable in your own package</strong>. You might import a function from a package developed by someone who prefers long and descriptive function names, say <code>add_these_two_objects_together()</code>, and find it easier to create a shorthand version:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>add2</span> <span class='o'>&lt;-</span> <span class='nv'>add_these_two_objects_together</span></code></pre> </div> <p>While <code>add_these_two_objects_together</code> is exported and you are doing nothing wrong in terms of interfaces, you are still setting up a build-time dependency that might cause breakage any time your dependency gets updated on a system.</p> <p>Thus, we arrive at the last approach: Fetching the function inside a function call and then using it. 
In the example below, we are using <a href="https://rdrr.io/r/base/ns-internal.html" target="_blank" rel="noopener"><code>asNamespace()</code></a>, but the same principle holds true for <a href="https://rdrr.io/r/utils/getFromNamespace.html" target="_blank" rel="noopener"><code>getFromNamespace()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>camelize</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>...</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nf'><a href='https://rdrr.io/r/base/ns-internal.html'>asNamespace</a></span><span class='o'>(</span><span class='s'>"ggplot2"</span><span class='o'>)</span><span class='o'>$</span><span class='nf'>camelize</span><span class='o'>(</span><span class='nv'>...</span><span class='o'>)</span> <span class='o'>&#125;</span></code></pre> </div> <p>Now, while this is many times better than what we did before, it is still a big red flag. Consider the same situation as before. We inline every use of <code>camelize()</code> in ggplot2 (it&rsquo;s only used once), and remove the function. This will again break your package when ggplot2 gets updated, because you made assumptions about something ggplot2 never promised.</p> <h4 id="what-to-do">What to do? <a href="#what-to-do"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>What if you really wanted that functionality? A good first approach is to simply copy the code for the function into your own package. 
For something like <code>camelize()</code>, this is fairly simple as it doesn&rsquo;t call into other internal functions (except <code>firstUpper()</code>, but we saw that it could be inlined). One thing to keep in mind here is to make sure that the license of the dependency doesn&rsquo;t prevent you from doing this (e.g. a package released under an MIT license can&rsquo;t copy code from a package released under a GPL-2 license).</p> <p>If you can&rsquo;t copy the code into your own package, either due to incompatible licenses or because the function is a rabbit hole of internal function calls, you&rsquo;ll need to reach out to the maintainer and ask whether the required function can be exported so you can use it. Keep in mind that there are many good reasons why you could get a &ldquo;no&rdquo;, since every new export increases the maintenance burden of a package. So, you can get a &ldquo;yes&rdquo; and all is well, or you might get a &ldquo;no&rdquo; and have to accept that as well. Getting a &ldquo;no&rdquo; is not a blanket approval to do any of the things discussed above, for the exact reasons we described. 
Rather, it means you have to reframe your solution so it doesn&rsquo;t require this functionality or abandon it altogether.</p> <h3 id="exported-structures">Exported structures <a href="#exported-structures"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>While the situation with functions is quite clear-cut &mdash; there are <em>do&rsquo;s</em> and <em>don&rsquo;ts</em> &mdash; we enter a much grayer area when it comes to any sort of data/object structure you get from a dependency, either as an object exported by the package or as a return value from an exported function. The reason why it is a gray area is that there is no formal way to specify an interface to an object in R, and users are used to an &ldquo;anything goes&rdquo; mentality when it comes to reaching into data structures. For example, while attributes are a bit more &ldquo;hidden away&rdquo; than elements in a list, there is no notion of these being prohibited from access. There might be a mutual understanding that, if you alter attributes in some way, it might lead to breakage somewhere downstream. But merely reading attributes is a pretty common thing to do. The same goes for more complex objects that contain more than just data. 
An example is the object created by a call to <code>ggplot()</code></p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='nf'>ggplot2</span><span class='nf'>::</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; List of 9</span> <span class='c'>#&gt; $ data : list()</span> <span class='c'>#&gt; ..- attr(*, "class")= chr "waiver"</span> <span class='c'>#&gt; $ layers : list()</span> <span class='c'>#&gt; $ scales :Classes 'ScalesList', 'ggproto', 'gg' &lt;ggproto object: Class ScalesList, gg&gt;</span> <span class='c'>#&gt; add: function</span> <span class='c'>#&gt; clone: function</span> <span class='c'>#&gt; find: function</span> <span class='c'>#&gt; get_scales: function</span> <span class='c'>#&gt; has_scale: function</span> <span class='c'>#&gt; input: function</span> <span class='c'>#&gt; n: function</span> <span class='c'>#&gt; non_position_scales: function</span> <span class='c'>#&gt; scales: NULL</span> <span class='c'>#&gt; super: &lt;ggproto object: Class ScalesList, gg&gt; </span> <span class='c'>#&gt; $ mapping : Named list()</span> <span class='c'>#&gt; ..- attr(*, "class")= chr "uneval"</span> <span class='c'>#&gt; $ theme : list()</span> <span class='c'>#&gt; $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto', 'gg' &lt;ggproto object: Class CoordCartesian, Coord, gg&gt;</span> <span class='c'>#&gt; aspect: function</span> <span class='c'>#&gt; backtransform_range: function</span> <span class='c'>#&gt; clip: on</span> <span class='c'>#&gt; default: TRUE</span> <span class='c'>#&gt; distance: function</span> <span class='c'>#&gt; expand: TRUE</span> <span class='c'>#&gt; is_free: function</span> <span class='c'>#&gt; is_linear: function</span> <span 
class='c'>#&gt; labels: function</span> <span class='c'>#&gt; limits: list</span> <span class='c'>#&gt; modify_scales: function</span> <span class='c'>#&gt; range: function</span> <span class='c'>#&gt; render_axis_h: function</span> <span class='c'>#&gt; render_axis_v: function</span> <span class='c'>#&gt; render_bg: function</span> <span class='c'>#&gt; render_fg: function</span> <span class='c'>#&gt; setup_data: function</span> <span class='c'>#&gt; setup_layout: function</span> <span class='c'>#&gt; setup_panel_guides: function</span> <span class='c'>#&gt; setup_panel_params: function</span> <span class='c'>#&gt; setup_params: function</span> <span class='c'>#&gt; train_panel_guides: function</span> <span class='c'>#&gt; transform: function</span> <span class='c'>#&gt; super: &lt;ggproto object: Class CoordCartesian, Coord, gg&gt; </span> <span class='c'>#&gt; $ facet :Classes 'FacetNull', 'Facet', 'ggproto', 'gg' &lt;ggproto object: Class FacetNull, Facet, gg&gt;</span> <span class='c'>#&gt; compute_layout: function</span> <span class='c'>#&gt; draw_back: function</span> <span class='c'>#&gt; draw_front: function</span> <span class='c'>#&gt; draw_labels: function</span> <span class='c'>#&gt; draw_panels: function</span> <span class='c'>#&gt; finish_data: function</span> <span class='c'>#&gt; init_scales: function</span> <span class='c'>#&gt; map_data: function</span> <span class='c'>#&gt; params: list</span> <span class='c'>#&gt; setup_data: function</span> <span class='c'>#&gt; setup_params: function</span> <span class='c'>#&gt; shrink: TRUE</span> <span class='c'>#&gt; train_scales: function</span> <span class='c'>#&gt; vars: function</span> <span class='c'>#&gt; super: &lt;ggproto object: Class FacetNull, Facet, gg&gt; </span> <span class='c'>#&gt; $ plot_env :&lt;environment: R_GlobalEnv&gt; </span> <span class='c'>#&gt; $ labels : Named list()</span> <span class='c'>#&gt; - attr(*, "class")= chr [1:2] "gg" "ggplot"</span></code></pre> </div> <p>This is 
obviously more than just data, but which elements, if any, are actually fair to access as a package developer? This is a tough question to answer in general terms.</p> <p>If you want to be very polite (and who wouldn&rsquo;t?), the best way to go about it is to look for accessor functions for the part of the object you are interested in, and in the absence of one, ask the maintainer to add one. The reason why accessor functions are so much better than relying on e.g. <a href="https://rdrr.io/r/base/attr.html" target="_blank" rel="noopener"><code>attr()</code></a> to extract some information stored in an attribute is that they free the maintainer to change the <em>structure</em> of the data/object, while keeping the <em>interface</em> constant. Asking a maintainer for a public accessor function will also alert the maintainer to the fact that others are actually interested in said information, which could inform future development.</p> <h4 id="testing-testing">Testing, testing <a href="#testing-testing"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h4><p>You may be the most polite package developer, using only the finest public accessor functions in your code and keeping out of any data structure you don&rsquo;t control the provenance of, and still be reliant on implementation details in objects from other packages. How? 
You may inadvertently test for their internal details in your unit tests when you are comparing objects wholesale, or if you have saved complex objects and load these up during testing.</p> <p>Once again, we are certainly in a gray area here, but one guideline to help you is to ask yourself whether your unit test is only testing for parts that your own package influences, or whether it also includes assumptions about implementation details of another package. As an example (once again from ggplot2), you might want to ensure that a plot function in your package works as intended. On one extreme end you can save a working ggplot object returned from your function and then test for equivalence with that during unit testing. This is not a great idea because anything we might change internally in ggplot2 would likely result in changes to the created ggplot2 object. And while it still works, it may look slightly different. On the other end, you may instead do visual testing using the vdiffr package where you only look at the actual output. However, that also makes a lot of assumptions about how ggplot2 chooses to render its objects and internal changes may again break your tests without there being anything broken in reality.</p> <blockquote> <p>Visual testing in general is something that is mainly intended for packages providing graphic rendering, e.g. ggplot2 and its extension package ecosystem. If you are using other packages to create your plots you should in general lean on them to test for visual regressions.</p> </blockquote> <p>The Goldilocks zone for your testing is to figure out which exact elements your high-level plot function influences, and then get to these, preferably using public accessor functions. 
For ggplot2 it will often be enough to extract the data for each layer (using <code>layer_data()</code>) and test specific columns of that (never test against the full layer data since ggplot2 may add to it).</p> <p>If you find that you are missing a public accessor function in order to do proper testing, once again reach out to the maintainer and ask. You may learn that this information is not exposed because it is subject to change, and thus a poor fit for unit testing. Or you may get your function and end up with more robust tests in your own package.</p> <p>While the example above is using ggplot2, this can be extrapolated to every other dependency that provides any form of complex output or exported data structure. Always ask yourself whether your unit tests are testing more than your own package&rsquo;s behavior. If they are, try to eliminate the influence of the dependencies as much as possible. Remember that tests that fail for reasons other than what they are testing for are not only annoying to you &mdash; they can also drag out the release of the packages you rely on.</p> It's about time https://www.tidyverse.org/blog/2022/09/its-about-time/ Wed, 28 Sep 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/09/its-about-time/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [ ] Add intro sentence, e.g. 
the standard tagline for the package * [ ] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>At rstudio::conf(2022), Davis Vaughan gave a lightning talk on <a href="https://clock.r-lib.org/" target="_blank" rel="noopener">clock</a>, an R package that aims to provide comprehensive and safe handling of date-times. clock goes beyond the date and date-time types that base R provides, implementing new types for year-month, year-quarter, ISO year-week, and many other date-like formats, all with up to nanosecond precision. clock is <strong>not</strong> replacing <a href="https://lubridate.tidyverse.org/" target="_blank" rel="noopener">lubridate</a>. lubridate will never go away, and is not being deprecated or superseded. In the future, we expect to update lubridate to use clock as a backend (so clock becomes <a href="https://stringi.gagolewski.com/" target="_blank" rel="noopener">stringi</a> to lubridate&rsquo;s <a href="https://stringr.tidyverse.org/" target="_blank" rel="noopener">stringr</a>).</p> <p>In Davis&rsquo; talk, you&rsquo;ll see how clock emphasizes &ldquo;safety first&rdquo; when manipulating date-times, and how its new date-time types can be used in your own work.</p> <script src="https://fast.wistia.com/embed/medias/pzuyostdz8.jsonp" async></script> <script src="https://fast.wistia.com/assets/external/E-v1.js" async></script> <div class="wistia_responsive_padding" style="padding:56.25% 0 0 0;position:relative;"> <div class="wistia_responsive_wrapper" style="height:100%;left:0;position:absolute;top:0;width:100%;"> <div class="wistia_embed wistia_async_pzuyostdz8 videoFoam=true" style="height:100%;position:relative;width:100%"> <div class="wistia_swatch" style="height:100%;left:0;opacity:0;overflow:hidden;position:absolute;top:0;transition:opacity 200ms;width:100%;"> <p><img src="https://fast.wistia.com/embed/medias/pzuyostdz8/swatch" style="filter:blur(5px);height:100%;object-fit:contain;width:100%;" alt="" 
aria-hidden="true" onload="this.parentNode.style.opacity=1;" /></p> </div> </div> </div> </div> <details> <summary> <strong>Transcript</strong> </summary> <p>I am here to talk about time, which is obviously everyone&rsquo;s favorite subject. In particular, I&rsquo;m actually here to talk about a package called clock.</p> <p>So, clock is a date time manipulation library kind of in the same way that lubridate is a date time manipulation library. It does things you might expect add dates, subtract dates, format and parse them. All kinds of other manipulation. If you get anything out of this talk, it&rsquo;s really that clock is not here to replace lubridate in any way. The only idea would be that in the end clock might be a back end for lubridate in the same way that dtplyr or dbplyr are different types of back ends for dplyr. And I&rsquo;m not even going to spend the rest of this talk talking about features that overlap with lubridate.</p> <p>Instead, I want to talk about things that are pretty unique to clock. One of those is safety. And one of those is calendars.</p> <p>Because I only have 5 minutes, I&rsquo;m going to do that with one date, January 30th of this year. Safety is built into clock from the ground up to hopefully avoid issues like this, time zone issues, invalid date issues, things that are pretty common when you&rsquo;re working with time series and just drive you up the wall.</p> <p>So let&rsquo;s jump into safety. Here&rsquo;s a timeline. This is January 30th, our date in question marked in blue on our timeline. It continues through to February. On the next line, you&rsquo;ll see this gap between February and March because February only has 28 days, but January had 31, so it doesn&rsquo;t necessarily map 1 to 1. If I were to ask you this seemingly innocuous question. Please add one month to this date. What would you get?</p> <p>Well, if we were to ask lubridate, it gives you a somewhat reasonable answer of NA. 
There is nothing that maps 1 to 1 from January 30th to something in February, maybe. And there&rsquo;s nothing particularly wrong with this except for the fact that it&rsquo;s not the most useful answer. Generally, you&rsquo;ll be running this code and it happens silently. And then five steps downstream. All of a sudden, you discover there&rsquo;s some NAs here. Like, I didn&rsquo;t have those to begin with. Where did those come from? And you have to backtrack up through your calculations and figure out why they appeared.</p> <p>If you were to ask clock this question with add months, it actually gives you an error in this special case by default. It says, whoa, hold up. There&rsquo;s something wrong here. Go look at location 1. If you had a vector, it might be location five, seven, whatever. And check out the invalid argument to learn more about this case. You go and you look at the documentation and you come out with the idea that maybe I could set this thing called invalid equals previous. That allows you to say, give me the previous valid date when I have this kind of problem. That&rsquo;s the end of February. I think that&rsquo;s a pretty reasonable result in this case. But you also might want to say, depending on your specific problem, invalid equals next to map forward to the beginning of March instead. If you actually do like that lubridate behavior, that&rsquo;s fine. You can say invalid equals NA any time that occurs, you&rsquo;ll get an NA instead. So that&rsquo;s about safety.</p> <p>Let&rsquo;s talk about calendars. Calendars are just the idea of a way to represent a unique point in time. With our date in question, we could use a calendar called year month day to represent this date using three components the year, the month, and the day of the month. But this isn&rsquo;t the only way you could represent this date. 
You could also use the year and the day of the year, or you could use one of these many other calendar types that are built into clock.</p> <p>If you&rsquo;re a finance person, you might be particularly interested in year quarter day, which uses a true fiscal year to represent your date. These are really nice because they&rsquo;re all convertible to each other. You can work with any particular calendar type and say you need to get the quarter out. You convert to year quarter day, you do manipulation over there, you convert back. It&rsquo;s obviously convertible with Date and POSIXct as well, since those are the date time types that you&rsquo;re most likely to start out with.</p> <p>The other really neat thing that I find really fun about these calendar types is that they have what&rsquo;s known as variable precision. These are all day precision calendar types at this point, but we could narrow that down to month precision as needed. And you&rsquo;ve got a built-in year month type in clock. Similarly, you could have a built-in year quarter type. You can actually go the other way, too. You can widen it out all the way to nanoseconds if you need it.</p> <p>The last thing I&rsquo;ll say is that clock is completely compatible with some of the other packages you might be familiar with that I&rsquo;ve created called slider and IVs. Slider is one for rolling averages, so you can use clock types as the index to say, give me a rolling average looking back four or five quarters. IVs is a relatively new package. You might not have heard of this one yet, but it deals with date ranges and you can use clock types as the components of those ranges.</p> <p>So to sum up, lubridate is not going anywhere. Don&rsquo;t worry, but please try clock for enhanced safety in these powerful new types. 
Thank you.</p> </details> <h2 id="try-clock">Try clock <a href="#try-clock"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>To try clock out, you can install the released version from <a href="https://cran.r-project.org/" target="_blank" rel="noopener">CRAN</a> with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"clock"</span><span class='o'>)</span></span></code></pre> </div> <p>Or, install the development version from its <a href="https://github.com/r-lib/clock" target="_blank" rel="noopener">GitHub repo</a> with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span><span class='c'># install.packages("remotes")</span></span> <span><span class='nf'>remotes</span><span class='nf'>::</span><span class='nf'><a href='https://remotes.r-lib.org/reference/install_github.html'>install_github</a></span><span class='o'>(</span><span class='s'>"r-lib/clock"</span><span class='o'>)</span></span></code></pre> </div> <h2 id="learn-more">Learn more <a href="#learn-more"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> 
</h2><p>You can learn more about clock by reading Davis&rsquo; blog post announcing its first release, <a href="https://www.tidyverse.org/blog/2021/03/clock-0-1-0/" target="_blank" rel="noopener">Comprehensive date-time handling for R</a>. Also be sure to check out its vignettes:</p> <ul> <li> <p> <a href="https://clock.r-lib.org/articles/clock.html" target="_blank" rel="noopener">Getting started</a></p> </li> <li> <p> <a href="https://clock.r-lib.org/articles/articles/motivations.html" target="_blank" rel="noopener">Motivations for clock</a></p> </li> <li> <p> <a href="https://clock.r-lib.org/articles/recipes.html" target="_blank" rel="noopener">Examples and recipes</a></p> </li> <li> <p> <a href="https://clock.r-lib.org/articles/faq.html" target="_blank" rel="noopener">Frequently asked questions</a></p> </li> </ul> brulee 0.2.0 https://www.tidyverse.org/blog/2022/09/brulee-0-2-0/ Mon, 26 Sep 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/09/brulee-0-2-0/ <p>We&rsquo;re thrilled to announce the release of <a href="https://tidymodels.github.io/brulee/" target="_blank" rel="noopener">brulee</a> 0.2.0. brulee contains several basic modeling functions that use the torch package infrastructure, such as: neural networks, linear regression, logistic regression, and multinomial regression.</p> <p>You can install it from CRAN with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;brulee&#34;</span><span class="p">)</span> </code></pre></div><p>This blog post will describe the changes to the package. 
You can see a full list of changes in the <a href="https://tidymodels.github.io/brulee/news/index.html" target="_blank" rel="noopener">release notes</a>.</p> <p>There were two main changes to brulee.</p> <p>First, since brulee is focused on fitting models to <em>tabular data</em>, we have moved away from optimizing via stochastic gradient descent (SGD) as the default. For <code>brulee_mlp()</code>, we switched the default optimizer from SGD to a more traditional quasi-Newton method, specifically the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method. You can still use SGD via the <code>optimizer</code> option.</p> <p>Second, we&rsquo;ve added <a href="https://www.google.com/search?rls=en&amp;q=%22learning&#43;rate&#43;schedule%22" target="_blank" rel="noopener">learning rate schedulers</a> to <code>brulee_mlp()</code>. The learning rate is one of the most important parameters to tune. There is an existing option to have a constant learning rate (via the <code>learn_rate</code> argument). However, there is some intuition that the rate should probably decrease once the optimizer is closer to the best solution (to avoid overshooting the target). A scheduler is a function that adjusts the rate over time. 
Apart from a constant learning rate (the default), the options are cyclic, exponential decay, time-based decay, and step functions:</p> <p><img src="rates.png" title="plot of chunk unnamed-chunk-2" alt="plot of chunk unnamed-chunk-2" style="display: block; margin: auto;" /></p> <p>The corresponding <a href="https://tidymodels.github.io/brulee/reference/schedule_decay_time.html" target="_blank" rel="noopener">set of functions</a> share the prefix <code>schedule_*()</code>.</p> <p>To use these with <code>brulee_mlp()</code>, there is a <code>rate_schedule</code> argument with possible values: <code>&quot;none&quot;</code> (the default), <code>&quot;decay_time&quot;</code>, <code>&quot;decay_expo&quot;</code>, <code>&quot;cyclic&quot;</code> and <code>&quot;step&quot;</code>. Each function has arguments and these can be passed directly to <code>brulee_mlp()</code>. The <code>rate_schedule</code> argument can also be tuned as any other engine-specific parameter.</p> <h2 id="an-example">An example <a href="#an-example"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Let&rsquo;s look at an example using the Ames housing data. 
We&rsquo;ll use tidymodels to split the data and also preprocess the data a bit.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span> <span class="nf">library</span><span class="p">(</span><span class="n">brulee</span><span class="p">)</span> <span class="c1"># ------------------------------------------------------------------------------</span> <span class="nf">tidymodels_prefer</span><span class="p">()</span> <span class="nf">theme_set</span><span class="p">(</span><span class="nf">theme_bw</span><span class="p">())</span> <span class="c1"># ------------------------------------------------------------------------------</span> <span class="nf">data</span><span class="p">(</span><span class="n">ames</span><span class="p">,</span> <span class="n">package</span> <span class="o">=</span> <span class="s">&#34;modeldata&#34;</span><span class="p">)</span> <span class="n">ames</span><span class="o">$</span><span class="n">Sale_Price</span> <span class="o">&lt;-</span> <span class="nf">log10</span><span class="p">(</span><span class="n">ames</span><span class="o">$</span><span class="n">Sale_Price</span><span class="p">)</span> <span class="c1"># ------------------------------------------------------------------------------</span> <span class="nf">set.seed</span><span class="p">(</span><span class="m">5685</span><span class="p">)</span> <span class="n">split</span> <span class="o">&lt;-</span> <span class="nf">initial_split</span><span class="p">(</span><span class="n">ames</span><span class="p">)</span> <span class="n">ames_train</span> <span class="o">&lt;-</span> <span class="nf">training</span><span class="p">(</span><span class="n">split</span><span class="p">)</span> <span class="n">ames_test</span> <span class="o">&lt;-</span> <span class="nf">testing</span><span class="p">(</span><span class="n">split</span><span 
class="p">)</span> <span class="c1"># ------------------------------------------------------------------------------</span> <span class="c1"># Let&#39;s make a recipe to preprocess the data</span> <span class="n">ames_rec</span> <span class="o">&lt;-</span> <span class="nf">recipe</span><span class="p">(</span><span class="n">Sale_Price</span> <span class="o">~</span> <span class="n">Bldg_Type</span> <span class="o">+</span> <span class="n">Neighborhood</span> <span class="o">+</span> <span class="n">Year_Built</span> <span class="o">+</span> <span class="n">Gr_Liv_Area</span> <span class="o">+</span> <span class="n">Full_Bath</span> <span class="o">+</span> <span class="n">Year_Sold</span> <span class="o">+</span> <span class="n">Lot_Area</span> <span class="o">+</span> <span class="n">Central_Air</span> <span class="o">+</span> <span class="n">Longitude</span> <span class="o">+</span> <span class="n">Latitude</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">ames_train</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="c1"># Transform some highly skewed predictors</span> <span class="nf">step_BoxCox</span><span class="p">(</span><span class="n">Lot_Area</span><span class="p">,</span> <span class="n">Gr_Liv_Area</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="c1"># Lump some rarely occurring categories into &#34;other&#34;</span> <span class="nf">step_other</span><span class="p">(</span><span class="n">Neighborhood</span><span class="p">,</span> <span class="n">threshold</span> <span class="o">=</span> <span class="m">0.05</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="c1"># Encode categorical predictors as binary.</span> <span class="nf">step_dummy</span><span class="p">(</span><span class="nf">all_nominal_predictors</span><span class="p">(),</span> <span class="n">one_hot</span> <span class="o">=</span> <span class="kc">TRUE</span><span 
class="p">)</span> <span class="o">%&gt;%</span> <span class="c1"># Add an interaction effect:</span> <span class="nf">step_interact</span><span class="p">(</span><span class="o">~</span> <span class="nf">starts_with</span><span class="p">(</span><span class="s">&#34;Central_Air&#34;</span><span class="p">)</span><span class="o">:</span><span class="n">Year_Built</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">step_zv</span><span class="p">(</span><span class="nf">all_predictors</span><span class="p">())</span> <span class="o">%&gt;%</span> <span class="nf">step_normalize</span><span class="p">(</span><span class="nf">all_numeric_predictors</span><span class="p">())</span> </code></pre></div><p>Now we can fit the model by passing the data, the recipe, and other options to <code>brulee_mlp()</code>. We&rsquo;ll use a cyclic scheduler with a half-cycle size of 5 epochs:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">set.seed</span><span class="p">(</span><span class="m">827</span><span class="p">)</span> <span class="n">fit</span> <span class="o">&lt;-</span> <span class="nf">brulee_mlp</span><span class="p">(</span><span class="n">ames_rec</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">ames_train</span><span class="p">,</span> <span class="n">hidden_units</span> <span class="o">=</span> <span class="m">20</span><span class="p">,</span> <span class="n">epochs</span> <span class="o">=</span> <span class="m">151</span><span class="p">,</span> <span class="n">penalty</span> <span class="o">=</span> <span class="m">0.05</span><span class="p">,</span> <span class="n">rate_schedule</span> <span class="o">=</span> <span class="s">&#34;cyclic&#34;</span><span class="p">,</span> <span class="n">step_size</span> <span class="o">=</span> <span class="m">5</span><span class="p">)</span> <span class="c1"># Show the validation loss and alter 
the x-axis tick marks to correspond to cycles. </span> <span class="n">cycles</span> <span class="o">&lt;-</span> <span class="nf">seq</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">151</span><span class="p">,</span> <span class="n">by</span> <span class="o">=</span> <span class="m">10</span><span class="p">)</span> <span class="nf">autoplot</span><span class="p">(</span><span class="n">fit</span><span class="p">)</span> <span class="o">+</span> <span class="nf">scale_x_continuous</span><span class="p">(</span><span class="n">breaks</span> <span class="o">=</span> <span class="n">cycles</span><span class="p">,</span> <span class="n">minor_breaks</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">)</span> </code></pre></div><p><img src="figure/val-loss-1.svg" title="plot of chunk val-loss" alt="plot of chunk val-loss" width="90%" /></p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to thank <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/sametsoekel" target="_blank" rel="noopener">@sametsoekel</a>, and <a href="https://github.com/dfalbel" target="_blank" rel="noopener">@dfalbel</a> for their help since the previous release.</p> Come work with us in Developer Relations https://www.tidyverse.org/blog/2022/09/devrel-hiring-2022/ Thu, 22 Sep 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/09/devrel-hiring-2022/ <!-- TODO: * [x] Look over / edit the post's 
title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [ ] Add intro sentence, e.g. the standard tagline for the package * [ ] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re hiring!</p> <p>A key part of what we do in open source at RStudio is help users learn and flourish by developing and disseminating documentation and fostering a welcoming community. To grow our capacity we&rsquo;re hiring for a couple of roles in developer relations:</p> <ul> <li> <p> <a href="https://www.rstudio.com/about/job-posting/?gh_jid=5295125003" target="_blank" rel="noopener">Developer Educator - tidyverse package development</a> <br> This role will teach package development, contribute to documentation, articulate user needs, and engage in package development activities, particularly on packages like roxygen2, testthat, devtools, and usethis that form a fundamental part of our package development workflow.</p> </li> <li> <p> <a href="https://www.rstudio.com/about/job-posting/?gh_jid=5314591003" target="_blank" rel="noopener">Developer Advocate - Quarto</a> <br> This role will focus on connecting with the community of Quarto users and developers to share knowledge and updates, listen to community needs and interests, be responsive and foster community around Quarto in R, Python and other languages. As a data scientist yourself, you will connect with the data science community, understand pain points, and identify and help create relevant documentation resources. 
We also expect that you&rsquo;ll become an expert in Quarto and contribute meaningfully through GitHub issues, discussions, and PRs, and help articulate user needs and work with other developers to implement these needs into bigger features.</p> </li> </ul> <p>In both of these roles, we&rsquo;ve highlighted the &lsquo;developer&rsquo; aspect of the role. We recognize that it&rsquo;s very hard to teach or share about something well if you don&rsquo;t have the opportunity to do it with some regularity. The developer educator and developer advocate roles are therefore dual purpose - you will be both a developer and an educator or advocate.</p> <p>For these roles you&rsquo;ll know how to use content creation tools to produce pedagogically appropriate and accessible resources to guide data scientists. We particularly encourage folks who are fluent in Spanish to apply. We want to help better support the Latin American R community to continue to build out documentation and learning resources.</p> <p>Please see the postings for more information about these roles, and share broadly! International applications are encouraged. We will begin reviewing applications on <strong>September 28th</strong> and the positions will remain open until filled. We&rsquo;re excited about this opportunity to expand our education and outreach efforts and look forward to continuing to work together with the community.</p> Announcing bundle https://www.tidyverse.org/blog/2022/09/bundle-0-1-0/ Fri, 16 Sep 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/09/bundle-0-1-0/ <p>We&rsquo;re thrilled to announce the first release of <a href="https://rstudio.github.io/bundle/" target="_blank" rel="noopener">bundle</a>. 
The bundle package provides a consistent interface to capture all information needed to serialize a model, situate that information within a portable object, and restore it for use in new settings.</p> <p>You can install it from CRAN with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;bundle&#34;</span><span class="p">)</span> </code></pre></div><p>Let&rsquo;s walk through what bundle does, and when you might need to use it.</p> <h2 id="saving-things-is-hard">Saving things is hard <a href="#saving-things-is-hard"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We often think of a trained model as a self-contained R object. The model exists in memory in R and if we have some new data, the model object can generate predictions on its own:</p> <p><img src="diagram_01.png" alt="A diagram showing a rectangle, labeled model object, and another rectangle, labeled predictions. The two are connected by an arrow from model object to predictions, with the label predict." width="100%" /></p> <p>In reality, model objects sometimes also make use of <em>references</em> to generate predictions. A reference is a piece of information that a model object refers to that isn&rsquo;t part of the object itself; this could be something like a connection to a server, a file on disk, or an internal function in the package used to train the model. 
When we call <code>predict()</code>, model objects know where to look to retrieve that information:</p> <p><img src="diagram_02.png" alt="A diagram showing the same pair of rectangles as before, connected by the arrow labeled predict. This time, though, we introduce two boxes labeled reference. These two boxes are connected to the arrow labeled predict with dotted arrows, to show that, most of the time, we don't need to think about including them in our workflow." width="100%" /></p> <p>Saving model objects can sometimes disrupt those references. Thus, if we want to train a model, save it, re-load it in a production setting, and generate predictions with it, we may run into issues:</p> <p><img src="diagram_03.png" alt="A diagram showing the same set of rectangles, representing a prediction problem, as before. This version of the diagram adds two boxes, labeled R session number one, and R session number two. In R session number two, we have a new rectangle labeled standalone model object. In focus is the arrow from the model object, in R session number one, to the standalone model object in R session number two." width="100%" /></p> <p>We need some way to preserve access to those references. This new package provides a consistent interface for <em>bundling</em> model objects with their references so that they can be safely saved and re-loaded in production:</p> <p><img src="diagram_04.png" alt="A replica of the previous diagram, where the arrow previously connecting the model object in R session one and the standalone model object in R session two is connected by a verb called bundle. The bundle function outputs an object called a bundle." 
width="100%" /></p> <h2 id="when-to-bundle-your-model">When to bundle your model <a href="#when-to-bundle-your-model"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Let&rsquo;s walk through building a couple of models using data on <a href="https://modeldata.tidymodels.org/reference/cells.html" target="_blank" rel="noopener">cell body segmentation</a>.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span> <span class="nf">data</span><span class="p">(</span><span class="n">cells</span><span class="p">,</span> <span class="n">package</span> <span class="o">=</span> <span class="s">&#34;modeldata&#34;</span><span class="p">)</span> <span class="nf">set.seed</span><span class="p">(</span><span class="m">123</span><span class="p">)</span> <span class="n">cell_split</span> <span class="o">&lt;-</span> <span class="n">cells</span> <span class="o">%&gt;%</span> <span class="nf">select</span><span class="p">(</span><span class="o">-</span><span class="n">case</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">initial_split</span><span class="p">(</span><span class="n">strata</span> <span class="o">=</span> <span class="n">class</span><span class="p">)</span> <span class="n">cell_train</span> <span class="o">&lt;-</span> <span class="nf">training</span><span class="p">(</span><span class="n">cell_split</span><span class="p">)</span> <span class="n">cell_test</span> <span class="o">&lt;-</span> <span class="nf">testing</span><span 
class="p">(</span><span class="n">cell_split</span><span class="p">)</span> </code></pre></div><p>First, let&rsquo;s train a logistic regression model:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">glm_fit</span> <span class="o">&lt;-</span> <span class="nf">glm</span><span class="p">(</span><span class="n">class</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">family</span> <span class="o">=</span> <span class="s">&#34;binomial&#34;</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">cell_train</span><span class="p">)</span> </code></pre></div><p>If we&rsquo;re satisfied with this model and think it is ready for production, we might want to deploy it somewhere, maybe as a REST API or as a Shiny app. A typical approach would be to:</p> <ul> <li>save our model object</li> <li>start up a new R session</li> <li>load the model object into the new session</li> <li>predict on new data with the loaded model object</li> </ul> <p>The <a href="https://callr.r-lib.org/" target="_blank" rel="noopener">callr</a> package is helpful for demonstrating this kind of situation; it allows us to start up a fresh R session and pass a few objects in.</p> <p>We&rsquo;ll just make use of two of the arguments to the function <code>r()</code>:</p> <ul> <li><code>func</code>: A function that, given a model object and some new data, will generate predictions, and</li> <li><code>args</code>: A named list, giving the arguments to the above function.</li> </ul> <p>Let&rsquo;s save our model object to a temporary file and pass it to a fresh R session for prediction, like if we had deployed the model somewhere.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">callr</span><span class="p">)</span> <span class="n">temp_file</span> <span class="o">&lt;-</span> <span 
class="nf">tempfile</span><span class="p">()</span> <span class="nf">saveRDS</span><span class="p">(</span><span class="n">glm_fit</span><span class="p">,</span> <span class="n">file</span> <span class="o">=</span> <span class="n">temp_file</span><span class="p">)</span> <span class="nf">r</span><span class="p">(</span> <span class="nf">function</span><span class="p">(</span><span class="n">temp_file</span><span class="p">,</span> <span class="n">new_data</span><span class="p">)</span> <span class="p">{</span> <span class="n">model_object</span> <span class="o">&lt;-</span> <span class="nf">readRDS</span><span class="p">(</span><span class="n">file</span> <span class="o">=</span> <span class="n">temp_file</span><span class="p">)</span> <span class="nf">predict</span><span class="p">(</span><span class="n">model_object</span><span class="p">,</span> <span class="n">new_data</span><span class="p">)</span> <span class="p">},</span> <span class="n">args</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span> <span class="n">temp_file</span> <span class="o">=</span> <span class="n">temp_file</span><span class="p">,</span> <span class="n">new_data</span> <span class="o">=</span> <span class="nf">head</span><span class="p">(</span><span class="n">cell_test</span><span class="p">)</span> <span class="p">)</span> <span class="p">)</span> </code></pre></div><pre><code>## 1 2 3 4 5 6 ## -4.8706401 -1.8143956 2.3386470 -1.2735249 -0.3586448 2.7865270 </code></pre><p>Nice! 
😀</p> <p>What if instead we wanted to train a neural network using tidymodels, with keras as the modeling engine?</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">cell_rec</span> <span class="o">&lt;-</span> <span class="nf">recipe</span><span class="p">(</span><span class="n">class</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">cell_train</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">step_YeoJohnson</span><span class="p">(</span><span class="nf">all_numeric_predictors</span><span class="p">())</span> <span class="o">%&gt;%</span> <span class="nf">step_normalize</span><span class="p">(</span><span class="nf">all_numeric_predictors</span><span class="p">())</span> <span class="n">keras_spec</span> <span class="o">&lt;-</span> <span class="nf">mlp</span><span class="p">(</span><span class="n">penalty</span> <span class="o">=</span> <span class="m">0</span><span class="p">,</span> <span class="n">epochs</span> <span class="o">=</span> <span class="m">10</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;classification&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;keras&#34;</span><span class="p">,</span> <span class="n">verbose</span> <span class="o">=</span> <span class="m">0</span><span class="p">)</span> <span class="n">keras_fit</span> <span class="o">&lt;-</span> <span class="nf">workflow</span><span class="p">(</span><span class="n">cell_rec</span><span class="p">,</span> <span class="n">keras_spec</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">fit</span><span class="p">(</span><span class="n">data</span> <span class="o">=</span> <span class="n">cell_train</span><span class="p">)</span> 
</code></pre></div><p>Let&rsquo;s try to save this to disk and then reload it in a new session.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">temp_file</span> <span class="o">&lt;-</span> <span class="nf">tempfile</span><span class="p">()</span> <span class="nf">saveRDS</span><span class="p">(</span><span class="n">keras_fit</span><span class="p">,</span> <span class="n">file</span> <span class="o">=</span> <span class="n">temp_file</span><span class="p">)</span> <span class="nf">r</span><span class="p">(</span> <span class="nf">function</span><span class="p">(</span><span class="n">temp_file</span><span class="p">,</span> <span class="n">new_data</span><span class="p">)</span> <span class="p">{</span> <span class="nf">library</span><span class="p">(</span><span class="n">workflows</span><span class="p">)</span> <span class="n">model_object</span> <span class="o">&lt;-</span> <span class="nf">readRDS</span><span class="p">(</span><span class="n">file</span> <span class="o">=</span> <span class="n">temp_file</span><span class="p">)</span> <span class="nf">predict</span><span class="p">(</span><span class="n">model_object</span><span class="p">,</span> <span class="n">new_data</span><span class="p">)</span> <span class="p">},</span> <span class="n">args</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span> <span class="n">temp_file</span> <span class="o">=</span> <span class="n">temp_file</span><span class="p">,</span> <span class="n">new_data</span> <span class="o">=</span> <span class="nf">head</span><span class="p">(</span><span class="n">cell_test</span><span class="p">)</span> <span class="p">)</span> <span class="p">)</span> </code></pre></div><pre><code>## Error: ! error in callr subprocess ## Caused by error in `do.call(object$predict, args)`: ## ! 'what' must be a function or character string </code></pre><p>Oh no! 
😱</p> <p>It turns out that keras models <a href="https://tensorflow.rstudio.com/guides/keras/serialization_and_saving.html" target="_blank" rel="noopener">need to be saved in a special way</a>. This is true of a handful of models, like XGBoost, and even some preprocessing steps, like UMAP. These special ways to save objects, like the ones that keras provides, are often referred to as <em>native serialization</em>. Methods for native serialization know which references need to be brought along in order for an object to effectively do its thing in a new environment, but they are different for each model.</p> <p>The bundle package provides a consistent way to deal with all these kinds of special serialization. The package provides two functions, <code>bundle()</code> and <code>unbundle()</code>, that take care of all of the minutiae of preparing to save and load R objects effectively. You <code>bundle()</code> your model before you save it:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">bundle</span><span class="p">)</span> <span class="n">temp_file</span> <span class="o">&lt;-</span> <span class="nf">tempfile</span><span class="p">()</span> <span class="n">keras_bundle</span> <span class="o">&lt;-</span> <span class="nf">bundle</span><span class="p">(</span><span class="n">keras_fit</span><span class="p">)</span> <span class="nf">saveRDS</span><span class="p">(</span><span class="n">keras_bundle</span><span class="p">,</span> <span class="n">file</span> <span class="o">=</span> <span class="n">temp_file</span><span class="p">)</span> </code></pre></div><p>And then you <code>unbundle()</code> after you read the object in a new session:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">r</span><span class="p">(</span> <span class="nf">function</span><span class="p">(</span><span class="n">temp_file</span><span 
class="p">,</span> <span class="n">new_data</span><span class="p">)</span> <span class="p">{</span> <span class="nf">library</span><span class="p">(</span><span class="n">bundle</span><span class="p">)</span> <span class="nf">library</span><span class="p">(</span><span class="n">workflows</span><span class="p">)</span> <span class="n">model_bundle</span> <span class="o">&lt;-</span> <span class="nf">readRDS</span><span class="p">(</span><span class="n">file</span> <span class="o">=</span> <span class="n">temp_file</span><span class="p">)</span> <span class="n">model_object</span> <span class="o">&lt;-</span> <span class="nf">unbundle</span><span class="p">(</span><span class="n">model_bundle</span><span class="p">)</span> <span class="nf">predict</span><span class="p">(</span><span class="n">model_object</span><span class="p">,</span> <span class="n">new_data</span><span class="p">)</span> <span class="p">},</span> <span class="n">args</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span> <span class="n">temp_file</span> <span class="o">=</span> <span class="n">temp_file</span><span class="p">,</span> <span class="n">new_data</span> <span class="o">=</span> <span class="nf">head</span><span class="p">(</span><span class="n">cell_test</span><span class="p">)</span> <span class="p">)</span> <span class="p">)</span> </code></pre></div><pre><code>## # A tibble: 6 × 1 ## .pred_class ## &lt;fct&gt; ## 1 PS ## 2 PS ## 3 WS ## 4 PS ## 5 PS ## 6 WS </code></pre><p>Hooray! 🎉</p> <p>We have support in bundle for a <a href="https://rstudio.github.io/bundle/reference/" target="_blank" rel="noopener">wide variety</a> of models that require (or <em>sometimes</em> require) special handling for serialization, from <a href="https://h2o.ai/" target="_blank" rel="noopener">H2O</a> to <a href="https://mlverse.github.io/luz/" target="_blank" rel="noopener">torch luz models</a>. 
Soon bundle will be integrated into <a href="https://vetiver.rstudio.com/" target="_blank" rel="noopener">vetiver</a>, for better and more robust deployment options. If you use a model that needs special serialization and is not yet supported, <a href="https://github.com/rstudio/bundle/issues" target="_blank" rel="noopener">let us know</a> in an issue.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thank you so much to everyone who contributed to this first release: <a href="https://github.com/dfalbel" target="_blank" rel="noopener">@dfalbel</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/qiushiyan" target="_blank" rel="noopener">@qiushiyan</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>. I would especially like to highlight Simon&rsquo;s contributions, which have been central to bundle getting off the ground!</p> Make your ggplot2 extension package understand the new linewidth aesthetic https://www.tidyverse.org/blog/2022/08/ggplot2-3-4-0-size-to-linewidth/ Wed, 24 Aug 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/08/ggplot2-3-4-0-size-to-linewidth/ <p>We are hard at work finishing the next release of ggplot2. While this release is mostly about internal changes, there are a few user-visible changes as well. One of these upends the idea that the <code>size</code> aesthetic is responsible for <em>both</em> the sizing of points and text and the width of lines. 
With the next release we will have a <code>linewidth</code> aesthetic to take care of the latter, while <code>size</code> will continue handling the former.</p> <p>There are many excellent reasons for this change, all of which will have to wait until the release post to be discussed. This blog post is for those who maintain an extension package for ggplot2 and are left wondering how they should respond to this. If that is you, please read on!</p> <h2 id="the-way-it-works">The way it works <a href="#the-way-it-works"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Before going into technicalities we&rsquo;ll describe how it is intended to work. We are well aware that we can&rsquo;t just make a change that would instantly break everyone&rsquo;s code. So, we have gone to great lengths to make old code work as before while gently coercing users into adopting the new paradigm. 
For example, take a look at this old code:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>Day</span>, y <span class='o'>=</span> <span class='nv'>Temp</span>, size <span class='o'>=</span> <span class='nv'>Wind</span>, group <span class='o'>=</span> <span class='nv'>Month</span><span class='o'>)</span>, lineend <span class='o'>=</span> <span class='s'>"round"</span> <span class='o'>)</span> <span class='c'>#&gt; <span style='color: #00BB00;'>size</span> aesthetic has been deprecated for use with lines as of ggplot2 3.4.0</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use <span style='color: #00BB00;'>linewidth</span> aesthetic instead</span> <span class='c'>#&gt; <span style='color: #555555;'>This message is displayed once every 8 hours.</span></span> </code></pre> <p><img src="figs/unnamed-chunk-1-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>As you can see, ggplot2 detects the use of the <code>size</code> aesthetic and informs the user about the new <code>linewidth</code> aesthetic but otherwise proceeds as before, producing the expected plot. 
As expected, <a href="https://ggplot2.tidyverse.org/reference/scale_size.html" target="_blank" rel="noopener"><code>scale_size()</code></a> also works in this situation:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>Day</span>, y <span class='o'>=</span> <span class='nv'>Temp</span>, size <span class='o'>=</span> <span class='nv'>Wind</span>, group <span class='o'>=</span> <span class='nv'>Month</span><span class='o'>)</span>, lineend <span class='o'>=</span> <span class='s'>"round"</span> <span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_size.html'>scale_size</a></span><span class='o'>(</span><span class='s'>"Windspeed (mph)"</span>, range <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.5</span>, <span class='m'>3</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #00BB00;'>size</span> aesthetic has been deprecated for use with lines as of ggplot2 3.4.0</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use <span style='color: #00BB00;'>linewidth</span> aesthetic instead</span> <span class='c'>#&gt; <span style='color: #555555;'>This message is displayed once every 8 hours.</span></span> </code></pre> <p><img src="figs/unnamed-chunk-2-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>but 
ultimately we want users to migrate to the following code:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>airquality</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>Day</span>, y <span class='o'>=</span> <span class='nv'>Temp</span>, linewidth <span class='o'>=</span> <span class='nv'>Wind</span>, group <span class='o'>=</span> <span class='nv'>Month</span><span class='o'>)</span>, lineend <span class='o'>=</span> <span class='s'>"round"</span> <span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_linewidth.html'>scale_linewidth</a></span><span class='o'>(</span><span class='s'>"Windspeed (mph)"</span>, range <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.5</span>, <span class='m'>3</span><span class='o'>)</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-3-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <blockquote> <p>Note that there&rsquo;s an important difference between these two plots (and one of the reasons we&rsquo;re making the change): the last two plots differ because the default <code>linewidth</code> scale correctly uses a linear transform instead of a square root transform (which is only sensible for scaling of areas).</p> </blockquote> <h2 id="how-to-adopt-this">How to adopt this <a href="#how-to-adopt-this"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" 
viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We have been able to add this automatic translation in a quite non-intrusive way which means that you as a package developer don&rsquo;t need to do that much to adapt to the new naming. To show this I&rsquo;ll create a geom drawing circles then update it to use linewidth instead:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>GeomCircle</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggproto.html'>ggproto</a></span><span class='o'>(</span><span class='s'>"GeomCircle"</span>, <span class='nv'>Geom</span>, draw_panel <span class='o'>=</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>data</span>, <span class='nv'>panel_params</span>, <span class='nv'>coord</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='c'># Expand x, y, radius data to points along circle</span> <span class='nv'>circle_data</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/funprog.html'>Map</a></span><span class='o'>(</span><span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span>, <span class='nv'>r</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nv'>radians</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>2</span><span class='o'>*</span><span class='nv'>pi</span>, length.out <span class='o'>=</span> <span class='m'>101</span><span 
class='o'>)</span><span class='o'>[</span><span class='o'>-</span><span class='m'>1</span><span class='o'>]</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/Trig.html'>cos</a></span><span class='o'>(</span><span class='nv'>radians</span><span class='o'>)</span> <span class='o'>*</span> <span class='nv'>r</span> <span class='o'>+</span> <span class='nv'>x</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/Trig.html'>sin</a></span><span class='o'>(</span><span class='nv'>radians</span><span class='o'>)</span> <span class='o'>*</span> <span class='nv'>r</span> <span class='o'>+</span> <span class='nv'>y</span> <span class='o'>)</span> <span class='o'>&#125;</span>, x <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>x</span>, y <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>y</span>, r <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>radius</span><span class='o'>)</span> <span class='nv'>circle_data</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/do.call.html'>do.call</a></span><span class='o'>(</span><span class='nv'>rbind</span>, <span class='nv'>circle_data</span><span class='o'>)</span> <span class='c'># Transform to viewport coords</span> <span class='nv'>circle_data</span> <span class='o'>&lt;-</span> <span class='nv'>coord</span><span class='o'>$</span><span class='nf'>transform</span><span class='o'>(</span><span class='nv'>circle_data</span>, <span class='nv'>panel_params</span><span class='o'>)</span> <span class='c'># Draw as grob</span> <span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/grid.polygon.html'>polygonGrob</a></span><span class='o'>(</span> x <span 
class='o'>=</span> <span class='nv'>circle_data</span><span class='o'>$</span><span class='nv'>x</span>, y <span class='o'>=</span> <span class='nv'>circle_data</span><span class='o'>$</span><span class='nv'>y</span>, id.lengths <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/rep.html'>rep</a></span><span class='o'>(</span><span class='m'>100</span>, <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nv'>data</span><span class='o'>)</span><span class='o'>)</span>, default.units <span class='o'>=</span> <span class='s'>"native"</span>, gp <span class='o'>=</span> <span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span> col <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>colour</span>, fill <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>fill</span>, lwd <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>size</span> <span class='o'>*</span> <span class='nv'>.pt</span>, lty <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>linetype</span> <span class='o'>)</span> <span class='o'>)</span> <span class='o'>&#125;</span>, required_aes <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"x"</span>, <span class='s'>"y"</span>, <span class='s'>"radius"</span><span class='o'>)</span>, default_aes <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span> colour <span class='o'>=</span> <span class='s'>"black"</span>, fill <span class='o'>=</span> <span class='s'>"grey"</span>, size <span class='o'>=</span> <span class='m'>0.5</span>, linetype <span 
class='o'>=</span> <span class='m'>1</span>, alpha <span class='o'>=</span> <span class='kc'>NA</span> <span class='o'>)</span>, draw_key <span class='o'>=</span> <span class='nv'>draw_key_polygon</span> <span class='o'>)</span> <span class='nv'>geom_circle</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>mapping</span> <span class='o'>=</span> <span class='kc'>NULL</span>, <span class='nv'>data</span> <span class='o'>=</span> <span class='kc'>NULL</span>, <span class='nv'>stat</span> <span class='o'>=</span> <span class='s'>"identity"</span>, <span class='nv'>position</span> <span class='o'>=</span> <span class='s'>"identity"</span>, <span class='nv'>...</span>, <span class='nv'>na.rm</span> <span class='o'>=</span> <span class='kc'>FALSE</span>, <span class='nv'>show.legend</span> <span class='o'>=</span> <span class='kc'>NA</span>, <span class='nv'>inherit.aes</span> <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/layer.html'>layer</a></span><span class='o'>(</span> data <span class='o'>=</span> <span class='nv'>data</span>, mapping <span class='o'>=</span> <span class='nv'>mapping</span>, stat <span class='o'>=</span> <span class='nv'>stat</span>, geom <span class='o'>=</span> <span class='nv'>GeomCircle</span>, position <span class='o'>=</span> <span class='nv'>position</span>, show.legend <span class='o'>=</span> <span class='nv'>show.legend</span>, inherit.aes <span class='o'>=</span> <span class='nv'>inherit.aes</span>, params <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span> na.rm <span class='o'>=</span> <span class='nv'>na.rm</span>, <span class='nv'>...</span> <span class='o'>)</span> <span class='o'>)</span> <span class='o'>&#125;</span></code></pre> </div> <p>As a sanity check, let us check that 
this actually works:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>random_points</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>20</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>20</span><span class='o'>)</span>, radius <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>20</span>, max <span class='o'>=</span> <span class='m'>0.1</span><span class='o'>)</span>, value <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>20</span><span class='o'>)</span> <span class='o'>)</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>random_points</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'>geom_circle</span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>x</span>, y <span class='o'>=</span> <span class='nv'>y</span>, radius <span class='o'>=</span> <span class='nv'>radius</span>, size <span class='o'>=</span> <span class='nv'>value</span><span class='o'>)</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-5-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>It seems to work as intended. 
As can be seen from the code above, the <code>size</code> aesthetic is not used much and is passed directly into <code>polygonGrob()</code>. It follows that updating the code to use <code>linewidth</code> is not a huge operation.</p> <blockquote> <p>There is nothing preventing you from keeping the code as is; it will continue to work as always. However, your users may begin to feel a disconnect with the style as they adapt to the new <code>linewidth</code> aesthetic, so it is highly recommended to make the proposed changes.</p> </blockquote> <h3 id="the-fix">The fix <a href="#the-fix"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>There are a few things you need to do to update the old code but they are all pretty benign. 
The changes are commented in the code below and will also be discussed afterwards.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>GeomCircle</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggproto.html'>ggproto</a></span><span class='o'>(</span><span class='s'>"GeomCircle"</span>, <span class='nv'>Geom</span>, draw_panel <span class='o'>=</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>data</span>, <span class='nv'>panel_params</span>, <span class='nv'>coord</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='c'># Expand x, y, radius data to points along circle</span> <span class='nv'>circle_data</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/funprog.html'>Map</a></span><span class='o'>(</span><span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span>, <span class='nv'>r</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nv'>radians</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>2</span><span class='o'>*</span><span class='nv'>pi</span>, length.out <span class='o'>=</span> <span class='m'>101</span><span class='o'>)</span><span class='o'>[</span><span class='o'>-</span><span class='m'>1</span><span class='o'>]</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/Trig.html'>cos</a></span><span class='o'>(</span><span class='nv'>radians</span><span class='o'>)</span> <span class='o'>*</span> <span class='nv'>r</span> <span class='o'>+</span> <span class='nv'>x</span>, y <span class='o'>=</span> <span class='nf'><a 
href='https://rdrr.io/r/base/Trig.html'>sin</a></span><span class='o'>(</span><span class='nv'>radians</span><span class='o'>)</span> <span class='o'>*</span> <span class='nv'>r</span> <span class='o'>+</span> <span class='nv'>y</span> <span class='o'>)</span> <span class='o'>&#125;</span>, x <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>x</span>, y <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>y</span>, r <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>radius</span><span class='o'>)</span> <span class='nv'>circle_data</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/do.call.html'>do.call</a></span><span class='o'>(</span><span class='nv'>rbind</span>, <span class='nv'>circle_data</span><span class='o'>)</span> <span class='c'># Transform to viewport coords</span> <span class='nv'>circle_data</span> <span class='o'>&lt;-</span> <span class='nv'>coord</span><span class='o'>$</span><span class='nf'>transform</span><span class='o'>(</span><span class='nv'>circle_data</span>, <span class='nv'>panel_params</span><span class='o'>)</span> <span class='c'># Draw as grob</span> <span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/grid.polygon.html'>polygonGrob</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='nv'>circle_data</span><span class='o'>$</span><span class='nv'>x</span>, y <span class='o'>=</span> <span class='nv'>circle_data</span><span class='o'>$</span><span class='nv'>y</span>, id.lengths <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/rep.html'>rep</a></span><span class='o'>(</span><span class='m'>100</span>, <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nv'>data</span><span class='o'>)</span><span class='o'>)</span>, 
default.units <span class='o'>=</span> <span class='s'>"native"</span>, gp <span class='o'>=</span> <span class='nf'>grid</span><span class='nf'>::</span><span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span> col <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>colour</span>, fill <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>fill</span>, <span class='c'># Use linewidth or fall back to size in old ggplot2 versions</span> lwd <span class='o'>=</span> <span class='o'>(</span><span class='nv'>data</span><span class='o'>$</span><span class='nv'>linewidth</span> <span class='o'><a href='https://rlang.r-lib.org/reference/op-null-default.html'>%||%</a></span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>size</span><span class='o'>)</span> <span class='o'>*</span> <span class='nv'>.pt</span>, lty <span class='o'>=</span> <span class='nv'>data</span><span class='o'>$</span><span class='nv'>linetype</span> <span class='o'>)</span> <span class='o'>)</span> <span class='o'>&#125;</span>, required_aes <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"x"</span>, <span class='s'>"y"</span>, <span class='s'>"radius"</span><span class='o'>)</span>, default_aes <span class='o'>=</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span> colour <span class='o'>=</span> <span class='s'>"black"</span>, fill <span class='o'>=</span> <span class='s'>"grey"</span>, <span class='c'># Switch size to linewidth</span> linewidth <span class='o'>=</span> <span class='m'>0.5</span>, linetype <span class='o'>=</span> <span class='m'>1</span>, alpha <span class='o'>=</span> <span class='kc'>NA</span> <span class='o'>)</span>, draw_key <span class='o'>=</span> <span 
class='nv'>draw_key_polygon</span>, <span class='c'># To allow using size in ggplot2 &lt; 3.4.0</span> non_missing_aes <span class='o'>=</span> <span class='s'>"size"</span>, <span class='c'># Tell ggplot2 to perform automatic renaming</span> rename_size <span class='o'>=</span> <span class='kc'>TRUE</span> <span class='o'>)</span></code></pre> </div> <p>As we can see above, we need two changes and two additions to our implementation. First (but last in the code), we add <code>rename_size = TRUE</code> to our geom implementation. This instructs ggplot2 that this layer has a <code>size</code> aesthetic that should be converted automatically with a deprecation warning. Setting this to <code>TRUE</code> allows you to rest assured that as far as your code goes you can expect to have a <code>linewidth</code> aesthetic. Second, we update the <code>default_aes</code> to use <code>linewidth</code> instead of <code>size</code>. Third, wherever we use <code>size</code> in our geom logic we instead use <code>linewidth %||% size</code>. The reason for the fallback is that if your package is used together with an older version of ggplot2 the <code>rename_size = TRUE</code> line has no effect and you need to fall back to <code>size</code> if that is what the user has specified. Fourth, we add <code>size</code> to the <code>non_missing_aes</code> field. 
As with the previous point, this is only relevant with older versions of ggplot2, where it instructs the geom not to warn when <code>size</code> is used.</p> <p>Let&rsquo;s try out the new implementation:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>random_points</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'>geom_circle</span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>x</span>, y <span class='o'>=</span> <span class='nv'>y</span>, radius <span class='o'>=</span> <span class='nv'>radius</span>, size <span class='o'>=</span> <span class='nv'>value</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #00BB00;'>size</span> aesthetic has been deprecated for use with lines as of ggplot2 3.4.0</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> Please use <span style='color: #00BB00;'>linewidth</span> aesthetic instead</span> <span class='c'>#&gt; <span style='color: #555555;'>This message is displayed once every 8 hours.</span></span> </code></pre> <p><img src="figs/unnamed-chunk-7-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>We get the familiar deprecation warning, and everything renders as expected. 
Using the new naming also works, picks up the linear <code>linewidth</code> scale, and doesn&rsquo;t have a warning.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>random_points</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'>geom_circle</span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>x</span>, y <span class='o'>=</span> <span class='nv'>y</span>, radius <span class='o'>=</span> <span class='nv'>radius</span>, linewidth <span class='o'>=</span> <span class='nv'>value</span><span class='o'>)</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-8-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>The legend looks a bit wonky, but that is because the polygon key function caps the linewidth at a certain size relative to the size of the key. 
We can see that it works fine using a lower range:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/last_plot.html'>last_plot</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_linewidth.html'>scale_linewidth</a></span><span class='o'>(</span>range <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.1</span>, <span class='m'>2</span><span class='o'>)</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-9-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="faq">FAQ <a href="#faq"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p><em>I&rsquo;m creating a geom as a subclass of one of the ggplot2 geoms that now uses <code>linewidth</code> &mdash; what should I do?</em></p> <p>If your geom inherits from, e.g., <a href="https://ggplot2.tidyverse.org/reference/geom_polygon.html" target="_blank" rel="noopener"><code>geom_polygon()</code></a>, which in the next version will begin using <code>linewidth</code>, all you have to do is update your code to refer to <code>linewidth</code> instead of <code>size</code> wherever it uses <code>size</code>. Your geom will already inherit the correct <code>rename_size</code> value.</p> <p><em>I&rsquo;m creating a stat &mdash; should I do anything?</em></p> <p>Probably not. 
The only exception is if you set <code>size</code> in <code>default_aes</code> to a calculated value and the expectation is that the geom used with the stat will change to using <code>linewidth</code>. In such situations you should change the <code>default_aes</code> setting to use <code>linewidth</code> instead. We haven&rsquo;t had any such situations in the ggplot2 code base so the chance of this being relevant is pretty low.</p> <p><em>I&rsquo;m creating a geom that uses <code>size</code> for both point sizing and line width &mdash; how should I proceed?</em></p> <p>If you have a geom where <code>size</code> serves as both point size and linewidth (an example from ggplot2 is <a href="https://ggplot2.tidyverse.org/reference/geom_linerange.html" target="_blank" rel="noopener"><code>geom_pointrange()</code></a>) you shouldn&rsquo;t set <code>rename_size = TRUE</code> since <code>size</code> remains a valid aesthetic. However, you should add <code>linewidth</code> to <code>default_aes</code> and use this wherever in your code <code>size</code> was used for linewidth scaling before. Do note that this is a breaking change for your users since the same piece of code may no longer produce the same output.</p> censored 0.1.0 https://www.tidyverse.org/blog/2022/08/censored-0-1-0/ Wed, 10 Aug 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/08/censored-0-1-0/ 
<p>We&rsquo;re extremely pleased to announce the first release of <a href="https://censored.tidymodels.org" target="_blank" rel="noopener">censored</a> on CRAN. The censored package is a parsnip extension package for survival models.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"censored"</span><span class='o'>)</span></code></pre> </div> <p>This blog post will introduce a new model type, a new mode, and new prediction types for survival analysis in the tidymodels framework. We have <a href="https://www.tidyverse.org/blog/2021/11/survival-analysis-parsnip-adjacent/" target="_blank" rel="noopener">previously</a> blogged about these changes while they were in development; now they have been released!</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/censored'>censored</a></span><span class='o'>)</span> <span class='c'>#&gt; Loading required package: parsnip</span> <span class='c'>#&gt; Loading required package: survival</span></code></pre> </div> <h2 id="model-types-modes-and-engines">Model types, modes, and engines <a href="#model-types-modes-and-engines"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 
3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A parsnip model specification consists of three elements:</p> <ul> <li>a <strong>model type</strong> such as linear model, random forest, support vector machine, etc</li> <li>a computational <strong>engine</strong> such as a specific R package or tools outside of R like Keras or Stan</li> <li>a <strong>mode</strong> such as regression or classification</li> </ul> <p>parsnip 1.0.0 introduces a new mode <code>&quot;censored regression&quot;</code> and the censored package provides engines to fit various models in this new mode. With the addition of the new <a href="https://parsnip.tidymodels.org/reference/proportional_hazards.html" target="_blank" rel="noopener"><code>proportional_hazards()</code></a> model type, the available models cover parametric, semi-parametric, and tree-based models:</p> <table> <thead> <tr> <th align="left">model</th> <th align="left">engine</th> </tr> </thead> <tbody> <tr> <td align="left"> <a href="https://parsnip.tidymodels.org/reference/bag_tree.html" target="_blank" rel="noopener"><code>bag_tree()</code></a></td> <td align="left">rpart</td> </tr> <tr> <td align="left"> <a href="https://parsnip.tidymodels.org/reference/boost_tree.html" target="_blank" rel="noopener"><code>boost_tree()</code></a></td> <td align="left">mboost</td> </tr> <tr> <td align="left"> <a href="https://parsnip.tidymodels.org/reference/decision_tree.html" target="_blank" rel="noopener"><code>decision_tree()</code></a></td> <td align="left">rpart</td> </tr> <tr> <td align="left"> <a href="https://parsnip.tidymodels.org/reference/decision_tree.html" target="_blank" rel="noopener"><code>decision_tree()</code></a></td> <td align="left">partykit</td> </tr> <tr> <td align="left"> <a href="https://parsnip.tidymodels.org/reference/proportional_hazards.html" target="_blank" rel="noopener"><code>proportional_hazards()</code></a></td> <td align="left">survival</td> </tr> <tr> <td align="left"> <a 
href="https://parsnip.tidymodels.org/reference/proportional_hazards.html" target="_blank" rel="noopener"><code>proportional_hazards()</code></a></td> <td align="left">glmnet</td> </tr> <tr> <td align="left"> <a href="https://parsnip.tidymodels.org/reference/rand_forest.html" target="_blank" rel="noopener"><code>rand_forest()</code></a></td> <td align="left">partykit</td> </tr> <tr> <td align="left"> <a href="https://parsnip.tidymodels.org/reference/survival_reg.html" target="_blank" rel="noopener"><code>survival_reg()</code></a></td> <td align="left">survival</td> </tr> <tr> <td align="left"> <a href="https://parsnip.tidymodels.org/reference/survival_reg.html" target="_blank" rel="noopener"><code>survival_reg()</code></a></td> <td align="left">flexsurv</td> </tr> </tbody> </table> <p>All models can be fitted through a formula interface. For example, when the engine allows for stratification variables, these can be specified by using a <a href="https://rdrr.io/pkg/survival/man/strata.html" target="_blank" rel="noopener"><code>strata()</code></a> term in the formula, as in the survival package.</p> <p>The <code>cetaceans</code> data set contains information about dolphins and whales living in captivity in the USA. 
It is derived from a <a href="https://github.com/rfordatascience/tidytuesday/tree/master/data/2018/2018-12-18" target="_blank" rel="noopener">Tidy Tuesday data set</a> and you can install the corresponding data package with <code>pak::pak(&quot;hfrick/cetaceans&quot;)</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'>cetaceans</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='nv'>cetaceans</span><span class='o'>)</span> <span class='c'>#&gt; tibble [1,358 × 10] (S3: tbl_df/tbl/data.frame)</span> <span class='c'>#&gt; $ age : num [1:1358] 28 44 39 38 38 37 36 36 35 34 ...</span> <span class='c'>#&gt; $ event : num [1:1358] 0 0 0 0 0 0 0 0 0 0 ...</span> <span class='c'>#&gt; $ species : chr [1:1358] "Bottlenose" "Bottlenose" "Bottlenose" "Bottlenose" ...</span> <span class='c'>#&gt; $ sex : chr [1:1358] "F" "F" "M" "F" ...</span> <span class='c'>#&gt; $ birth_decade : num [1:1358] 1980 1970 1970 1970 1970 1980 1980 1980 1980 1980 ...</span> <span class='c'>#&gt; $ born_in_captivity: logi [1:1358] TRUE TRUE TRUE TRUE TRUE TRUE ...</span> <span class='c'>#&gt; $ time_in_captivity: num [1:1358] 1 1 1 1 1 1 1 1 1 1 ...</span> <span class='c'>#&gt; $ origin_location : chr [1:1358] "Marineland Florida" "Dolphin Research Center" "SeaWorld" "SeaWorld" ...</span> <span class='c'>#&gt; $ transfers : int [1:1358] 0 0 13 1 2 2 2 2 3 4 ...</span> <span class='c'>#&gt; $ current_location : chr [1:1358] "Marineland Florida" "Dolphin Research Center" "SeaWorld" "SeaWorld" ...</span></code></pre> </div> <p>To illustrate the new modelling function <a href="https://parsnip.tidymodels.org/reference/proportional_hazards.html" target="_blank" rel="noopener"><code>proportional_hazards()</code></a> and the formula interface for glmnet, 
let&rsquo;s fit a penalized Cox model.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>cox_penalized</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/proportional_hazards.html'>proportional_hazards</a></span><span class='o'>(</span>penalty <span class='o'>=</span> <span class='m'>0.1</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span><span class='s'>"glmnet"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span><span class='s'>"censored regression"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span> <span class='nf'><a href='https://rdrr.io/pkg/survival/man/Surv.html'>Surv</a></span><span class='o'>(</span><span class='nv'>age</span>, <span class='nv'>event</span><span class='o'>)</span> <span class='o'>~</span> <span class='nv'>sex</span> <span class='o'>+</span> <span class='nv'>transfers</span> <span class='o'>+</span> <span class='nf'><a href='https://rdrr.io/pkg/survival/man/strata.html'>strata</a></span><span class='o'>(</span><span class='nv'>born_in_captivity</span><span class='o'>)</span>, data <span class='o'>=</span> <span class='nv'>cetaceans</span> <span class='o'>)</span></code></pre> </div> <h2 id="prediction-types">Prediction types <a href="#prediction-types"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" 
xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>For censored regression, parsnip now also includes new prediction types:</p> <ul> <li><code>&quot;time&quot;</code> for the survival time</li> <li><code>&quot;survival&quot;</code> for the survival probability</li> <li><code>&quot;hazard&quot;</code> for the hazard</li> <li><code>&quot;quantile&quot;</code> for quantiles of the event time distribution</li> <li><code>&quot;linear_pred&quot;</code> for the linear predictor</li> </ul> <p>Predictions made with censored respect the tidymodels principles of:</p> <ul> <li>The predictions are always inside a tibble.</li> <li>The column names and types are unsurprising and predictable.</li> <li>The number of rows in <code>new_data</code> and the output are the same.</li> </ul> <p>Let&rsquo;s demonstrate that with a small data set to predict on: just three observations, and the first one includes a missing value for one of the predictors.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>cetaceans_3</span> <span class='o'>&lt;-</span> <span class='nv'>cetaceans</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>3</span>,<span class='o'>]</span> <span class='nv'>cetaceans_3</span><span class='o'>$</span><span class='nv'>sex</span><span class='o'>[</span><span class='m'>1</span><span class='o'>]</span> <span class='o'>&lt;-</span> <span class='kc'>NA</span></code></pre> </div> <p>Predictions of types <code>&quot;time&quot;</code> and <code>&quot;survival&quot;</code> are available for all model/engine combinations in censored.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' 
data-lang='r'><span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>cox_penalized</span>, new_data <span class='o'>=</span> <span class='nv'>cetaceans_3</span>, type <span class='o'>=</span> <span class='s'>"time"</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 1</span></span> <span class='c'>#&gt; .pred_time</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #BB0000;'>NA</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> 31.8</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> 52.6</span></code></pre> </div> <p>Survival probability can be predicted at multiple time points, specified through the <code>time</code> argument to <a href="https://rdrr.io/r/stats/predict.html" target="_blank" rel="noopener"><code>predict()</code></a>. 
Here we are predicting survival probability at age 10, 20, 30, and 40 years.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>pred</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>cox_penalized</span>, new_data <span class='o'>=</span> <span class='nv'>cetaceans_3</span>, type <span class='o'>=</span> <span class='s'>"survival"</span>, time <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>10</span>, <span class='m'>20</span>, <span class='m'>30</span>, <span class='m'>40</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>pred</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 1</span></span> <span class='c'>#&gt; .pred </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;tibble [4 × 2]&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #555555;'>&lt;tibble [4 × 2]&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='color: #555555;'>&lt;tibble [4 × 2]&gt;</span></span></code></pre> </div> <p>The <code>.pred</code> column is a list-column, containing nested tibbles:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># for the observation with NA</span> <span class='nv'>pred</span><span class='o'>$</span><span class='nv'>.pred</span><span class='o'>[[</span><span class='m'>1</span><span class='o'>]</span><span class='o'>]</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 2</span></span> <span class='c'>#&gt; .time .pred_survival</span> <span class='c'>#&gt; <span 
style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 10 <span style='color: #BB0000;'>NA</span></span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> 20 <span style='color: #BB0000;'>NA</span></span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> 30 <span style='color: #BB0000;'>NA</span></span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> 40 <span style='color: #BB0000;'>NA</span></span> <span class='c'># without NA</span> <span class='nv'>pred</span><span class='o'>$</span><span class='nv'>.pred</span><span class='o'>[[</span><span class='m'>2</span><span class='o'>]</span><span class='o'>]</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 2</span></span> <span class='c'>#&gt; .time .pred_survival</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 10 0.729</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> 20 0.567</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> 30 0.386</span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> 40 0.386</span></code></pre> </div> <p>This can be used to visualize an approximation of the underlying survival curve.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/stats/predict.html'>predict</a></span><span class='o'>(</span><span class='nv'>cox_penalized</span>, new_data <span class='o'>=</span> <span 
class='nv'>cetaceans</span><span class='o'>[</span><span class='m'>2</span><span class='o'>:</span><span class='m'>3</span>,<span class='o'>]</span>, type <span class='o'>=</span> <span class='s'>"survival"</span>, time <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>80</span>, <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>dplyr</span><span class='nf'>::</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>id <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='m'>2</span><span class='o'>:</span><span class='m'>3</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>tidyr</span><span class='nf'>::</span><span class='nf'><a href='https://tidyr.tidyverse.org/reference/nest.html'>unnest</a></span><span class='o'>(</span>cols <span class='o'>=</span> <span class='nv'>.pred</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>.time</span>, y <span class='o'>=</span> <span class='nv'>.pred_survival</span>, col <span class='o'>=</span> <span class='nv'>id</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_step</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggtheme.html'>theme_bw</a></span><span class='o'>(</span><span class='o'>)</span> </code></pre> <p><img src="figs/survival-curve-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>More examples of available models, engines, and prediction types can be found in the article <a href="https://censored.tidymodels.org/articles/examples.html" target="_blank" rel="noopener">Fitting and Predicting with censored</a>.</p> <h2 id="whats-next">What&rsquo;s next? <a href="#whats-next"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Our aim is to broadly integrate survival analysis in the tidymodels framework. 
Next, we&rsquo;ll be working on adding appropriate metrics to the yardstick package and enabling model tuning via the tune package.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all the contributors: <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">@bcjaeger</a>, <a href="https://github.com/brunocarlin" target="_blank" rel="noopener">@brunocarlin</a>, <a href="https://github.com/caimiao0714" target="_blank" rel="noopener">@caimiao0714</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dvdsb" target="_blank" rel="noopener">@dvdsb</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/erikvona" target="_blank" rel="noopener">@erikvona</a>, <a href="https://github.com/gvelasq" target="_blank" rel="noopener">@gvelasq</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/schelhorn" target="_blank" rel="noopener">@schelhorn</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> rsample 1.1.0 https://www.tidyverse.org/blog/2022/08/rsample-1-1-0/ Mon, 08 Aug 2022 00:00:00 +0000 
https://www.tidyverse.org/blog/2022/08/rsample-1-1-0/ <p>We&rsquo;re downright exhilarated to announce the release of <a href="https://rsample.tidymodels.org/" target="_blank" rel="noopener">rsample</a> 1.1.0. The rsample package makes it easy to create resamples for estimating distributions and assessing model performance.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"rsample"</span><span class='o'>)</span></code></pre> </div> <p>This blog post will walk through some of the highlights from this newest release. 
You can see a full list of changes in the <a href="https://rsample.tidymodels.org/news/index.html#rsample-110" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="grouped-resampling">Grouped resampling <a href="#grouped-resampling"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>By far and away the biggest addition in this version of rsample is the set of new functions for grouped resampling. Grouped resampling is a form of resampling where observations need to be assigned to the analysis or assessment sets as a &ldquo;group&rdquo;, not split between the two. This is a common need when some of your data is more closely related than would be expected under random chance: for instance, when taking multiple measurements of a single patient over time, or when your data is geographically clustered into distinct &ldquo;locations&rdquo; like different neighborhoods.</p> <p>The rsample package has supported grouped v-fold cross-validation for a few years, through the <a href="https://rsample.tidymodels.org/reference/group_vfold_cv.html" target="_blank" rel="noopener"><code>group_vfold_cv()</code></a> function:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='http://purrr.tidyverse.org'>purrr</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rsample.tidymodels.org'>rsample</a></span><span 
class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>ames</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span> <span class='nv'>resample</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/group_vfold_cv.html'>group_vfold_cv</a></span><span class='o'>(</span><span class='nv'>ames</span>, group <span class='o'>=</span> <span class='nv'>Neighborhood</span>, v <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span> <span class='nv'>resample</span><span class='o'>$</span><span class='nv'>splits</span> <span class='o'>%&gt;%</span> <span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_lgl</a></span><span class='o'>(</span><span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nf'><a href='https://rdrr.io/r/base/any.html'>any</a></span><span class='o'>(</span><span class='nf'><a href='https://rsample.tidymodels.org/reference/as.data.frame.rsplit.html'>assessment</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>$</span><span class='nv'>Neighborhood</span> <span class='o'>%in%</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/as.data.frame.rsplit.html'>analysis</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>$</span><span class='nv'>Neighborhood</span><span class='o'>)</span> <span class='o'>&#125;</span> <span class='o'>)</span> <span class='c'>#&gt; [1] FALSE FALSE</span></code></pre> </div> <p>rsample 1.1.0 extends this support by adding four new functions for grouped resampling. 
The new functions <a href="https://rsample.tidymodels.org/reference/group_bootstraps.html" target="_blank" rel="noopener"><code>group_bootstraps()</code></a>, <a href="https://rsample.tidymodels.org/reference/group_mc_cv.html" target="_blank" rel="noopener"><code>group_mc_cv()</code></a>, <a href="https://rsample.tidymodels.org/reference/validation_split.html" target="_blank" rel="noopener"><code>group_validation_split()</code></a>, and <a href="https://rsample.tidymodels.org/reference/initial_split.html" target="_blank" rel="noopener"><code>group_initial_split()</code></a> all work like their ungrouped versions, but let you specify a grouping column to make sure related observations are all assigned to the same sets:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># Bootstrap resampling with replacement:</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/group_bootstraps.html'>group_bootstraps</a></span><span class='o'>(</span><span class='nv'>ames</span>, <span class='nv'>Neighborhood</span>, times <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span> <span class='c'>#&gt; # Group bootstrap sampling </span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 2</span></span> <span class='c'>#&gt; splits id </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;split [3050/1225]&gt;</span> Bootstrap1</span> <span class='c'># Random resampling without replacement:</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/group_mc_cv.html'>group_mc_cv</a></span><span class='o'>(</span><span class='nv'>ames</span>, <span class='nv'>Neighborhood</span>, times <span class='o'>=</span> <span class='m'>1</span><span 
class='o'>)</span> <span class='c'>#&gt; # Group Monte Carlo cross-validation (0.75/0.25) with 1 resamples </span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 2</span></span> <span class='c'>#&gt; splits id </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;split [2198/732]&gt;</span> Resample1</span> <span class='c'># Data splitting to create a validation set:</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/validation_split.html'>group_validation_split</a></span><span class='o'>(</span><span class='nv'>ames</span>, <span class='nv'>Neighborhood</span><span class='o'>)</span> <span class='c'>#&gt; # Group Validation Set Split (0.75/0.25) </span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 2</span></span> <span class='c'>#&gt; splits id </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;split [2201/729]&gt;</span> validation</span> <span class='c'># Data splitting to create an initial training/testing split:</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/initial_split.html'>group_initial_split</a></span><span class='o'>(</span><span class='nv'>ames</span>, <span class='nv'>Neighborhood</span><span class='o'>)</span> <span class='c'>#&gt; &lt;Training/Testing/Total&gt;</span> <span class='c'>#&gt; &lt;2162/768/2930&gt;</span></code></pre> </div> <p>These functions all target assigning a certain proportion of your data to the assessment fold. Hitting that target can be tricky when your groups aren&rsquo;t all the same size, however. 
To work around this, these new functions create a list of all the groups in your data, randomly reshuffle it, and then select the first <em>n</em> groups in the list that split the data as close to that proportion as possible. In practice, your analysis and assessment folds won&rsquo;t always be precisely the size you&rsquo;re targeting (particularly if you have a few large groups), but all data in a single group will always be assigned to the same set, and the splits will still be created entirely at random.</p> <p>The other big change to grouped resampling comes as a new argument to <a href="https://rsample.tidymodels.org/reference/group_vfold_cv.html" target="_blank" rel="noopener"><code>group_vfold_cv()</code></a>. By default, <a href="https://rsample.tidymodels.org/reference/group_vfold_cv.html" target="_blank" rel="noopener"><code>group_vfold_cv()</code></a> assigns roughly the same number of groups to each of your folds, so you wind up with roughly the same number of patients, or neighborhoods, or whatever else you&rsquo;re grouping by in each assessment set. 
The new <code>balance</code> argument lets you instead assign roughly the same number of rows to each fold, if you set <code>balance = &quot;observations&quot;</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rsample.tidymodels.org/reference/group_vfold_cv.html'>group_vfold_cv</a></span><span class='o'>(</span><span class='nv'>ames</span>, <span class='nv'>Neighborhood</span>, balance <span class='o'>=</span> <span class='s'>"observations"</span><span class='o'>)</span> <span class='c'>#&gt; # Group 28-fold cross-validation </span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 28 × 2</span></span> <span class='c'>#&gt; splits id </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'> 1</span> <span style='color: #555555;'>&lt;split [2928/2]&gt;</span> Resample01</span> <span class='c'>#&gt; <span style='color: #555555;'> 2</span> <span style='color: #555555;'>&lt;split [2922/8]&gt;</span> Resample02</span> <span class='c'>#&gt; <span style='color: #555555;'> 3</span> <span style='color: #555555;'>&lt;split [2907/23]&gt;</span> Resample03</span> <span class='c'>#&gt; <span style='color: #555555;'> 4</span> <span style='color: #555555;'>&lt;split [2736/194]&gt;</span> Resample04</span> <span class='c'>#&gt; <span style='color: #555555;'> 5</span> <span style='color: #555555;'>&lt;split [2886/44]&gt;</span> Resample05</span> <span class='c'>#&gt; <span style='color: #555555;'> 6</span> <span style='color: #555555;'>&lt;split [2893/37]&gt;</span> Resample06</span> <span class='c'>#&gt; <span style='color: #555555;'> 7</span> <span style='color: #555555;'>&lt;split [2929/1]&gt;</span> Resample07</span> <span class='c'>#&gt; <span style='color: #555555;'> 8</span> <span style='color: #555555;'>&lt;split 
[2663/267]&gt;</span> Resample08</span> <span class='c'>#&gt; <span style='color: #555555;'> 9</span> <span style='color: #555555;'>&lt;split [2805/125]&gt;</span> Resample09</span> <span class='c'>#&gt; <span style='color: #555555;'>10</span> <span style='color: #555555;'>&lt;split [2837/93]&gt;</span> Resample10</span> <span class='c'>#&gt; <span style='color: #555555;'># … with 18 more rows</span></span> <span class='c'>#&gt; <span style='color: #555555;'># ℹ Use `print(n = ...)` to see more rows</span></span></code></pre> </div> <p>This approach works in a similar way to the new grouped resampling functions, attempting to assign roughly <code>1 / v</code> of your data to each fold. When working with unbalanced groups, this can result in much more even assignments of data to each fold:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span> <span class='nv'>analysis_sd</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>v</span>, <span class='nv'>balance</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/group_vfold_cv.html'>group_vfold_cv</a></span><span class='o'>(</span> <span class='nv'>ames</span>, <span class='nv'>Neighborhood</span>, <span class='nv'>v</span>, balance <span class='o'>=</span> <span class='nv'>balance</span> <span class='o'>)</span><span class='o'>$</span><span class='nv'>splits</span> <span class='o'>%&gt;%</span> <span class='nf'>purrr</span><span class='nf'>::</span><span 
class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>map_dbl</a></span><span class='o'>(</span><span class='o'>~</span> <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nf'><a href='https://rsample.tidymodels.org/reference/as.data.frame.rsplit.html'>analysis</a></span><span class='o'>(</span><span class='nv'>.x</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'><a href='https://rdrr.io/r/stats/sd.html'>sd</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>&#125;</span> <span class='nv'>resample</span> <span class='o'>&lt;-</span> <span class='nf'>tidyr</span><span class='nf'>::</span><span class='nf'><a href='https://tidyr.tidyverse.org/reference/expand.html'>crossing</a></span><span class='o'>(</span> idx <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq_len</a></span><span class='o'>(</span><span class='m'>100</span><span class='o'>)</span>, v <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>2</span>, <span class='m'>5</span>, <span class='m'>10</span>, <span class='m'>15</span><span class='o'>)</span>, balance <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"groups"</span>, <span class='s'>"observations"</span><span class='o'>)</span> <span class='o'>)</span> <span class='nv'>resample</span> <span class='o'>%&gt;%</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>sd <span class='o'>=</span> <span class='nf'>purrr</span><span class='nf'>::</span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map2.html'>pmap_dbl</a></span><span class='o'>(</span> <span class='nf'><a 
href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='nv'>v</span>, <span class='nv'>balance</span><span class='o'>)</span>, <span class='nv'>analysis_sd</span> <span class='o'>)</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>sd</span>, fill <span class='o'>=</span> <span class='nv'>balance</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_histogram.html'>geom_histogram</a></span><span class='o'>(</span>alpha <span class='o'>=</span> <span class='m'>0.6</span>, color <span class='o'>=</span> <span class='s'>"black"</span>, size <span class='o'>=</span> <span class='m'>0.3</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/facet_wrap.html'>facet_wrap</a></span><span class='o'>(</span><span class='o'>~</span> <span class='nv'>v</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggtheme.html'>theme_minimal</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>title <span class='o'>=</span> <span class='s'>"sd() of nrow(analysis) by balance method"</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-4-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Right now, these grouping functions don&rsquo;t support stratification. 
If you have thoughts on how you&rsquo;d expect stratification to work with grouping, or have an example of how another implementation has handled it, <a href="https://github.com/tidymodels/rsample/issues/317" target="_blank" rel="noopener">let us know on GitHub</a>!</p> <h2 id="other-improvements">Other improvements <a href="#other-improvements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This release also adds a few new utility functions to make it easier to work with the rsets produced by rsample functions.</p> <p>For instance, the new <a href="https://rsample.tidymodels.org/reference/reshuffle_rset.html" target="_blank" rel="noopener"><code>reshuffle_rset()</code></a> will re-generate an rset, using the same arguments as were used to originally create it, but with the current random seed:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>123</span><span class='o'>)</span> <span class='nv'>resample</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/vfold_cv.html'>vfold_cv</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='nv'>resample</span><span class='o'>$</span><span class='nv'>splits</span><span class='o'>[[</span><span class='m'>1</span><span class='o'>]</span><span class='o'>]</span> <span class='o'>%&gt;%</span> <span class='nf'><a 
href='https://rsample.tidymodels.org/reference/as.data.frame.rsplit.html'>analysis</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'><a href='https://rdrr.io/r/utils/head.html'>head</a></span><span class='o'>(</span><span class='o'>)</span> <span class='c'>#&gt; mpg cyl disp hp drat wt qsec vs am gear carb</span> <span class='c'>#&gt; Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4</span> <span class='c'>#&gt; Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1</span> <span class='c'>#&gt; Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1</span> <span class='c'>#&gt; Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2</span> <span class='c'>#&gt; Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1</span> <span class='c'>#&gt; Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4</span> <span class='nv'>resample</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/reshuffle_rset.html'>reshuffle_rset</a></span><span class='o'>(</span><span class='nv'>resample</span><span class='o'>)</span> <span class='nv'>resample</span><span class='o'>$</span><span class='nv'>splits</span><span class='o'>[[</span><span class='m'>1</span><span class='o'>]</span><span class='o'>]</span> <span class='o'>%&gt;%</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/as.data.frame.rsplit.html'>analysis</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'><a href='https://rdrr.io/r/utils/head.html'>head</a></span><span class='o'>(</span><span class='o'>)</span> <span class='c'>#&gt; mpg cyl disp hp drat wt qsec vs am gear carb</span> <span class='c'>#&gt; Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4</span> <span class='c'>#&gt; Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4</span> <span class='c'>#&gt; Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1</span> <span class='c'>#&gt; Hornet 4 Drive 
21.4 6 258 110 3.08 3.215 19.44 1 0 3 1</span> <span class='c'>#&gt; Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2</span> <span class='c'>#&gt; Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4</span></code></pre> </div> <p>This works with repeated cross-validation, stratification, grouping &ndash; anything you did originally should be preserved when reshuffling the rset.</p> <p>Additionally, the new <a href="https://rsample.tidymodels.org/reference/reverse_splits.html" target="_blank" rel="noopener"><code>reverse_splits()</code></a> function will &ldquo;swap&rdquo; the assessment and analysis folds of any rsplit or rset object:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>resample</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/initial_split.html'>initial_split</a></span><span class='o'>(</span><span class='nv'>mtcars</span><span class='o'>)</span> <span class='nv'>resample</span> <span class='c'>#&gt; &lt;Training/Testing/Total&gt;</span> <span class='c'>#&gt; &lt;24/8/32&gt;</span> <span class='nf'><a href='https://rsample.tidymodels.org/reference/reverse_splits.html'>reverse_splits</a></span><span class='o'>(</span><span class='nv'>resample</span><span class='o'>)</span> <span class='c'>#&gt; &lt;Training/Testing/Total&gt;</span> <span class='c'>#&gt; &lt;8/24/32&gt;</span></code></pre> </div> <p>This is just scratching the surface of the new features and improvements in this release of rsample! 
You can see a full list of changes in the <a href="https://rsample.tidymodels.org/news/index.html#rsample-110" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to thank everyone who has contributed since the last release: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, and <a href="https://github.com/sametsoekel" target="_blank" rel="noopener">@sametsoekel</a>.</p> Q2 2022 tidymodels digest https://www.tidyverse.org/blog/2022/07/tidymodels-2022-q2/ Tue, 19 Jul 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/07/tidymodels-2022-q2/ <!-- TODO: * [X] Look over / edit the post's title in the yaml * [X] Edit (or delete) the description; note this appears in the Twitter card * [X] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [X] Find photo & update yaml metadata * [X] Create `thumbnail-sq.jpg`; height and width should be equal * [X] Create `thumbnail-wd.jpg`; width should be >5x height * [X] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [X] Add intro sentence, e.g. 
the standard tagline for the package * [X] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span> <span class='c'>#&gt; ── <span style='font-weight: bold;'>Attaching packages</span> ────────────────────────────────────── tidymodels 1.0.0 ──</span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>broom </span> 1.0.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>recipes </span> 1.0.1</span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dials </span> 1.0.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>rsample </span> 1.0.0</span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dplyr </span> 1.0.9 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tibble </span> 3.1.7</span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>ggplot2 </span> 3.3.6 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tidyr </span> 1.2.0</span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>infer </span> 1.0.2 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tune </span> 1.0.0</span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>modeldata </span> 1.0.0 <span style='color: #00BB00;'>✔</span> <span style='color: 
#0000BB;'>workflows </span> 1.0.0</span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>parsnip </span> 1.0.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>workflowsets</span> 1.0.0</span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>purrr </span> 0.3.4 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>yardstick </span> 1.0.0</span> <span class='c'>#&gt; ── <span style='font-weight: bold;'>Conflicts</span> ───────────────────────────────────────── tidymodels_conflicts() ──</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>purrr</span>::<span style='color: #00BB00;'>discard()</span> masks <span style='color: #0000BB;'>scales</span>::discard()</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: #0000BB;'>stats</span>::filter()</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag()</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>recipes</span>::<span style='color: #00BB00;'>step()</span> masks <span style='color: #0000BB;'>stats</span>::step()</span> <span class='c'>#&gt; <span style='color: #0000BB;'>•</span> Search for functions across packages at <span style='color: #00BB00;'>https://www.tidymodels.org/find/</span></span></code></pre> </div> <p>Since the beginning of last year, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a> here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. 
The purpose of these regular posts is to share useful new features and any updates you may have missed. You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a> to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like these from the past month or so:</p> <ul> <li> <a href="https://www.tidyverse.org/blog/2022/06/spatialsample-0-2-0/" target="_blank" rel="noopener">spatialsample</a></li> <li> <a href="https://www.tidyverse.org/blog/2022/05/recipes-update-05-20222/" target="_blank" rel="noopener">recipes and its extension packages</a></li> <li> <a href="https://www.tidyverse.org/blog/2022/06/bonsai-0-1-0/" target="_blank" rel="noopener">bonsai</a></li> </ul> <p>Since <a href="https://www.tidyverse.org/blog/2022/04/tidymodels-2022-q1/" target="_blank" rel="noopener">our last roundup post</a>, there have been CRAN releases of 25 tidymodels packages. You can install these updates from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span> <span class='s'>"rsample"</span>, <span class='s'>"spatialsample"</span>, <span class='s'>"parsnip"</span>, <span class='s'>"baguette"</span>, <span class='s'>"multilevelmod"</span>, <span class='s'>"discrim"</span>, <span class='s'>"plsmod"</span>, <span class='s'>"poissonreg"</span>, <span class='s'>"rules"</span>, <span class='s'>"recipes"</span>, <span class='s'>"embed"</span>, <span class='s'>"themis"</span>, <span class='s'>"textrecipes"</span>, <span class='s'>"workflows"</span>, <span class='s'>"workflowsets"</span>, <span class='s'>"tune"</span>, <span class='s'>"yardstick"</span>, <span class='s'>"broom"</span>, <span 
class='s'>"dials"</span>, <span class='s'>"butcher"</span>, <span class='s'>"hardhat"</span>, <span class='s'>"infer"</span>, <span class='s'>"stacks"</span>, <span class='s'>"tidyposterior"</span>, <span class='s'>"tidypredict"</span> <span class='o'>)</span><span class='o'>)</span></code></pre> </div> <ul> <li> <a href="https://baguette.tidymodels.org/news/index.html#baguette-100" target="_blank" rel="noopener">baguette</a></li> <li> <a href="https://broom.tidymodels.org/news/index.html#broom-080" target="_blank" rel="noopener">broom</a></li> <li> <a href="https://butcher.tidymodels.org/news/index.html#butcher-020" target="_blank" rel="noopener">butcher</a></li> <li> <a href="https://dials.tidymodels.org/news/index.html#dials-100" target="_blank" rel="noopener">dials</a></li> <li> <a href="https://discrim.tidymodels.org/news/index.html#discrim-100" target="_blank" rel="noopener">discrim</a></li> <li> <a href="https://embed.tidymodels.org/news/index.html#embed-100" target="_blank" rel="noopener">embed</a></li> <li> <a href="https://hardhat.tidymodels.org/news/index.html#hardhat-120" target="_blank" rel="noopener">hardhat</a></li> <li> <a href="https://infer.tidymodels.org/news/index.html#infer-v102" target="_blank" rel="noopener">infer</a></li> <li> <a href="https://modeldata.tidymodels.org/news/index.html#modeldata-100" target="_blank" rel="noopener">modeldata</a></li> <li> <a href="https://multilevelmod.tidymodels.org/news/index.html#multilevelmod-100" target="_blank" rel="noopener">multilevelmod</a></li> <li> <a href="https://parsnip.tidymodels.org/news/index.html#parsnip-100" target="_blank" rel="noopener">parsnip</a></li> <li> <a href="https://poissonreg.tidymodels.org/news/index.html#poissonreg-100" target="_blank" rel="noopener">poissonreg</a></li> <li> <a href="https://recipes.tidymodels.org/news/index.html#recipes-101" target="_blank" rel="noopener">recipes</a></li> <li> <a href="https://rsample.tidymodels.org/news/index.html#rsample-100" target="_blank" 
rel="noopener">rsample</a></li> <li> <a href="https://rules.tidymodels.org/news/index.html#rules-100" target="_blank" rel="noopener">rules</a></li> <li> <a href="https://spatialsample.tidymodels.org/news/index.html#spatialsample-020" target="_blank" rel="noopener">spatialsample</a></li> <li> <a href="https://stacks.tidymodels.org/news/index.html#stacks-023" target="_blank" rel="noopener">stacks</a></li> <li> <a href="https://textrecipes.tidymodels.org/news/index.html#textrecipes-100" target="_blank" rel="noopener">textrecipes</a></li> <li> <a href="https://themis.tidymodels.org/news/index.html#themis-100" target="_blank" rel="noopener">themis</a></li> <li> <a href="https://tidymodels.tidymodels.org/news/index.html#tidymodels-100" target="_blank" rel="noopener">tidymodels</a></li> <li> <a href="https://tidyposterior.tidymodels.org/news/index.html#tidyposterior-100" target="_blank" rel="noopener">tidyposterior</a></li> <li> <a href="https://tidypredict.tidymodels.org/news/index.html#tidypredict-049" target="_blank" rel="noopener">tidypredict</a></li> <li> <a href="https://tune.tidymodels.org/news/index.html#tune-100" target="_blank" rel="noopener">tune</a></li> <li> <a href="https://workflows.tidymodels.org/news/index.html#workflows-100" target="_blank" rel="noopener">workflows</a></li> <li> <a href="https://workflowsets.tidymodels.org/news/index.html#workflowsets-100" target="_blank" rel="noopener">workflowsets</a></li> <li> <a href="https://yardstick.tidymodels.org/news/index.html#yardstick-100" target="_blank" rel="noopener">yardstick</a></li> </ul> <p>The <code>NEWS</code> files are linked here for each package; you&rsquo;ll notice that there are a lot! 
We know it may be bothersome to keep up with all these changes, so we want to draw your attention to our recent blog posts above and also highlight a few more useful updates in today&rsquo;s blog post.</p> <p>We are confident that we have created a good foundation with our implementation across many of our packages and we are using this as an opportunity to bump the package versions to 1.0.0.</p> <h2 id="case-weights">Case weights <a href="#case-weights"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Much of the work we have been doing so far this year has been related to case weights. For a more detailed account of the deliberations, see this earlier post about the <a href="https://www.tidyverse.org/blog/2022/05/case-weights/" target="_blank" rel="noopener">use of case weights with tidymodels</a>.</p> <p>A full worked example can be found in the <a href="https://www.tidyverse.org/blog/2022/05/case-weights/#tidymodels-syntax">previous blog post</a> and on <a href="https://www.tidymodels.org/learn/work/case-weights/" target="_blank" rel="noopener">the tidymodels site</a>.</p> <p>As an example, let&rsquo;s go over how case weights are used within tidymodels. We start by simulating a data set using <code>sim_classification()</code>. This data set is unbalanced, so we will use importance weights to give more weight to the minority class. In tidymodels you can use <code>importance_weights()</code> or <code>frequency_weights()</code> to denote what type of weight you are working with. 
Setting the type of weight should be the first thing you do.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>)</span> <span class='nv'>training_sim</span> <span class='o'>&lt;-</span> <span class='nf'>sim_classification</span><span class='o'>(</span><span class='m'>5000</span>, intercept <span class='o'>=</span> <span class='o'>-</span><span class='m'>25</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>mutate</span><span class='o'>(</span> case_wts <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/ifelse.html'>ifelse</a></span><span class='o'>(</span><span class='nv'>class</span> <span class='o'>==</span> <span class='s'>"class_1"</span>, <span class='m'>60</span>, <span class='m'>1</span><span class='o'>)</span>, case_wts <span class='o'>=</span> <span class='nf'>importance_weights</span><span class='o'>(</span><span class='nv'>case_wts</span><span class='o'>)</span> <span class='o'>)</span> <span class='nv'>training_sim</span> <span class='o'>%&gt;%</span> <span class='nf'>relocate</span><span class='o'>(</span><span class='nv'>case_wts</span>, .after <span class='o'>=</span> <span class='nv'>class</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5,000 × 17</span></span> <span class='c'>#&gt; class case_wts two_factor_1 two_factor_2 non_linear_1 non_linear_2</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;imp_wts&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: 
italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'> 1</span> class_2 1 0.092<span style='text-decoration: underline;'>4</span> -<span style='color: #BB0000;'>1.70</span> -<span style='color: #BB0000;'>0.579</span> 0.201</span> <span class='c'>#&gt; <span style='color: #555555;'> 2</span> class_2 1 -<span style='color: #BB0000;'>0.136</span> 0.608 -<span style='color: #BB0000;'>0.770</span> 0.114</span> <span class='c'>#&gt; <span style='color: #555555;'> 3</span> class_2 1 -<span style='color: #BB0000;'>0.080</span><span style='color: #BB0000; text-decoration: underline;'>6</span> -<span style='color: #BB0000;'>2.07</span> -<span style='color: #BB0000;'>0.709</span> 0.272</span> <span class='c'>#&gt; <span style='color: #555555;'> 4</span> class_2 1 1.35 2.75 -<span style='color: #BB0000;'>0.380</span> 0.785</span> <span class='c'>#&gt; <span style='color: #555555;'> 5</span> class_2 1 -<span style='color: #BB0000;'>0.238</span> 1.08 -<span style='color: #BB0000;'>0.700</span> 0.638</span> <span class='c'>#&gt; <span style='color: #555555;'> 6</span> class_2 1 -<span style='color: #BB0000;'>0.322</span> -<span style='color: #BB0000;'>1.79</span> 0.053<span style='text-decoration: underline;'>4</span> 0.470</span> <span class='c'>#&gt; <span style='color: #555555;'> 7</span> class_2 1 1.35 -<span style='color: #BB0000;'>0.102</span> -<span style='color: #BB0000;'>0.764</span> 0.827</span> <span class='c'>#&gt; <span style='color: #555555;'> 8</span> class_2 1 0.595 1.30 -<span style='color: #BB0000;'>0.045</span><span style='color: #BB0000; text-decoration: underline;'>4</span> 0.493</span> <span class='c'>#&gt; <span style='color: #555555;'> 9</span> class_2 1 0.563 0.916 -<span style='color: #BB0000;'>0.383</span> 0.775</span> <span class='c'>#&gt; <span style='color: #555555;'>10</span> class_2 1 -<span style='color: #BB0000;'>0.327</span> -<span style='color: #BB0000;'>0.457</span> -<span style='color: #BB0000;'>0.390</span> 
0.704</span> <span class='c'>#&gt; <span style='color: #555555;'># … with 4,990 more rows, and 11 more variables: non_linear_3 &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># linear_01 &lt;dbl&gt;, linear_02 &lt;dbl&gt;, linear_03 &lt;dbl&gt;, linear_04 &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># linear_05 &lt;dbl&gt;, linear_06 &lt;dbl&gt;, linear_07 &lt;dbl&gt;, linear_08 &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># linear_09 &lt;dbl&gt;, linear_10 &lt;dbl&gt;</span></span></code></pre> </div> <p>Now that we have the data, we can create the resamples we want. We assigned weights before creating the resamples so that this information is carried into the resamples. The weights are not used in the creation of the resamples.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>2</span><span class='o'>)</span> <span class='nv'>sim_folds</span> <span class='o'>&lt;-</span> <span class='nf'>vfold_cv</span><span class='o'>(</span><span class='nv'>training_sim</span>, strata <span class='o'>=</span> <span class='nv'>class</span><span class='o'>)</span></code></pre> </div> <p>When creating the model specification we don&rsquo;t need to do anything special, as parsnip will apply case weights when there is support for it. 
If you are unsure whether a model supports case weights, you can consult the documentation or the <code>show_model_info()</code> function, like so: <code>show_model_info(&quot;logistic_reg&quot;)</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>lr_spec</span> <span class='o'>&lt;-</span> <span class='nf'>logistic_reg</span><span class='o'>(</span>penalty <span class='o'>=</span> <span class='nf'>tune</span><span class='o'>(</span><span class='o'>)</span>, mixture <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>set_engine</span><span class='o'>(</span><span class='s'>"glmnet"</span><span class='o'>)</span></code></pre> </div> <p>Next, we will set up a recipe for preprocessing:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>sim_rec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>class</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>training_sim</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>step_ns</span><span class='o'>(</span><span class='nf'>starts_with</span><span class='o'>(</span><span class='s'>"non_linear"</span><span class='o'>)</span>, deg_free <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>step_normalize</span><span class='o'>(</span><span class='nf'>all_numeric_predictors</span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>sim_rec</span> <span class='c'>#&gt; Recipe</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; Inputs:</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; role #variables</span> <span class='c'>#&gt; case_weights 1</span> <span class='c'>#&gt; outcome 1</span> <span 
class='c'>#&gt; predictor 15</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; Operations:</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; Natural splines on starts_with("non_linear")</span> <span class='c'>#&gt; Centering and scaling for all_numeric_predictors()</span></code></pre> </div> <p>The recipe automatically detects the case weights even though they are captured by the dot on the right-hand side of the formula. The recipe automatically sets the role of that column and will error if the column is changed in any way.</p> <p>As mentioned above, any unsupervised steps are unaffected by importance weights, so neither <code>step_ns()</code> nor <code>step_normalize()</code> uses the weights in its calculations.</p> <p>When using case weights, we would like to encourage users to keep their model and preprocessing tool within a workflow. The workflows package now has an <code>add_case_weights()</code> function to help here:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>lr_wflow</span> <span class='o'>&lt;-</span> <span class='nf'>workflow</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>add_model</span><span class='o'>(</span><span class='nv'>lr_spec</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>add_recipe</span><span class='o'>(</span><span class='nv'>sim_rec</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>add_case_weights</span><span class='o'>(</span><span class='nv'>case_wts</span><span class='o'>)</span> <span class='nv'>lr_wflow</span> <span class='c'>#&gt; ══ Workflow ════════════════════════════════════════════════════════════════════</span> <span class='c'>#&gt; <span style='font-style: italic;'>Preprocessor:</span> Recipe</span> <span class='c'>#&gt; <span style='font-style: italic;'>Model:</span> logistic_reg()</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; ── Preprocessor 
────────────────────────────────────────────────────────────────</span> <span class='c'>#&gt; 2 Recipe Steps</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; • step_ns()</span> <span class='c'>#&gt; • step_normalize()</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; ── Case Weights ────────────────────────────────────────────────────────────────</span> <span class='c'>#&gt; case_wts</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; ── Model ───────────────────────────────────────────────────────────────────────</span> <span class='c'>#&gt; Logistic Regression Model Specification (classification)</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; Main Arguments:</span> <span class='c'>#&gt; penalty = tune()</span> <span class='c'>#&gt; mixture = 1</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; Computational engine: glmnet</span></code></pre> </div> <p>And that is all you need to use case weights. The remaining functions from the tune and yardstick packages know how to handle case weights depending on the type of weight.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>cls_metrics</span> <span class='o'>&lt;-</span> <span class='nf'>metric_set</span><span class='o'>(</span><span class='nv'>sensitivity</span>, <span class='nv'>specificity</span><span class='o'>)</span> <span class='nv'>grid</span> <span class='o'>&lt;-</span> <span class='nf'>tibble</span><span class='o'>(</span>penalty <span class='o'>=</span> <span class='m'>10</span><span class='o'>^</span><span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='o'>-</span><span class='m'>3</span>, <span class='m'>0</span>, length.out <span class='o'>=</span> <span class='m'>20</span><span class='o'>)</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span 
class='m'>3</span><span class='o'>)</span> <span class='nv'>lr_res</span> <span class='o'>&lt;-</span> <span class='nv'>lr_wflow</span> <span class='o'>%&gt;%</span> <span class='nf'>tune_grid</span><span class='o'>(</span>resamples <span class='o'>=</span> <span class='nv'>sim_folds</span>, grid <span class='o'>=</span> <span class='nv'>grid</span>, metrics <span class='o'>=</span> <span class='nv'>cls_metrics</span><span class='o'>)</span> <span class='nf'>autoplot</span><span class='o'>(</span><span class='nv'>lr_res</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-8-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="non-standard-roles-in-recipes">Non-standard roles in recipes <a href="#non-standard-roles-in-recipes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The recipes package uses the idea of roles to determine how and when the different variables are used. The main roles are <code>&quot;outcome&quot;</code>, <code>&quot;predictor&quot;</code>, and now <code>&quot;case_weights&quot;</code>. You are also able to change the roles of these variables using <code>add_role()</code> and <code>update_role()</code>.</p> <p>With the recent addition of case weights as another type of standard role, we have made recipes more robust. It now checks that all columns in the <code>data</code> supplied to <code>recipe()</code> are also present in the <code>new_data</code> supplied to <code>bake()</code>. 
An exception is made for columns with roles of either <code>&quot;outcome&quot;</code> or <code>&quot;case_weights&quot;</code> because these are typically not required at <code>bake()</code> time.</p> <p>This change for stricter checking of roles will mean that you might need to make some small changes to your code if you are using non-standard roles.</p> <p>Let&rsquo;s look at the <code>tate_text</code> data set as an example:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='s'>"tate_text"</span><span class='o'>)</span> <span class='nf'>glimpse</span><span class='o'>(</span><span class='nv'>tate_text</span><span class='o'>)</span> <span class='c'>#&gt; Rows: 4,284</span> <span class='c'>#&gt; Columns: 5</span> <span class='c'>#&gt; $ id <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 21926, 20472, 20474, 20473, 20513, 21389, 121187, 19455, 20938,…</span> <span class='c'>#&gt; $ artist <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> "Absalon", "Auerbach, Frank", "Auerbach, Frank", "Auerbach, Fra…</span> <span class='c'>#&gt; $ title <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> "Proposals for a Habitat", "Michael", "Geoffrey", "Jake", "To t…</span> <span class='c'>#&gt; $ medium <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> "Video, monitor or projection, colour and sound (stereo)", "Etc…</span> <span class='c'>#&gt; $ year <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 199…</span></code></pre> </div> <p>This data set includes an <code>id</code> variable that shouldn&rsquo;t have any predictive power and a <code>title</code> variable that we want to ignore for now. 
We can let the recipe know that we don&rsquo;t want it to treat <code>id</code> and <code>title</code> as predictors by giving them a different role which we will call <code>&quot;id&quot;</code> here:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>tate_rec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>year</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>tate_text</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>update_role</span><span class='o'>(</span><span class='nv'>id</span>, <span class='nv'>title</span>, new_role <span class='o'>=</span> <span class='s'>"id"</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>step_dummy_extract</span><span class='o'>(</span><span class='nv'>artist</span>, <span class='nv'>medium</span>, sep <span class='o'>=</span> <span class='s'>", "</span><span class='o'>)</span> <span class='nv'>tate_rec_prepped</span> <span class='o'>&lt;-</span> <span class='nf'>prep</span><span class='o'>(</span><span class='nv'>tate_rec</span><span class='o'>)</span></code></pre> </div> <p>This will now error when we try to apply the recipe to new data that contains only our predictors:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>new_painting</span> <span class='o'>&lt;-</span> <span class='nf'>tibble</span><span class='o'>(</span> artist <span class='o'>=</span> <span class='s'>"Hamilton, Richard"</span>, medium <span class='o'>=</span> <span class='s'>"Letterpress on paper"</span> <span class='o'>)</span> <span class='nf'>bake</span><span class='o'>(</span><span class='nv'>tate_rec_prepped</span>, <span class='nv'>new_painting</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span 
style='font-weight: bold;'> in `bake()`:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> The following required columns are missing from `new_data`: "id", "title".</span> <span class='c'>#&gt; <span style='color: #0000BB;'>ℹ</span> These columns have one of the following roles, which are required at `bake()` time: "id".</span> <span class='c'>#&gt; <span style='color: #0000BB;'>ℹ</span> If these roles are not required at `bake()` time, use `update_role_requirements(role = "your_role", bake = FALSE)`.</span></code></pre> </div> <p>It complains because the recipe is expecting the <code>id</code> and <code>title</code> variables to be in the data set passed to <code>bake()</code>. We can use <a href="https://recipes.tidymodels.org/reference/update_role_requirements.html" target="_blank" rel="noopener">update_role_requirements()</a> to tell the recipe that variables of role <code>&quot;id&quot;</code> are not required when baking and we are good to go!</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>tate_rec</span> <span class='o'>&lt;-</span> <span class='nf'>recipe</span><span class='o'>(</span><span class='nv'>year</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>tate_text</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>update_role</span><span class='o'>(</span><span class='nv'>id</span>, <span class='nv'>title</span>, new_role <span class='o'>=</span> <span class='s'>"id"</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>update_role_requirements</span><span class='o'>(</span>role <span class='o'>=</span> <span class='s'>"id"</span>, bake <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span> <span class='o'>%&gt;%</span> <span class='nf'>step_dummy_extract</span><span class='o'>(</span><span class='nv'>artist</span>, <span 
class='nv'>medium</span>, sep <span class='o'>=</span> <span class='s'>", "</span><span class='o'>)</span> <span class='nv'>tate_rec_prepped</span> <span class='o'>&lt;-</span> <span class='nf'>prep</span><span class='o'>(</span><span class='nv'>tate_rec</span><span class='o'>)</span> <span class='nf'>bake</span><span class='o'>(</span><span class='nv'>tate_rec_prepped</span>, <span class='nv'>new_painting</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 2,675</span></span> <span class='c'>#&gt; artist_Abigail artist_Abraham artist_Absalon artist_Abts artist_Achill</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 0 0 0 0 0</span> <span class='c'>#&gt; <span style='color: #555555;'># … with 2,670 more variables: artist_Ackroyd &lt;dbl&gt;, artist_Adam &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># artist_Agnes &lt;dbl&gt;, artist_Ahtila &lt;dbl&gt;, artist_Ai &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># artist_Akram &lt;dbl&gt;, artist_Aksel &lt;dbl&gt;, artist_Al &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># artist_Al.Ani &lt;dbl&gt;, artist_Alan &lt;dbl&gt;, artist_Albert &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># artist_Aleksandra &lt;dbl&gt;, artist_Alex &lt;dbl&gt;, artist_Alexander &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># artist_Alexandre.da &lt;dbl&gt;, artist_Alfredo &lt;dbl&gt;, artist_Alice &lt;dbl&gt;,</span></span> <span class='c'>#&gt; 
<span style='color: #555555;'># artist_Alimpiev &lt;dbl&gt;, artist_Alison &lt;dbl&gt;, artist_Allen &lt;dbl&gt;, …</span></span></code></pre> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><ul> <li> <p>applicable: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/marlycormar" target="_blank" rel="noopener">@marlycormar</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>baguette: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>bonsai: <a href="https://github.com/bwilkowski" target="_blank" rel="noopener">@bwilkowski</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/pinogl" target="_blank" rel="noopener">@pinogl</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>broom: <a href="https://github.com/behrman" target="_blank" rel="noopener">@behrman</a>, <a href="https://github.com/corybrunson" target="_blank" rel="noopener">@corybrunson</a>, <a href="https://github.com/fschaffner" target="_blank" rel="noopener">@fschaffner</a>, <a 
href="https://github.com/gjones1219" target="_blank" rel="noopener">@gjones1219</a>, <a href="https://github.com/grantmcdermott" target="_blank" rel="noopener">@grantmcdermott</a>, <a href="https://github.com/mfansler" target="_blank" rel="noopener">@mfansler</a>, <a href="https://github.com/michaeltopper1" target="_blank" rel="noopener">@michaeltopper1</a>, <a href="https://github.com/ray-p144" target="_blank" rel="noopener">@ray-p144</a>, <a href="https://github.com/RichardJActon" target="_blank" rel="noopener">@RichardJActon</a>, <a href="https://github.com/russHyde" target="_blank" rel="noopener">@russHyde</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/tappek" target="_blank" rel="noopener">@tappek</a>, <a href="https://github.com/Timelessprod" target="_blank" rel="noopener">@Timelessprod</a>, and <a href="https://github.com/vincentarelbundock" target="_blank" rel="noopener">@vincentarelbundock</a>.</p> </li> <li> <p>butcher: <a href="https://github.com/cregouby" target="_blank" rel="noopener">@cregouby</a>, <a href="https://github.com/davidkane9" target="_blank" rel="noopener">@davidkane9</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</p> </li> <li> <p>censored: <a href="https://github.com/bcjaeger" target="_blank" rel="noopener">@bcjaeger</a>, <a href="https://github.com/brunocarlin" target="_blank" rel="noopener">@brunocarlin</a>, <a href="https://github.com/erikvona" target="_blank" rel="noopener">@erikvona</a>, <a href="https://github.com/gvelasq" target="_blank" rel="noopener">@gvelasq</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/mikemahoney218" target="_blank" 
rel="noopener">@mikemahoney218</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>corrr: <a href="https://github.com/astamm" target="_blank" rel="noopener">@astamm</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/john-s-f" target="_blank" rel="noopener">@john-s-f</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/thisisdaryn" target="_blank" rel="noopener">@thisisdaryn</a>.</p> </li> <li> <p>dials: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/franzbischoff" target="_blank" rel="noopener">@franzbischoff</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/py9mrg" target="_blank" rel="noopener">@py9mrg</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>discrim: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jmarshallnz" target="_blank" rel="noopener">@jmarshallnz</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>embed: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a 
href="https://github.com/exsell-jc" target="_blank" rel="noopener">@exsell-jc</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mkhansa" target="_blank" rel="noopener">@mkhansa</a>, <a href="https://github.com/talegari" target="_blank" rel="noopener">@talegari</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>hardhat: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, <a href="https://github.com/mdancho84" target="_blank" rel="noopener">@mdancho84</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>infer: <a href="https://github.com/gdbassett" target="_blank" rel="noopener">@gdbassett</a>, <a href="https://github.com/liubao210" target="_blank" rel="noopener">@liubao210</a>, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</p> </li> <li> <p>modeldata: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/jbkunst" target="_blank" rel="noopener">@jbkunst</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>multilevelmod: <a href="https://github.com/a-difabio" target="_blank" rel="noopener">@a-difabio</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/sitendug" target="_blank" 
rel="noopener">@sitendug</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/YiweiZhu" target="_blank" rel="noopener">@YiweiZhu</a>.</p> </li> <li> <p>parsnip: <a href="https://github.com/bappa10085" target="_blank" rel="noopener">@bappa10085</a>, <a href="https://github.com/brunocarlin" target="_blank" rel="noopener">@brunocarlin</a>, <a href="https://github.com/cb12991" target="_blank" rel="noopener">@cb12991</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/deschen1" target="_blank" rel="noopener">@deschen1</a>, <a href="https://github.com/edgararuiz" target="_blank" rel="noopener">@edgararuiz</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/emmamendelsohn" target="_blank" rel="noopener">@emmamendelsohn</a>, <a href="https://github.com/exsell-jc" target="_blank" rel="noopener">@exsell-jc</a>, <a href="https://github.com/fdeoliveirag" target="_blank" rel="noopener">@fdeoliveirag</a>, <a href="https://github.com/gundalav" target="_blank" rel="noopener">@gundalav</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jmarshallnz" target="_blank" rel="noopener">@jmarshallnz</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/Npaffen" target="_blank" rel="noopener">@Npaffen</a>, <a href="https://github.com/oj713" target="_blank" rel="noopener">@oj713</a>, <a href="https://github.com/pmags" target="_blank" rel="noopener">@pmags</a>, <a href="https://github.com/PursuitOfDataScience" target="_blank" rel="noopener">@PursuitOfDataScience</a>, <a href="https://github.com/qiushiyan" target="_blank" rel="noopener">@qiushiyan</a>, <a 
href="https://github.com/salim-b" target="_blank" rel="noopener">@salim-b</a>, <a href="https://github.com/shosaco" target="_blank" rel="noopener">@shosaco</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/tolliam" target="_blank" rel="noopener">@tolliam</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>plsmod: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>.</p> </li> <li> <p>poissonreg: <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>recipes: <a href="https://github.com/abichat" target="_blank" rel="noopener">@abichat</a>, <a href="https://github.com/albertiniufu" target="_blank" rel="noopener">@albertiniufu</a>, <a href="https://github.com/AndrewKostandy" target="_blank" rel="noopener">@AndrewKostandy</a>, <a href="https://github.com/aridf" target="_blank" rel="noopener">@aridf</a>, <a href="https://github.com/brunocarlin" target="_blank" rel="noopener">@brunocarlin</a>, <a href="https://github.com/cb12991" target="_blank" rel="noopener">@cb12991</a>, <a href="https://github.com/conorjudge" target="_blank" rel="noopener">@conorjudge</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/duccioa" target="_blank" rel="noopener">@duccioa</a>, <a href="https://github.com/edgararuiz" target="_blank" rel="noopener">@edgararuiz</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/exsell-jc" target="_blank" rel="noopener">@exsell-jc</a>, <a href="https://github.com/gundalav" target="_blank" rel="noopener">@gundalav</a>, <a 
href="https://github.com/hsbadr" target="_blank" rel="noopener">@hsbadr</a>, <a href="https://github.com/jkennel" target="_blank" rel="noopener">@jkennel</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/joranE" target="_blank" rel="noopener">@joranE</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/kendonB" target="_blank" rel="noopener">@kendonB</a>, <a href="https://github.com/krzjoa" target="_blank" rel="noopener">@krzjoa</a>, <a href="https://github.com/madprogramer" target="_blank" rel="noopener">@madprogramer</a>, <a href="https://github.com/mdporter" target="_blank" rel="noopener">@mdporter</a>, <a href="https://github.com/mdsteiner" target="_blank" rel="noopener">@mdsteiner</a>, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, <a href="https://github.com/PursuitOfDataScience" target="_blank" rel="noopener">@PursuitOfDataScience</a>, <a href="https://github.com/r2evans" target="_blank" rel="noopener">@r2evans</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/szymonkusak" target="_blank" rel="noopener">@szymonkusak</a>, <a href="https://github.com/themichjam" target="_blank" rel="noopener">@themichjam</a>, <a href="https://github.com/tmastny" target="_blank" rel="noopener">@tmastny</a>, <a href="https://github.com/tomazweiss" target="_blank" rel="noopener">@tomazweiss</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/TylerGrantSmith" target="_blank" rel="noopener">@TylerGrantSmith</a>, and <a href="https://github.com/zenggyu" target="_blank" rel="noopener">@zenggyu</a>.</p> </li> <li> <p>rsample: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dfalbel" target="_blank" 
rel="noopener">@dfalbel</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/mdporter" target="_blank" rel="noopener">@mdporter</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/pgoodling-usgs" target="_blank" rel="noopener">@pgoodling-usgs</a>, <a href="https://github.com/sametsoekel" target="_blank" rel="noopener">@sametsoekel</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/wkdavis" target="_blank" rel="noopener">@wkdavis</a>.</p> </li> <li> <p>rules: <a href="https://github.com/DesmondChoy" target="_blank" rel="noopener">@DesmondChoy</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/wdkeyzer" target="_blank" rel="noopener">@wdkeyzer</a>.</p> </li> <li> <p>shinymodels: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>.</p> </li> <li> <p>spatialsample: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/MxNl" target="_blank" rel="noopener">@MxNl</a>, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, and <a href="https://github.com/PathosEthosLogos" target="_blank" rel="noopener">@PathosEthosLogos</a>.</p> </li> 
<li> <p>stacks: <a href="https://github.com/amcmahon17" target="_blank" rel="noopener">@amcmahon17</a>, <a href="https://github.com/domijan" target="_blank" rel="noopener">@domijan</a>, <a href="https://github.com/Jeffrothschild" target="_blank" rel="noopener">@Jeffrothschild</a>, <a href="https://github.com/mcavs" target="_blank" rel="noopener">@mcavs</a>, <a href="https://github.com/mvt-oviedo" target="_blank" rel="noopener">@mvt-oviedo</a>, <a href="https://github.com/osorensen" target="_blank" rel="noopener">@osorensen</a>, <a href="https://github.com/py9mrg" target="_blank" rel="noopener">@py9mrg</a>, <a href="https://github.com/rcannood" target="_blank" rel="noopener">@rcannood</a>, <a href="https://github.com/Saarialho" target="_blank" rel="noopener">@Saarialho</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/williamshell" target="_blank" rel="noopener">@williamshell</a>.</p> </li> <li> <p>textrecipes: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/NLDataScientist" target="_blank" rel="noopener">@NLDataScientist</a>, <a href="https://github.com/PursuitOfDataScience" target="_blank" rel="noopener">@PursuitOfDataScience</a>, and <a href="https://github.com/raj-hubber" target="_blank" rel="noopener">@raj-hubber</a>.</p> </li> <li> <p>themis: <a href="https://github.com/coforfe" target="_blank" rel="noopener">@coforfe</a>, and <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>.</p> </li> <li> <p>tidymodels: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/EngrStudent" target="_blank" rel="noopener">@EngrStudent</a>, <a href="https://github.com/exsell-jc" target="_blank" rel="noopener">@exsell-jc</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a 
href="https://github.com/kcarnold" target="_blank" rel="noopener">@kcarnold</a>, <a href="https://github.com/scottlyden" target="_blank" rel="noopener">@scottlyden</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>tidyposterior: <a href="https://github.com/jmgirard" target="_blank" rel="noopener">@jmgirard</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/mone27" target="_blank" rel="noopener">@mone27</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>tidypredict: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>tune: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dax44" target="_blank" rel="noopener">@dax44</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/felxcon" target="_blank" rel="noopener">@felxcon</a>, <a href="https://github.com/franzbischoff" target="_blank" rel="noopener">@franzbischoff</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/mdancho84" target="_blank" rel="noopener">@mdancho84</a>, <a 
href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/munoztd0" target="_blank" rel="noopener">@munoztd0</a>, <a href="https://github.com/nikhilpathiyil" target="_blank" rel="noopener">@nikhilpathiyil</a>, <a href="https://github.com/pgoodling-usgs" target="_blank" rel="noopener">@pgoodling-usgs</a>, <a href="https://github.com/py9mrg" target="_blank" rel="noopener">@py9mrg</a>, <a href="https://github.com/qiushiyan" target="_blank" rel="noopener">@qiushiyan</a>, <a href="https://github.com/siegfried" target="_blank" rel="noopener">@siegfried</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/thegargiulian" target="_blank" rel="noopener">@thegargiulian</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/williamshell" target="_blank" rel="noopener">@williamshell</a>, and <a href="https://github.com/wtbxsjy" target="_blank" rel="noopener">@wtbxsjy</a>.</p> </li> <li> <p>usemodels: <a href="https://github.com/aloes2512" target="_blank" rel="noopener">@aloes2512</a>, <a href="https://github.com/amcmahon17" target="_blank" rel="noopener">@amcmahon17</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/larry77" target="_blank" rel="noopener">@larry77</a>.</p> </li> <li> <p>workflows: <a href="https://github.com/CarstenLange" target="_blank" rel="noopener">@CarstenLange</a>, <a href="https://github.com/dajmcdon" target="_blank" rel="noopener">@dajmcdon</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, 
<a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/themichjam" target="_blank" rel="noopener">@themichjam</a>, and <a href="https://github.com/TylerGrantSmith" target="_blank" rel="noopener">@TylerGrantSmith</a>.</p> </li> <li> <p>workflowsets: <a href="https://github.com/a-difabio" target="_blank" rel="noopener">@a-difabio</a>, <a href="https://github.com/BorisDelange" target="_blank" rel="noopener">@BorisDelange</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/wdefreitas" target="_blank" rel="noopener">@wdefreitas</a>, and <a href="https://github.com/yonicd" target="_blank" rel="noopener">@yonicd</a>.</p> </li> <li> <p>yardstick: <a href="https://github.com/1lliter8" target="_blank" rel="noopener">@1lliter8</a>, <a href="https://github.com/amcmahon17" target="_blank" rel="noopener">@amcmahon17</a>, <a href="https://github.com/brshallo" target="_blank" rel="noopener">@brshallo</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/gsverhoeven" target="_blank" rel="noopener">@gsverhoeven</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a href="https://github.com/parsifal9" target="_blank" rel="noopener">@parsifal9</a>, and <a href="https://github.com/sametsoekel" target="_blank" rel="noopener">@sametsoekel</a>.</p> </li> </ul> lintr 3.0.0 https://www.tidyverse.org/blog/2022/07/lintr-3-0-0/ 
Fri, 15 Jul 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/07/lintr-3-0-0/ <p>We are very excited to announce the release of <a href="https://lintr.r-lib.org" target="_blank" rel="noopener">lintr</a> 3.0.0! lintr is maintained by <a href="https://github.com/jimhester" target="_blank" rel="noopener">Jim Hester</a> and contributors, including three new package authors: <a href="https://github.com/AshesITR" target="_blank" rel="noopener">Alexander Rosenstock</a>, <a href="https://github.com/renkun-ken" target="_blank" rel="noopener">Kun Ren</a>, and <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">Michael Chirico</a>. lintr provides both a framework for <a href="https://www.perforce.com/blog/sca/what-static-analysis" target="_blank" rel="noopener">static analysis</a> of R packages and scripts and a variety of linters, e.g. to enforce the <a href="https://style.tidyverse.org/" target="_blank" rel="noopener">tidyverse style guide</a>.</p> <p>You can install it from CRAN with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;lintr&#34;</span><span class="p">)</span> </code></pre></div><p>Check our vignettes for a quick introduction to the package:</p> <ul> <li>Getting started (<code>vignette(&quot;lintr&quot;)</code>)</li> <li>Integrating lintr with your preferred IDE (<code>vignette(&quot;editors&quot;)</code>)</li> <li>Integrating lintr with your preferred CI tools (<code>vignette(&quot;continuous-integration&quot;)</code>)</li> </ul> <p>We&rsquo;ve also added <code>lintr::use_lintr()</code> for a usethis-inspired interactive tool to configure lintr for your package/repo.</p> <p>This blog post will highlight the biggest changes coming in this update which drove us to declare it a major release.</p> <h1 id="selective-exclusions">Selective exclusions <a href="#selective-exclusions"> <svg class="anchor-symbol" 
aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h1><p>lintr now supports targeted exclusions of specific linters through an extension of the <code># nolint</code> syntax.</p> <p>Consider the following example:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">T_and_F_symbol_linter</span><span class="o">=</span><span class="nf">function</span><span class="p">(){</span> <span class="nf">list</span><span class="p">()</span> <span class="p">}</span> </code></pre></div><p>This snippet generates 5 lints:</p> <ol> <li><code>object_name_linter()</code> because the uppercase <code>T</code> and <code>F</code> in the function name do not match <code>lower_snake_case</code>.</li> <li><code>brace_linter()</code> because <code>{</code> should be separated from <code>)</code> by a space.</li> <li><code>paren_body_linter()</code> because <code>)</code> should be separated from the function body (starting at <code>{</code>) by a space.</li> <li><code>infix_spaces_linter()</code> because <code>=</code> should be surrounded by spaces on both sides.</li> <li><code>assignment_linter()</code> because <code>&lt;-</code> should be used for assignment.</li> </ol> <p>The first lint is spurious because <code>t</code> and <code>f</code> do not correctly convey that this linter targets the symbols <code>T</code> and <code>F</code>, so we want to ignore it. 
Prior to this release, we would have had to throw the baby out with the bathwater by suppressing <em>all five lints</em> like so:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">T_and_F_symbol_linter</span><span class="o">=</span><span class="nf">function</span><span class="p">(){</span> <span class="c1"># nolint. T and F are OK here.</span> <span class="nf">list</span><span class="p">()</span> <span class="p">}</span> </code></pre></div><p>This hides the other four lints and prevents any new lints from being detected on this line in the future, so the overall quality of your projects and scripts can slip over time.</p> <p>With the new feature, you&rsquo;d write the exclusion like this instead:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">T_and_F_symbol_linter</span><span class="o">=</span><span class="nf">function</span><span class="p">(){</span> <span class="c1"># nolint: object_name_linter. T and F are OK here.</span> <span class="nf">list</span><span class="p">()</span> <span class="p">}</span> </code></pre></div><p>Because the exclusion is qualified, the other four lints are still detected and reported by <code>lint()</code> so that you can fix them! 
See <code>?exclude</code> for more details.</p> <h1 id="linter-factories">Linter factories <a href="#linter-factories"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h1><p>As of lintr 3.0.0, <em>all</em> linters must be <a href="https://adv-r.hadley.nz/function-factories.html" target="_blank" rel="noopener">function factories</a>.</p> <p>Previously, only parameterizable linters (such as <code>line_length_linter()</code>, which takes a parameter controlling how wide lines are allowed to be without triggering a lint) were factories, but this led to some problems:</p> <ol> <li>Inconsistency&mdash;some linters were designated as calls, like <code>line_length_linter(120)</code>, while others were designated as names, like <code>no_tab_linter</code>.</li> <li>Brittleness&mdash;some linters evolve to gain (or lose) parameters over time, e.g. 
in this release <code>assignment_linter</code> gained two arguments, <code>allow_cascading_assign</code> and <code>allow_right_assign</code>, to fine-tune the handling of the cascading assignment operators <code>&lt;&lt;-</code>/<code>-&gt;&gt;</code> and right assignment operators <code>-&gt;</code>/<code>-&gt;&gt;</code>, respectively.</li> <li>Performance&mdash;factories can run some fixed computations at declaration and store them in the function environment, whereas previously the calculation would need to be repeated on every expression of every file being linted.</li> </ol> <p>This change has two significant practical implications, which are the main reason this is a major release.</p> <p>First, lintr invocations should always use the call form, so old usages like:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">lint_package</span><span class="p">(</span><span class="n">linters</span> <span class="o">=</span> <span class="n">assignment_linter</span><span class="p">)</span> </code></pre></div><p>should be replaced with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">lint_package</span><span class="p">(</span><span class="n">linters</span> <span class="o">=</span> <span class="nf">assignment_linter</span><span class="p">())</span> </code></pre></div><p>We expect this to show up in most cases through users&rsquo; <code>.lintr</code> configuration files.</p> <p>Second, users implementing custom linters need to convert them to function factories.</p> <p>That means replacing:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">my_custom_linter</span> <span class="o">&lt;-</span> <span class="nf">function</span><span class="p">(</span><span class="n">source_expression</span><span class="p">)</span> <span class="p">{</span> <span class="kc">...</span> <span class="p">}</span> </code></pre></div><p>with:</p> <div 
class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">my_custom_linter</span> <span class="o">&lt;-</span> <span class="nf">function</span><span class="p">()</span> <span class="nf">Linter</span><span class="p">(</span><span class="nf">function</span><span class="p">(</span><span class="n">source_expression</span><span class="p">)</span> <span class="p">{</span> <span class="kc">...</span> <span class="p">})</span> </code></pre></div><p><code>Linter()</code> is a wrapper to construct the <code>linter</code> S3 class.</p> <h1 id="linter-metadatabase-linter-documentation-and-pkgdown">Linter metadatabase, linter documentation, and pkgdown <a href="#linter-metadatabase-linter-documentation-and-pkgdown"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h1><p>We have also overhauled how linters are documented. Previously, all linters were documented on a single page and described in a quick blurb. This has gotten unwieldy as lintr has grown to export 72 linters! Now, each linter gets its own page, which will make it easier to document any parameters, enumerate edge cases and known false positives, add links to external resources, etc.</p> <p>To make linter discovery even easier, we&rsquo;ve also added <code>available_linters()</code>, a database with known linters and some associated metadata tags for each. For example, <code>brace_linter</code> has tags <code>style</code>, <code>readability</code>, <code>default</code>, and <code>configurable</code>. Each tag also gets its own documentation page (e.g. 
<code>?readability_linters</code>) which describes the tag and lists all of the known associated linters. The tags are available in another database: <code>available_tags()</code>. These databases can be extended to include custom linters in your package; see <code>?available_linters</code>.</p> <p>Moreover, lintr&rsquo;s documentation is now available as a website thanks to Hadley Wickham&rsquo;s contribution to create a pkgdown website for the package: <a href="https://lintr.r-lib.org" target="_blank" rel="noopener">lintr.r-lib.org</a>.</p> <h1 id="google-linters">Google linters <a href="#google-linters"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h1><p>This release also features more than 30 new linters originally authored by Google developers. Google adheres mostly to the tidyverse style guide and uses lintr to improve the quality of its considerable internal R code base. These linters detect common issues with readability, consistency, and performance. 
Here are some examples:</p> <ul> <li><code>any_is_na_linter()</code> detects the usage of <code>any(is.na(x))</code>; <code>anyNA(x)</code> is nearly always a better choice, both for performance and for readability.</li> <li><code>expect_named_linter()</code> detects usage in <a href="http://testthat.r-lib.org/" target="_blank" rel="noopener">testthat</a> suites like <code>expect_equal(names(x), c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;))</code>; <code>testthat</code> also exports <code>expect_named()</code>, which is tailor-made for writing more readable tests like <code>expect_named(x, c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;))</code>.</li> <li><code>vector_logic_linter()</code> detects usage of vector logic operators <code>|</code> and <code>&amp;</code> in situations where scalar logic applies, e.g. <code>if (x | y) { ... }</code> should be <code>if (x || y) { ... }</code>. The latter is more efficient and less error-prone.</li> <li><code>strings_as_factors_linter()</code> helps developers maintaining code that straddles the R 4.0.0 boundary, where the default value of <code>stringsAsFactors</code> <a href="https://developer.r-project.org/Blog/public/2020/02/16/stringsasfactors/" target="_blank" rel="noopener">changed from <code>TRUE</code> to <code>FALSE</code></a>, by identifying usages of <code>data.frame()</code> that (1) have known string columns and (2) don&rsquo;t declare a value for <code>stringsAsFactors</code>, and thus rely on the R version-dependent default.</li> </ul> <p>See the <a href="https://lintr.r-lib.org/news/index.html#google-linters-3-0-0" target="_blank" rel="noopener">NEWS</a> for the complete list.</p> <h1 id="other-improvements">Other improvements <a href="#other-improvements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 
5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h1><p>This is a big release&mdash;almost 2 years in the making&mdash;and includes a plethora of smaller but nonetheless important changes to lintr. Please check the <a href="https://lintr.r-lib.org/news/index.html#lintr-300" target="_blank" rel="noopener">NEWS</a> for a complete enumeration of these. Here are a few more new linters as a highlight:</p> <ul> <li><code>sprintf_linter()</code>: a new linter for detecting potentially problematic calls to <code>sprintf()</code> (e.g. using too many or too few arguments as compared to the number of template fields).</li> <li><code>package_hooks_linter()</code>: a new linter to check consistency of <code>.onLoad()</code> functions and other namespace hooks, as required by <code>R CMD check</code>.</li> <li><code>namespace_linter()</code>: a new linter to check for common mistakes in <code>pkg::symbol</code> usage, e.g. if <code>symbol</code> is not an exported object from <code>pkg</code>.</li> </ul> <p>Google has developed and tested many more broad-purpose linters that it plans to share, e.g. for detecting <code>length(which(x == y)) &gt; 0</code> (i.e., <code>any(x == y)</code>), <code>lapply(x, function(xi) sum(xi))</code> (i.e., <code>lapply(x, sum)</code>), <code>c(&quot;key_name&quot; = &quot;value_name&quot;)</code> (i.e., <code>c(key_name = &quot;value_name&quot;)</code>), and more! Follow <a href="https://github.com/r-lib/lintr/issues/884" target="_blank" rel="noopener">#884</a> for updates.</p> <p>Moreover, with the decision to accept a bevy of linters from Google that are not strictly related to the tidyverse style guide, we also opened the door to hosting linters for enforcing other style guides, for example the <a href="https://contributions.bioconductor.org/r-code.html" target="_blank" rel="noopener">Bioconductor R code guide</a>. 
We look forward to community contributions in this vein.</p> <h1 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h1><p>A great big thanks to the 97 people who have contributed to this release of lintr:</p> <p> <a href="https://github.com/1beb" target="_blank" rel="noopener">@1beb</a>, <a href="https://github.com/albert-ying" target="_blank" rel="noopener">@albert-ying</a>, <a href="https://github.com/aronatkins" target="_blank" rel="noopener">@aronatkins</a>, <a href="https://github.com/AshesITR" target="_blank" rel="noopener">@AshesITR</a>, <a href="https://github.com/assignUser" target="_blank" rel="noopener">@assignUser</a>, <a href="https://github.com/barryrowlingson" target="_blank" rel="noopener">@barryrowlingson</a>, <a href="https://github.com/belokoch" target="_blank" rel="noopener">@belokoch</a>, <a href="https://github.com/bersbersbers" target="_blank" rel="noopener">@bersbersbers</a>, <a href="https://github.com/bsolomon1124" target="_blank" rel="noopener">@bsolomon1124</a>, <a href="https://github.com/chrisumphlett" target="_blank" rel="noopener">@chrisumphlett</a>, <a href="https://github.com/csgillespie" target="_blank" rel="noopener">@csgillespie</a>, <a href="https://github.com/danielinteractive" target="_blank" rel="noopener">@danielinteractive</a>, <a href="https://github.com/dankessler" target="_blank" rel="noopener">@dankessler</a>, <a href="https://github.com/dgkf" target="_blank" rel="noopener">@dgkf</a>, <a href="https://github.com/dinakar29" target="_blank" rel="noopener">@dinakar29</a>, <a 
href="https://github.com/dmurdoch" target="_blank" rel="noopener">@dmurdoch</a>, <a href="https://github.com/dpprdan" target="_blank" rel="noopener">@dpprdan</a>, <a href="https://github.com/dragosmg" target="_blank" rel="noopener">@dragosmg</a>, <a href="https://github.com/dschlaep" target="_blank" rel="noopener">@dschlaep</a>, <a href="https://github.com/eitsupi" target="_blank" rel="noopener">@eitsupi</a>, <a href="https://github.com/ElsLommelen" target="_blank" rel="noopener">@ElsLommelen</a>, <a href="https://github.com/f-ritter" target="_blank" rel="noopener">@f-ritter</a>, <a href="https://github.com/fabian-s" target="_blank" rel="noopener">@fabian-s</a>, <a href="https://github.com/fdlk" target="_blank" rel="noopener">@fdlk</a>, <a href="https://github.com/fornaeffe" target="_blank" rel="noopener">@fornaeffe</a>, <a href="https://github.com/frederic-mahe" target="_blank" rel="noopener">@frederic-mahe</a>, <a href="https://github.com/GiuseppeTT" target="_blank" rel="noopener">@GiuseppeTT</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hhoeflin" target="_blank" rel="noopener">@hhoeflin</a>, <a href="https://github.com/hrvg" target="_blank" rel="noopener">@hrvg</a>, <a href="https://github.com/huisman" target="_blank" rel="noopener">@huisman</a>, <a href="https://github.com/iago-pssjd" target="_blank" rel="noopener">@iago-pssjd</a>, <a href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/inventionate" target="_blank" rel="noopener">@inventionate</a>, <a href="https://github.com/ishaar226" target="_blank" rel="noopener">@ishaar226</a>, <a href="https://github.com/jabenninghoff" target="_blank" rel="noopener">@jabenninghoff</a>, <a href="https://github.com/jameslamb" target="_blank" rel="noopener">@jameslamb</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a 
href="https://github.com/jeremymiles" target="_blank" rel="noopener">@jeremymiles</a>, <a href="https://github.com/jhgoebbert" target="_blank" rel="noopener">@jhgoebbert</a>, <a href="https://github.com/jimhester" target="_blank" rel="noopener">@jimhester</a>, <a href="https://github.com/johanneswerner" target="_blank" rel="noopener">@johanneswerner</a>, <a href="https://github.com/jonkeane" target="_blank" rel="noopener">@jonkeane</a>, <a href="https://github.com/JSchoenbachler" target="_blank" rel="noopener">@JSchoenbachler</a>, <a href="https://github.com/JWiley" target="_blank" rel="noopener">@JWiley</a>, <a href="https://github.com/karlvurdst" target="_blank" rel="noopener">@karlvurdst</a>, <a href="https://github.com/klmr" target="_blank" rel="noopener">@klmr</a>, <a href="https://github.com/Kotsakis" target="_blank" rel="noopener">@Kotsakis</a>, <a href="https://github.com/kpagacz" target="_blank" rel="noopener">@kpagacz</a>, <a href="https://github.com/kpj" target="_blank" rel="noopener">@kpj</a>, <a href="https://github.com/latot" target="_blank" rel="noopener">@latot</a>, <a href="https://github.com/leogama" target="_blank" rel="noopener">@leogama</a>, <a href="https://github.com/liar666" target="_blank" rel="noopener">@liar666</a>, <a href="https://github.com/logstar" target="_blank" rel="noopener">@logstar</a>, <a href="https://github.com/lorenzwalthert" target="_blank" rel="noopener">@lorenzwalthert</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/markromanmiller" target="_blank" rel="noopener">@markromanmiller</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/maxheld83" target="_blank" rel="noopener">@maxheld83</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/michaelquinn32" target="_blank" rel="noopener">@michaelquinn32</a>, <a 
href="https://github.com/mikekaminsky" target="_blank" rel="noopener">@mikekaminsky</a>, <a href="https://github.com/milanglacier" target="_blank" rel="noopener">@milanglacier</a>, <a href="https://github.com/minimenchmuncher" target="_blank" rel="noopener">@minimenchmuncher</a>, <a href="https://github.com/mjsteinbaugh" target="_blank" rel="noopener">@mjsteinbaugh</a>, <a href="https://github.com/nathaneastwood" target="_blank" rel="noopener">@nathaneastwood</a>, <a href="https://github.com/nlarusstone" target="_blank" rel="noopener">@nlarusstone</a>, <a href="https://github.com/nsoranzo" target="_blank" rel="noopener">@nsoranzo</a>, <a href="https://github.com/nvuillam" target="_blank" rel="noopener">@nvuillam</a>, <a href="https://github.com/pakjiddat" target="_blank" rel="noopener">@pakjiddat</a>, <a href="https://github.com/pat-s" target="_blank" rel="noopener">@pat-s</a>, <a href="https://github.com/prncevince" target="_blank" rel="noopener">@prncevince</a>, <a href="https://github.com/QiStats-Joel" target="_blank" rel="noopener">@QiStats-Joel</a>, <a href="https://github.com/rahulrachh" target="_blank" rel="noopener">@rahulrachh</a>, <a href="https://github.com/razz-matazz" target="_blank" rel="noopener">@razz-matazz</a>, <a href="https://github.com/renkun-ken" target="_blank" rel="noopener">@renkun-ken</a>, <a href="https://github.com/rfalke" target="_blank" rel="noopener">@rfalke</a>, <a href="https://github.com/richfitz" target="_blank" rel="noopener">@richfitz</a>, <a href="https://github.com/russHyde" target="_blank" rel="noopener">@russHyde</a>, <a href="https://github.com/salim-b" target="_blank" rel="noopener">@salim-b</a>, <a href="https://github.com/schaffstein" target="_blank" rel="noopener">@schaffstein</a>, <a href="https://github.com/scottmmjackson" target="_blank" rel="noopener">@scottmmjackson</a>, <a href="https://github.com/sgvignali" target="_blank" rel="noopener">@sgvignali</a>, <a href="https://github.com/shaopeng-gh" target="_blank" 
rel="noopener">@shaopeng-gh</a>, <a href="https://github.com/StefanBRas" target="_blank" rel="noopener">@StefanBRas</a>, <a href="https://github.com/stefaneng" target="_blank" rel="noopener">@stefaneng</a>, <a href="https://github.com/stefanocoretta" target="_blank" rel="noopener">@stefanocoretta</a>, <a href="https://github.com/stufield" target="_blank" rel="noopener">@stufield</a>, <a href="https://github.com/TCABJ" target="_blank" rel="noopener">@TCABJ</a>, <a href="https://github.com/telegott" target="_blank" rel="noopener">@telegott</a>, <a href="https://github.com/ThierryO" target="_blank" rel="noopener">@ThierryO</a>, <a href="https://github.com/thisisnic" target="_blank" rel="noopener">@thisisnic</a>, <a href="https://github.com/tonyk7440" target="_blank" rel="noopener">@tonyk7440</a>, <a href="https://github.com/wfmueller29" target="_blank" rel="noopener">@wfmueller29</a>, <a href="https://github.com/wibeasley" target="_blank" rel="noopener">@wibeasley</a>, <a href="https://github.com/yannickwurm" target="_blank" rel="noopener">@yannickwurm</a>, and <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>.</p> bonsai 0.1.0 https://www.tidyverse.org/blog/2022/06/bonsai-0-1-0/ Thu, 30 Jun 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/06/bonsai-0-1-0/ <p>We&rsquo;re super stoked to announce the first release of the <a href="https://bonsai.tidymodels.org/" target="_blank" rel="noopener">bonsai</a> package on CRAN! 
bonsai is a <a href="https://parsnip.tidymodels.org/" target="_blank" rel="noopener">parsnip</a> extension package for tree-based models.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"bonsai"</span><span class='o'>)</span></code></pre> </div> <p>Without extension packages, the parsnip package already supports fitting decision trees, random forests, and boosted trees. The bonsai package introduces support for two additional engines that implement variants of these algorithms:</p> <ul> <li> <a href="https://CRAN.R-project.org/package=partykit" target="_blank" rel="noopener">partykit</a>: conditional inference trees via <a href="https://parsnip.tidymodels.org/reference/decision_tree.html" target="_blank" rel="noopener"><code>decision_tree()</code></a> and conditional random forests via <a href="https://parsnip.tidymodels.org/reference/rand_forest.html" target="_blank" rel="noopener"><code>rand_forest()</code></a></li> <li> <a href="https://CRAN.R-project.org/package=lightgbm" target="_blank" rel="noopener">LightGBM</a>: optimized gradient boosted trees via <a href="https://parsnip.tidymodels.org/reference/boost_tree.html" target="_blank" rel="noopener"><code>boost_tree()</code></a></li> </ul> <p>As we introduce further support for tree-based model engines in the tidymodels, new implementations will reside in this package (rather than parsnip).</p> <p>To demonstrate how to use the package, we&rsquo;ll fit a few tree-based models and explore their output. 
First, loading bonsai as well as the rest of the tidymodels core packages:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://bonsai.tidymodels.org/'>bonsai</a></span><span class='o'>)</span> <span class='c'>#&gt; Loading required package: parsnip</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidymodels.tidymodels.org'>tidymodels</a></span><span class='o'>)</span> <span class='c'>#&gt; ── <span style='font-weight: bold;'>Attaching packages</span> ────────────────────────────────────── tidymodels 0.2.0 ──</span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>broom </span> 0.8.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>rsample </span> 0.1.1 </span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dials </span> 1.0.0 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tibble </span> 3.1.7 </span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>dplyr </span> 1.0.9 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tidyr </span> 1.2.0 </span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>ggplot2 </span> 3.3.6 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>tune </span> 0.2.0 </span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>infer </span> 1.0.2 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>workflows </span> 0.2.6 </span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>modeldata </span> 0.1.1.<span style='color: #BB0000;'>9000</span> 
<span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>workflowsets</span> 0.2.1 </span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>purrr </span> 0.3.4 <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>yardstick </span> 1.0.0 </span> <span class='c'>#&gt; <span style='color: #00BB00;'>✔</span> <span style='color: #0000BB;'>recipes </span> 0.2.0</span> <span class='c'>#&gt; ── <span style='font-weight: bold;'>Conflicts</span> ───────────────────────────────────────── tidymodels_conflicts() ──</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>purrr</span>::<span style='color: #00BB00;'>discard()</span> masks <span style='color: #0000BB;'>scales</span>::discard()</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>filter()</span> masks <span style='color: #0000BB;'>stats</span>::filter()</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>dplyr</span>::<span style='color: #00BB00;'>lag()</span> masks <span style='color: #0000BB;'>stats</span>::lag()</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> <span style='color: #0000BB;'>recipes</span>::<span style='color: #00BB00;'>step()</span> masks <span style='color: #0000BB;'>stats</span>::step()</span> <span class='c'>#&gt; <span style='color: #0000BB;'>•</span> Dig deeper into tidy modeling with R at <span style='color: #00BB00;'>https://www.tmwr.org</span></span></code></pre> </div> <p>Note that we use a development version of the <a href="https://modeldata.tidymodels.org/" target="_blank" rel="noopener">modeldata</a> package to generate example data later on in this post using the new <code>sim_regression()</code> function&mdash;you can install this version of the package using <code>pak::pak(&quot;tidymodels/modeldata&quot;)</code>.</p> <p>We&rsquo;ll 
use a <a href="https://allisonhorst.github.io/palmerpenguins/" target="_blank" rel="noopener">dataset</a> containing measurements on 3 different species of penguins as an example. Loading that data in and checking it out:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>penguins</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/utils/str.html'>str</a></span><span class='o'>(</span><span class='nv'>penguins</span><span class='o'>)</span> <span class='c'>#&gt; tibble [344 × 7] (S3: tbl_df/tbl/data.frame)</span> <span class='c'>#&gt; $ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...</span> <span class='c'>#&gt; $ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...</span> <span class='c'>#&gt; $ bill_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...</span> <span class='c'>#&gt; $ bill_depth_mm : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...</span> <span class='c'>#&gt; $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...</span> <span class='c'>#&gt; $ body_mass_g : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...</span> <span class='c'>#&gt; $ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...</span></code></pre> </div> <p>Specifically, we&rsquo;ll make use of flipper length and home island to model a penguin&rsquo;s species:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'>ggplot</span><span class='o'>(</span><span class='nv'>penguins</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'>aes</span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>island</span>, y <span class='o'>=</span> 
<span class='nv'>flipper_length_mm</span>, col <span class='o'>=</span> <span class='nv'>species</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'>geom_jitter</span><span class='o'>(</span>width <span class='o'>=</span> <span class='m'>.2</span><span class='o'>)</span> </code></pre> <p><img src="figs/penguin-plot-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Looking at this plot, you might begin to imagine your own simple set of binary splits for guessing which species a penguin might be given its home island and flipper length. Given that this small set of predictors almost completely separates our outcome with only a few splits, a relatively simple tree should serve our purposes just fine.</p> <h2 id="decision-trees">Decision Trees <a href="#decision-trees"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>bonsai introduces support for fitting decision trees with partykit, which implements a variety of decision trees called conditional inference trees (CITs).</p> <p>CITs differ from implementations of decision trees available elsewhere in the tidymodels in the criteria used to generate splits. The details of how these criteria differ are outside of the scope of this post.<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup> Practically, though, CITs offer a few notable advantages over CART- and C5.0-based decision trees:</p> <ul> <li><strong>Overfitting</strong>: Common implementations of decision trees are notoriously prone to overfitting, and require several well-chosen penalization (i.e. 
cost-complexity) and early stopping (e.g. pruning, max depth) hyperparameters to fit a model that will perform well when predicting on new observations. &ldquo;Out-of-the-box,&rdquo; CITs are not as prone to these same issues and do not accept a penalization parameter at all.</li> <li><strong>Selection bias</strong>: Common implementations of decision trees are biased towards selecting variables with many possible split points or missing values. CITs are natively not prone to the first issue, and many popular implementations address the second vulnerability.</li> </ul> <p>To define a conditional inference tree model specification, just set the modeling engine to <code>&quot;partykit&quot;</code> when creating a decision tree. Fitting to the penguins data, then:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>dt_mod</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/decision_tree.html'>decision_tree</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span>engine <span class='o'>=</span> <span class='s'>"partykit"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"classification"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span> formula <span class='o'>=</span> <span class='nv'>species</span> <span 
class='o'>~</span> <span class='nv'>flipper_length_mm</span> <span class='o'>+</span> <span class='nv'>island</span>, data <span class='o'>=</span> <span class='nv'>penguins</span> <span class='o'>)</span> <span class='nv'>dt_mod</span> <span class='c'>#&gt; parsnip model object</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; </span> <span class='c'>#&gt; Model formula:</span> <span class='c'>#&gt; species ~ flipper_length_mm + island</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; Fitted party:</span> <span class='c'>#&gt; [1] root</span> <span class='c'>#&gt; | [2] island in Biscoe</span> <span class='c'>#&gt; | | [3] flipper_length_mm &lt;= 203</span> <span class='c'>#&gt; | | | [4] flipper_length_mm &lt;= 196: Adelie (n = 38, err = 0.0%)</span> <span class='c'>#&gt; | | | [5] flipper_length_mm &gt; 196: Adelie (n = 8, err = 25.0%)</span> <span class='c'>#&gt; | | [6] flipper_length_mm &gt; 203: Gentoo (n = 122, err = 0.0%)</span> <span class='c'>#&gt; | [7] island in Dream, Torgersen</span> <span class='c'>#&gt; | | [8] island in Dream</span> <span class='c'>#&gt; | | | [9] flipper_length_mm &lt;= 192: Adelie (n = 59, err = 33.9%)</span> <span class='c'>#&gt; | | | [10] flipper_length_mm &gt; 192: Chinstrap (n = 65, err = 26.2%)</span> <span class='c'>#&gt; | | [11] island in Torgersen: Adelie (n = 52, err = 0.0%)</span> <span class='c'>#&gt; </span> <span class='c'>#&gt; Number of inner nodes: 5</span> <span class='c'>#&gt; Number of terminal nodes: 6</span></code></pre> </div> <p>Do any of these splits line up with your intuition? 
This tree results in only 6 terminal nodes and describes the structure shown in the above plot quite well.</p> <p>Read more about this implementation of decision trees in <a href="https://parsnip.tidymodels.org/reference/details_decision_tree_partykit.html" target="_blank" rel="noopener"><code>?details_decision_tree_partykit</code></a>.</p> <h2 id="random-forests">Random Forests <a href="#random-forests"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>One generalization of a decision tree is a <em>random forest</em>, which fits a large number of decision trees, each independently of the others. The fitted random forest model combines predictions from the individual decision trees to generate its predictions.</p> <p>bonsai introduces support for random forests using the <code>partykit</code> engine, which implements an algorithm called a <em>conditional random forest</em>. Conditional random forests are a type of random forest that uses conditional inference trees (like the one we fit above!) for its constituent decision trees.</p> <p>To fit a conditional random forest with partykit, our code looks much like the code we needed to fit a conditional inference tree. 
Just switch out <a href="https://parsnip.tidymodels.org/reference/decision_tree.html" target="_blank" rel="noopener"><code>decision_tree()</code></a> with <a href="https://parsnip.tidymodels.org/reference/rand_forest.html" target="_blank" rel="noopener"><code>rand_forest()</code></a> and remember to keep the engine set as <code>&quot;partykit&quot;</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>rf_mod</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/rand_forest.html'>rand_forest</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span>engine <span class='o'>=</span> <span class='s'>"partykit"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"classification"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span> formula <span class='o'>=</span> <span class='nv'>species</span> <span class='o'>~</span> <span class='nv'>flipper_length_mm</span> <span class='o'>+</span> <span class='nv'>island</span>, data <span class='o'>=</span> <span class='nv'>penguins</span> <span class='o'>)</span></code></pre> </div> <p>Read more about this implementation of random forests in <a href="https://parsnip.tidymodels.org/reference/details_rand_forest_partykit.html" target="_blank" 
rel="noopener"><code>?details_rand_forest_partykit</code></a>.</p> <h2 id="boosted-trees">Boosted Trees <a href="#boosted-trees"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Another generalization of a decision tree is a series of decision trees where <em>each tree depends on the results of previous trees</em>&mdash;this is called a <em>boosted tree</em>. bonsai implements an additional parsnip engine for this model type called <code>&quot;lightgbm&quot;</code>. While fitting boosted trees is quite computationally intensive, especially with high-dimensional data, LightGBM provides an implementation of a highly efficient variant of the algorithm.</p> <p>To make use of it, start out with a <code>boost_tree</code> model spec and set <code>engine = &quot;lightgbm&quot;</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>bt_mod</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/boost_tree.html'>boost_tree</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span>engine <span class='o'>=</span> <span class='s'>"lightgbm"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span 
class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"classification"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span> formula <span class='o'>=</span> <span class='nv'>species</span> <span class='o'>~</span> <span class='nv'>flipper_length_mm</span> <span class='o'>+</span> <span class='nv'>island</span>, data <span class='o'>=</span> <span class='nv'>penguins</span> <span class='o'>)</span></code></pre> </div> <p>The main benefit of using LightGBM is its computational efficiency: as the number of observations in training data increases, we can observe an increasingly substantial decrease in time-to-fit when using the LightGBM engine as compared to other implementations of boosted trees, like XGBoost.</p> <p>To show this, we&rsquo;ll use the <code>sim_regression()</code> function from modeldata to simulate increasingly large datasets that we can fit models to. 
For example, generating a dataset with 10 observations and 20 numeric predictors:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'>sim_regression</span><span class='o'>(</span>num_samples <span class='o'>=</span> <span class='m'>10</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 10 × 21</span></span> <span class='c'>#&gt; outcome predictor_01 predictor_02 predictor_03 predictor_04 predictor_05</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'> 1</span> 41.9 -<span style='color: #BB0000;'>3.15</span> 3.72 -<span style='color: #BB0000;'>0.800</span> -<span style='color: #BB0000;'>5.87</span> 0.265</span> <span class='c'>#&gt; <span style='color: #555555;'> 2</span> 49.4 4.93 6.15 5.09 0.501 -<span style='color: #BB0000;'>2.45</span> </span> <span class='c'>#&gt; <span style='color: #555555;'> 3</span> -<span style='color: #BB0000;'>9.20</span> 0.020<span style='text-decoration: underline;'>0</span> -<span style='color: #BB0000;'>2.31</span> 4.64 0.422 3.14 </span> <span class='c'>#&gt; <span style='color: #555555;'> 4</span> -<span style='color: #BB0000;'>0.385</span> -<span style='color: #BB0000;'>1.97</span> -<span style='color: #BB0000;'>2.56</span> -<span style='color: #BB0000;'>0.018</span><span style='color: #BB0000; text-decoration: underline;'>2</span> 1.83 -<span style='color: #BB0000;'>4.23</span> </span> <span class='c'>#&gt; <span style='color: #555555;'> 5</span> 8.08 -<span style='color: #BB0000;'>0.266</span> 
-<span style='color: #BB0000;'>0.574</span> -<span style='color: #BB0000;'>1.08</span> -<span style='color: #BB0000;'>1.75</span> 1.57 </span> <span class='c'>#&gt; <span style='color: #555555;'> 6</span> 3.79 0.145 3.86 3.91 3.32 -<span style='color: #BB0000;'>4.27</span> </span> <span class='c'>#&gt; <span style='color: #555555;'> 7</span> 1.12 -<span style='color: #BB0000;'>6.35</span> -<span style='color: #BB0000;'>2.39</span> 0.119 0.848 1.74 </span> <span class='c'>#&gt; <span style='color: #555555;'> 8</span> 3.21 4.56 3.20 -<span style='color: #BB0000;'>2.68</span> -<span style='color: #BB0000;'>1.11</span> 0.729</span> <span class='c'>#&gt; <span style='color: #555555;'> 9</span> -<span style='color: #BB0000;'>4.56</span> 2.97 -<span style='color: #BB0000;'>1.36</span> -<span style='color: #BB0000;'>1.90</span> -<span style='color: #BB0000;'>1.01</span> 0.557</span> <span class='c'>#&gt; <span style='color: #555555;'>10</span> 0.140 -<span style='color: #BB0000;'>0.234</span> -<span style='color: #BB0000;'>1.05</span> 0.551 0.861 -<span style='color: #BB0000;'>0.937</span></span> <span class='c'>#&gt; <span style='color: #555555;'># … with 15 more variables: predictor_06 &lt;dbl&gt;, predictor_07 &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># predictor_08 &lt;dbl&gt;, predictor_09 &lt;dbl&gt;, predictor_10 &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># predictor_11 &lt;dbl&gt;, predictor_12 &lt;dbl&gt;, predictor_13 &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># predictor_14 &lt;dbl&gt;, predictor_15 &lt;dbl&gt;, predictor_16 &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># predictor_17 &lt;dbl&gt;, predictor_18 &lt;dbl&gt;, predictor_19 &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># predictor_20 &lt;dbl&gt;</span></span></code></pre> </div> <p>Now, fitting boosted trees on increasingly 
large datasets with XGBoost and LightGBM and observing time-to-fit:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># given an engine and nrow(training_data), return the time to fit</span> <span class='nv'>time_boost_fit</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>engine</span>, <span class='nv'>n</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nv'>time</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/system.time.html'>system.time</a></span><span class='o'>(</span><span class='o'>&#123;</span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/boost_tree.html'>boost_tree</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_engine.html'>set_engine</a></span><span class='o'>(</span>engine <span class='o'>=</span> <span class='nv'>engine</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://parsnip.tidymodels.org/reference/set_args.html'>set_mode</a></span><span class='o'>(</span>mode <span class='o'>=</span> <span class='s'>"regression"</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://generics.r-lib.org/reference/fit.html'>fit</a></span><span class='o'>(</span> formula <span class='o'>=</span> <span class='nv'>outcome</span> <span class='o'>~</span> <span class='nv'>.</span>, data <span class='o'>=</span> <span class='nf'>sim_regression</span><span class='o'>(</span>num_samples <span class='o'>=</span> <span class='nv'>n</span><span class='o'>)</span> <span class='o'>)</span> <span 
class='o'>&#125;</span><span class='o'>)</span> <span class='nf'>tibble</span><span class='o'>(</span> engine <span class='o'>=</span> <span class='nv'>engine</span>, n <span class='o'>=</span> <span class='nv'>n</span>, time_to_fit <span class='o'>=</span> <span class='nv'>time</span><span class='o'>[[</span><span class='s'>"elapsed"</span><span class='o'>]</span><span class='o'>]</span> <span class='o'>)</span> <span class='o'>&#125;</span> <span class='c'># setup engine and n_samples combinations</span> <span class='nv'>engines</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/rep.html'>rep</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span>XGBoost <span class='o'>=</span> <span class='s'>"xgboost"</span>, LightGBM <span class='o'>=</span> <span class='s'>"lightgbm"</span><span class='o'>)</span>, each <span class='o'>=</span> <span class='m'>11</span><span class='o'>)</span> <span class='nv'>n_samples</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/Round.html'>round</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/rep.html'>rep</a></span><span class='o'>(</span><span class='m'>10</span> <span class='o'>*</span> <span class='m'>10</span><span class='o'>^</span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq</a></span><span class='o'>(</span><span class='m'>2</span>, <span class='m'>4.5</span>, <span class='m'>.25</span><span class='o'>)</span><span class='o'>)</span>, times <span class='o'>=</span> <span class='m'>2</span><span class='o'>)</span><span class='o'>)</span> <span class='c'># apply the function over each combination</span> <span class='nv'>fit_times</span> <span class='o'>&lt;-</span> <span class='nf'>map2_dfr</span><span class='o'>(</span> <span class='nv'>engines</span>, <span class='nv'>n_samples</span>, <span 
class='nv'>time_boost_fit</span> <span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>mutate</span><span class='o'>(</span> engine <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nv'>engine</span>, levels <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"xgboost"</span>, <span class='s'>"lightgbm"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> <span class='c'># visualize results</span> <span class='nf'>ggplot</span><span class='o'>(</span><span class='nv'>fit_times</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'>aes</span><span class='o'>(</span>x <span class='o'>=</span> <span class='nv'>n</span>, y <span class='o'>=</span> <span class='nv'>time_to_fit</span>, col <span class='o'>=</span> <span class='nv'>engine</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'>geom_line</span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'>scale_x_log10</span><span class='o'>(</span><span class='o'>)</span> </code></pre> <p><img src="figs/boost-comparison-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>As we can see, the decrease in time-to-fit when using LightGBM as opposed to XGBoost becomes more notable as the number of rows in the training data increases.</p> <p>Read more about this implementation of boosted trees in <a href="https://parsnip.tidymodels.org/reference/details_boost_tree_lightgbm.html" target="_blank" rel="noopener"><code>?details_boost_tree_lightgbm</code></a>.</p> <h2 id="other-notes">Other Notes <a href="#other-notes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 
0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This package is based on <a href="https://github.com/curso-r/treesnip" target="_blank" rel="noopener">the treesnip package</a> by Daniel Falbel, Athos Damiani, and Roel M. Hogervorst. Users of that package will note that we have not included support for <a href="https://github.com/catboost/catboost" target="_blank" rel="noopener">the catboost package</a>. Unfortunately, the catboost R package is not on CRAN, so we&rsquo;re not able to add support for it for now. We&rsquo;ll be keeping an eye on discussions in that development community and plan to support the package upon its release to CRAN!</p> <p>Each of these model specs and engines has several arguments and tuning parameters that greatly affect user experience and results. We recommend reading about each of these parameters and tuning them when you find them relevant for your modeling use case.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to Daniel Falbel, Athos Damiani, and Roel M. Hogervorst for their work on <a href="https://github.com/curso-r/treesnip" target="_blank" rel="noopener">the treesnip package</a>, on which this package is based. 
We&rsquo;ve listed the treesnip authors as co-authors of bonsai in recognition of their help in laying the foundations for this project.</p> <p>We&rsquo;re also grateful for the wonderful package hex sticker by Amanda Petri!</p> <p>Finally, thank you to those who have tested and provided feedback on the development versions of the package over the last couple of months.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>For those interested, the <a href="https://doi.org/10.1198/106186006X133933" target="_blank" rel="noopener">original paper</a> introducing conditional inference trees describes and motivates these differences well. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> spatialsample 0.2.0 https://www.tidyverse.org/blog/2022/06/spatialsample-0-2-0/ Tue, 21 Jun 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/06/spatialsample-0-2-0/ <p>We&rsquo;re positively electrified to announce the release of <a href="https://spatialsample.tidymodels.org/" target="_blank" rel="noopener">spatialsample</a> 0.2.0. 
spatialsample is a package for spatial resampling, extending the <a href="https://rsample.tidymodels.org/" target="_blank" rel="noopener">rsample</a> framework to help create spatial extrapolation between your analysis and assessment data sets.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"spatialsample"</span><span class='o'>)</span></code></pre> </div> <p>This blog post will describe the highlights of what&rsquo;s new. You can see a full list of changes in the <a href="https://spatialsample.tidymodels.org/news/index.html#spatialsample-020" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="new-features">New Features <a href="#new-features"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This version of spatialsample includes a new data set, made up of 682 hexagons containing data about tree canopy cover change in Boston, Massachusetts:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/spatialsample'>spatialsample</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span> <span 
class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>boston_canopy</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggsf.html'>geom_sf</a></span><span class='o'>(</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-2-1.png" title="A map showing the spatial arrangement of hexagons making up the boston_canopy data set." alt="A map showing the spatial arrangement of hexagons making up the boston_canopy data set." width="700px" style="display: block; margin: auto;" /></p> </div> <p>This data is stored as an sf object, and as such contains information about the proper coordinate reference system and units of measurement associated with the data.</p> <p>This brings us to the first new feature in this release of spatialsample: <a href="https://spatialsample.tidymodels.org/reference/spatial_clustering_cv.html" target="_blank" rel="noopener"><code>spatial_clustering_cv()</code></a> now supports sf objects, and will calculate distances in a way that respects coordinate reference systems (including using the s2 geometry library for geographic coordinate reference systems):</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>123</span><span class='o'>)</span> <span class='nv'>kmeans_clustering</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://spatialsample.tidymodels.org/reference/spatial_clustering_cv.html'>spatial_clustering_cv</a></span><span class='o'>(</span><span class='nv'>boston_canopy</span>, v <span class='o'>=</span> <span class='m'>5</span><span class='o'>)</span> <span class='nv'>kmeans_clustering</span> <span class='c'>#&gt; # 5-fold spatial cross-validation </span> <span class='c'>#&gt; <span style='color: 
#555555;'># A tibble: 5 × 2</span></span> <span class='c'>#&gt; splits id </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;split [524/158]&gt;</span> Fold1</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #555555;'>&lt;split [493/189]&gt;</span> Fold2</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> <span style='color: #555555;'>&lt;split [517/165]&gt;</span> Fold3</span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> <span style='color: #555555;'>&lt;split [605/77]&gt;</span> Fold4</span> <span class='c'>#&gt; <span style='color: #555555;'>5</span> <span style='color: #555555;'>&lt;split [589/93]&gt;</span> Fold5</span></code></pre> </div> <p>This release also provides <a href="https://ggplot2.tidyverse.org/reference/autoplot.html" target="_blank" rel="noopener"><code>autoplot()</code></a> methods to visualize resamples via ggplot2, making it easy to see how exactly your data is being divided. 
Just call <a href="https://ggplot2.tidyverse.org/reference/autoplot.html" target="_blank" rel="noopener"><code>autoplot()</code></a> on the outputs from any spatial clustering function:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='nv'>kmeans_clustering</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>title <span class='o'>=</span> <span class='s'>"kmeans()"</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-4-1.png" title="A map showing the boston_canopy data set broken into five folds through spatial_clustering_cv. The five folds are visibly different sizes, and are grouped by spatial proximity." alt="A map showing the boston_canopy data set broken into five folds through spatial_clustering_cv. The five folds are visibly different sizes, and are grouped by spatial proximity." width="700px" style="display: block; margin: auto;" /></p> </div> <p>In addition to supporting more types of data, <a href="https://spatialsample.tidymodels.org/reference/spatial_clustering_cv.html" target="_blank" rel="noopener"><code>spatial_clustering_cv()</code></a> has also been extended to support more types of clustering. 
Set the <code>cluster_function</code> argument to use <code>&quot;hclust&quot;</code> for hierarchical clustering via <a href="https://rdrr.io/r/stats/hclust.html" target="_blank" rel="noopener"><code>hclust()</code></a> instead of the default <a href="https://rdrr.io/r/stats/kmeans.html" target="_blank" rel="noopener"><code>kmeans()</code></a>-based clusters:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>123</span><span class='o'>)</span> <span class='nf'><a href='https://spatialsample.tidymodels.org/reference/spatial_clustering_cv.html'>spatial_clustering_cv</a></span><span class='o'>(</span> <span class='nv'>boston_canopy</span>, v <span class='o'>=</span> <span class='m'>5</span>, cluster_function <span class='o'>=</span> <span class='s'>"hclust"</span> <span class='o'>)</span> |&gt; <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>title <span class='o'>=</span> <span class='s'>"hclust()"</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-5-1.png" title="A map showing the boston_canopy data set broken into five folds through spatial_clustering_cv, using the hclust clustering method. The five folds are still visibly different sizes, and are grouped by spatial proximity, but the clusters are notably different from those produced by the default kmeans method." alt="A map showing the boston_canopy data set broken into five folds through spatial_clustering_cv, using the hclust clustering method. 
The five folds are still visibly different sizes, and are grouped by spatial proximity, but the clusters are notably different from those produced by the default kmeans method." width="700px" style="display: block; margin: auto;" /></p> </div> <p>This argument can also accept functions, letting you plug in clustering methodologies from other packages or that you&rsquo;ve written yourself:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>123</span><span class='o'>)</span> <span class='nv'>custom_clusters</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>dists</span>, <span class='nv'>v</span>, <span class='nv'>...</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nf'><a href='https://rdrr.io/r/base/rep.html'>rep</a></span><span class='o'>(</span><span class='nv'>letters</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='nv'>v</span><span class='o'>]</span>, length.out <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nv'>boston_canopy</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>&#125;</span> <span class='nf'><a href='https://spatialsample.tidymodels.org/reference/spatial_clustering_cv.html'>spatial_clustering_cv</a></span><span class='o'>(</span> <span class='nv'>boston_canopy</span>, v <span class='o'>=</span> <span class='m'>5</span>, cluster_function <span class='o'>=</span> <span class='nv'>custom_clusters</span> <span class='o'>)</span> |&gt; <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a 
href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>title <span class='o'>=</span> <span class='s'>"custom_clusters()"</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-6-1.png" title="A map showing the outputs of spatial_clustering_cv when using a custom clustering function. The custom clustering function assigned folds systematically, moving sequentially through rows in the data frame, and as such the output does not look very clustered. However, the functions in spatialsample performed exactly the same with the custom clustering function as they did with the built-in options." alt="A map showing the outputs of spatial_clustering_cv when using a custom clustering function. The custom clustering function assigned folds systematically, moving sequentially through rows in the data frame, and as such the output does not look very clustered. However, the functions in spatialsample performed exactly the same with the custom clustering function as they did with the built-in options." width="700px" style="display: block; margin: auto;" /></p> </div> <p>In addition to the clustering extensions, this version of spatialsample introduces new functions for other popular spatial resampling methods. For instance, <a href="https://spatialsample.tidymodels.org/reference/spatial_block_cv.html" target="_blank" rel="noopener"><code>spatial_block_cv()</code></a> helps you perform <a href="https://doi.org/10.1111/ecog.02881" target="_blank" rel="noopener">block cross-validation</a>, splitting your data into folds based on a grid of regular polygons. 
You can assign these polygons to folds at random:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>123</span><span class='o'>)</span> <span class='nf'><a href='https://spatialsample.tidymodels.org/reference/spatial_block_cv.html'>spatial_block_cv</a></span><span class='o'>(</span><span class='nv'>boston_canopy</span>, v <span class='o'>=</span> <span class='m'>5</span><span class='o'>)</span> |&gt; <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-7-1.png" title="A map showing the outputs of block cross-validation performed using spatial_block_cv. A regular grid of squares has been drawn over the boston_canopy data set, and all data falling into a single block is assigned to the same fold. Blocks are assigned to folds at random, resulting in a patchy distribution of folds across the data set." alt="A map showing the outputs of block cross-validation performed using spatial_block_cv. A regular grid of squares has been drawn over the boston_canopy data set, and all data falling into a single block is assigned to the same fold. Blocks are assigned to folds at random, resulting in a patchy distribution of folds across the data set." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>Or systematically, either by assigning folds in order from the bottom-left and proceeding from left to right along each row by setting <code>method = &quot;continuous&quot;</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://spatialsample.tidymodels.org/reference/spatial_block_cv.html'>spatial_block_cv</a></span><span class='o'>(</span><span class='nv'>boston_canopy</span>, v <span class='o'>=</span> <span class='m'>5</span>, method <span class='o'>=</span> <span class='s'>"continuous"</span><span class='o'>)</span> |&gt; <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-8-1.png" title="A map showing the outputs of block cross-validation performed using spatial_block_cv with continuous systematic assignment. Rather than the patchy random assignment before, blocks are now assigned from left to right for each row of the regular grid, resulting in the same folds always being adjacent to one another." alt="A map showing the outputs of block cross-validation performed using spatial_block_cv with continuous systematic assignment. Rather than the patchy random assignment before, blocks are now assigned from left to right for each row of the regular grid, resulting in the same folds always being adjacent to one another." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>Or by &ldquo;snaking&rdquo; back and forth up the grid by setting <code>method = &quot;snake&quot;</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://spatialsample.tidymodels.org/reference/spatial_block_cv.html'>spatial_block_cv</a></span><span class='o'>(</span><span class='nv'>boston_canopy</span>, v <span class='o'>=</span> <span class='m'>5</span>, method <span class='o'>=</span> <span class='s'>"snake"</span><span class='o'>)</span> |&gt; <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-9-1.png" title="A map showing the outputs of block cross-validation performed using spatial_block_cv with snaking systematic assignment. Blocks are now assigned alternately from left to right and right to left, resulting in a similar alignment of folds to the continuous method." alt="A map showing the outputs of block cross-validation performed using spatial_block_cv with snaking systematic assignment. Blocks are now assigned alternately from left to right and right to left, resulting in a similar alignment of folds to the continuous method." width="700px" style="display: block; margin: auto;" /></p> </div> <p>This release of spatialsample also adds support for <a href="https://doi.org/10.1111/geb.12161" target="_blank" rel="noopener">leave-location-out cross-validation</a> through the new function <a href="https://spatialsample.tidymodels.org/reference/spatial_vfold.html" target="_blank" rel="noopener"><code>spatial_leave_location_out_cv()</code></a>.
You can use this to create resamples when you already have a good idea of what data might be spatially correlated together &ndash; for instance, we can use it to split the Ames housing data from modeldata by neighborhood:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>ames</span>, package <span class='o'>=</span> <span class='s'>"modeldata"</span><span class='o'>)</span> <span class='nv'>ames_sf</span> <span class='o'>&lt;-</span> <span class='nf'>sf</span><span class='nf'>::</span><span class='nf'><a href='https://r-spatial.github.io/sf/reference/st_as_sf.html'>st_as_sf</a></span><span class='o'>(</span><span class='nv'>ames</span>, coords <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Longitude"</span>, <span class='s'>"Latitude"</span><span class='o'>)</span>, crs <span class='o'>=</span> <span class='m'>4326</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>123</span><span class='o'>)</span> <span class='nf'><a href='https://spatialsample.tidymodels.org/reference/spatial_vfold.html'>spatial_leave_location_out_cv</a></span><span class='o'>(</span><span class='nv'>ames_sf</span>, <span class='nv'>Neighborhood</span><span class='o'>)</span> |&gt; <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-10-1.png" title="A map showing the outputs of leave-location-out cross-validation performed using spatial_leave_location_out_cv on the Ames housing data. Folds are assigned based on what neighborhood each house falls into. 
Some neighborhoods are entirely contained within another neighborhood, and neighborhoods contain very different numbers of houses." alt="A map showing the outputs of leave-location-out cross-validation performed using spatial_leave_location_out_cv on the Ames housing data. Folds are assigned based on what neighborhood each house falls into. Some neighborhoods are entirely contained within another neighborhood, and neighborhoods contain very different numbers of houses." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="buffering">Buffering <a href="#buffering"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The last major feature in this release is the introduction of spatial buffering. Spatial buffering enforces a certain minimum distance between your analysis and assessment sets, making sure that you&rsquo;re spatially extrapolating when making predictions with a model.</p> <p>While all spatial resampling functions in spatialsample can use spatial buffers, particularly interesting is the new <a href="https://spatialsample.tidymodels.org/reference/spatial_vfold.html" target="_blank" rel="noopener"><code>spatial_buffer_vfold_cv()</code></a> function. This function makes it easy to add spatial buffers around a standard V-fold cross-validation procedure. 
When we plot the object returned by this function, it just looks like a standard V-fold cross-validation setup:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>123</span><span class='o'>)</span> <span class='nv'>blocks</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://spatialsample.tidymodels.org/reference/spatial_vfold.html'>spatial_buffer_vfold_cv</a></span><span class='o'>(</span> <span class='nv'>boston_canopy</span>, v <span class='o'>=</span> <span class='m'>15</span>, buffer <span class='o'>=</span> <span class='m'>100</span>, radius <span class='o'>=</span> <span class='kc'>NULL</span> <span class='o'>)</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='nv'>blocks</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-11-1.png" title="A map showing the outputs of spatially buffered cross-validation performed using spatial_buffer_vfold_cv, once again using the boston_canopy data set. When visualizing all folds at once, there does not seem to be any spatial structure to the resamples; folds are distributed randomly throughout the data set, and folds abut one another without any spatial separation." alt="A map showing the outputs of spatially buffered cross-validation performed using spatial_buffer_vfold_cv, once again using the boston_canopy data set. When visualizing all folds at once, there does not seem to be any spatial structure to the resamples; folds are distributed randomly throughout the data set, and folds abut one another without any spatial separation." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>However, if we use <a href="https://ggplot2.tidyverse.org/reference/autoplot.html" target="_blank" rel="noopener"><code>autoplot()</code></a> to visualize the splits themselves, we can see that we&rsquo;ve created an exclusion buffer around each of our assessment sets. Data inside this buffer is assigned to neither the assessment nor the analysis set, so you can be sure your data is spatially separated:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>blocks</span><span class='o'>$</span><span class='nv'>splits</span> |&gt; <span class='nf'>purrr</span><span class='nf'>::</span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>walk</a></span><span class='o'>(</span><span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-12-.gif" title="An animation showing maps of each individual fold produced using spatial_buffer_vfold_cv. Now it is evident that any data adjacent to the assessment data has been added to a 'buffer' zone, and is part of neither the analysis nor the assessment set." alt="An animation showing maps of each individual fold produced using spatial_buffer_vfold_cv. Now it is evident that any data adjacent to the assessment data has been added to a 'buffer' zone, and is part of neither the analysis nor the assessment set." width="700px" style="display: block; margin: auto;" /></p> </div> <p>In addition to exclusion buffers, spatialsample now lets you add inclusion radii to any spatial resampling.
This will add any points within a certain distance of the original assessment set to the assessment set, letting you create clumped &ldquo;discs&rdquo; of data to assess your models against:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>123</span><span class='o'>)</span> <span class='nv'>blocks</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://spatialsample.tidymodels.org/reference/spatial_vfold.html'>spatial_buffer_vfold_cv</a></span><span class='o'>(</span> <span class='nv'>boston_canopy</span>, v <span class='o'>=</span> <span class='m'>20</span>, buffer <span class='o'>=</span> <span class='m'>100</span>, radius <span class='o'>=</span> <span class='m'>100</span> <span class='o'>)</span> <span class='nv'>blocks</span><span class='o'>$</span><span class='nv'>splits</span> |&gt; <span class='nf'>purrr</span><span class='nf'>::</span><span class='nf'><a href='https://purrr.tidyverse.org/reference/map.html'>walk</a></span><span class='o'>(</span><span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/print.html'>print</a></span><span class='o'>(</span><span class='nf'><a href='https://ggplot2.tidyverse.org/reference/autoplot.html'>autoplot</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-13-.gif" title="Another animation showing maps of each individual fold produced using spatial_buffer_vfold_cv. When using the argument radius, points adjacent to the assessment set are themselves added to the assessment set. The buffer is then applied to each data point in the enlarged assessment set." 
alt="Another animation showing maps of each individual fold produced using spatial_buffer_vfold_cv. When using the argument radius, points adjacent to the assessment set are themselves added to the assessment set. The buffer is then applied to each data point in the enlarged assessment set." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="and-more">&hellip;and more! <a href="#and-more"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This is just scratching the surface of the new features and improvements in this release of spatialsample. You can see a full list of changes in the <a href="https://spatialsample.tidymodels.org/news/index.html#spatialsample-020" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="acknowledgments">Acknowledgments <a href="#acknowledgments"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to thank everyone who has contributed since the last release: <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mikemahoney218" target="_blank" rel="noopener">@mikemahoney218</a>, <a
href="https://github.com/MxNl" target="_blank" rel="noopener">@MxNl</a>, <a href="https://github.com/nipnipj" target="_blank" rel="noopener">@nipnipj</a>, and <a href="https://github.com/PathosEthosLogos" target="_blank" rel="noopener">@PathosEthosLogos</a>.</p> Announcing vetiver for MLOps in R and Python https://www.tidyverse.org/blog/2022/06/announce-vetiver/ Thu, 09 Jun 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/06/announce-vetiver/ <p>We are thrilled to announce the release of <a href="https://vetiver.rstudio.com/" target="_blank" rel="noopener">vetiver</a>, a framework for MLOps tasks in R and Python! The goal of vetiver is to provide fluent tooling to <strong>version</strong>, <strong>share</strong>, <strong>deploy</strong>, and <strong>monitor</strong> a trained model.
If you like perfume or candles, you may recognize this name; vetiver, also known as the &ldquo;oil of tranquility&rdquo;, is used as a stabilizing ingredient in perfumery to preserve more volatile fragrances.</p> <p>You can install the released version of vetiver for R from <a href="https://cran.r-project.org/package=vetiver" target="_blank" rel="noopener">CRAN</a>:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;vetiver&#34;</span><span class="p">)</span> </code></pre></div><p>You can install the released version of vetiver for Python from <a href="https://pypi.org/project/vetiver/" target="_blank" rel="noopener">PyPI</a>:</p> <div class="highlight"><pre class="chroma"><code class="language-python" data-lang="python"><span class="n">pip</span> <span class="n">install</span> <span class="n">vetiver</span> </code></pre></div><p>We are sharing more about what vetiver is and how it works over <a href="https://www.rstudio.com/blog/announce-vetiver/" target="_blank" rel="noopener">on the RStudio blog</a>, so check that out, but we want to share here as well!</p> <h2 id="train-a-model">Train a model <a href="#train-a-model"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>For this example, let’s use everyone&rsquo;s favorite dataset on car fuel efficiency to predict miles per gallon.
In R, we can train a decision tree model to predict miles per gallon using a <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> workflow:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span> <span class="n">car_mod</span> <span class="o">&lt;-</span> <span class="nf">workflow</span><span class="p">(</span><span class="n">mpg</span> <span class="o">~</span> <span class="n">.,</span> <span class="nf">decision_tree</span><span class="p">(</span><span class="n">mode</span> <span class="o">=</span> <span class="s">&#34;regression&#34;</span><span class="p">))</span> <span class="o">%&gt;%</span> <span class="nf">fit</span><span class="p">(</span><span class="n">mtcars</span><span class="p">)</span> </code></pre></div><p>In Python, we can train the same kind of model using <a href="https://scikit-learn.org/" target="_blank" rel="noopener">scikit-learn</a>:</p> <div class="highlight"><pre class="chroma"><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">vetiver.data</span> <span class="kn">import</span> <span class="n">mtcars</span> <span class="kn">from</span> <span class="nn">sklearn</span> <span class="kn">import</span> <span class="n">tree</span> <span class="n">car_mod</span> <span class="o">=</span> <span class="n">tree</span><span class="o">.</span><span class="n">DecisionTreeRegressor</span><span class="p">()</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">mtcars</span><span class="p">,</span> <span class="n">mtcars</span><span class="p">[</span><span class="s2">&#34;mpg&#34;</span><span class="p">])</span> </code></pre></div><p>For both R and Python, the <code>car_mod</code> object is a fitted model, with parameters estimated using our training data <code>mtcars</code>.</p> <h2 
id="create-a-vetiver-model">Create a vetiver model <a href="#create-a-vetiver-model"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We can create a <code>vetiver_model()</code> in R or <code>VetiverModel()</code> in Python from the trained model; a vetiver model object collects the information needed to store, version, and deploy a trained model.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">vetiver</span><span class="p">)</span> <span class="n">v</span> <span class="o">&lt;-</span> <span class="nf">vetiver_model</span><span class="p">(</span><span class="n">car_mod</span><span class="p">,</span> <span class="s">&#34;cars_mpg&#34;</span><span class="p">)</span> <span class="n">v</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; ── cars_mpg ─ &lt;butchered_workflow&gt; model for deployment </span> <span class="c1">#&gt; A rpart regression modeling workflow using 10 features</span> </code></pre></div><div class="highlight"><pre class="chroma"><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">vetiver</span> <span class="kn">import</span> <span class="n">VetiverModel</span> <span class="n">v</span> <span class="o">=</span> <span class="n">VetiverModel</span><span class="p">(</span><span class="n">car_mod</span><span class="p">,</span> <span class="n">model_name</span> <span class="o">=</span> <span class="s2">&#34;cars_mpg&#34;</span><span class="p">,</span> <span class="n">save_ptype</span> <span class="o">=</span> <span 
class="bp">True</span><span class="p">,</span> <span class="n">ptype_data</span> <span class="o">=</span> <span class="n">mtcars</span><span class="p">)</span> <span class="n">v</span><span class="o">.</span><span class="n">description</span> <span class="c1">#&gt; &#34;Scikit-learn &lt;class &#39;sklearn.tree._classes.DecisionTreeRegressor&#39;&gt; model&#34;</span> </code></pre></div><p>See our documentation for how to use these deployable model objects and:</p> <ul> <li> <a href="https://vetiver.rstudio.com/get-started/version.html" target="_blank" rel="noopener">publish and version your model</a></li> <li> <a href="https://vetiver.rstudio.com/get-started/deploy.html" target="_blank" rel="noopener">deploy your model as a REST API</a></li> </ul> <p>Be sure to also read more <a href="https://www.rstudio.com/blog/announce-vetiver/" target="_blank" rel="noopener">on the RStudio blog</a>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to extend our thanks to all of the contributors who helped make these initial releases of vetiver for R and Python possible!</p> <ul> <li> <p>R package: <a href="https://github.com/cderv" target="_blank" rel="noopener">@cderv</a>, <a href="https://github.com/ggpinto" target="_blank" rel="noopener">@ggpinto</a>, <a href="https://github.com/isabelizimm" target="_blank" rel="noopener">@isabelizimm</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/mfansler" target="_blank" 
rel="noopener">@mfansler</a></p> </li> <li> <p>Python package: <a href="https://github.com/has2k1" target="_blank" rel="noopener">@has2k1</a>, and <a href="https://github.com/isabelizimm" target="_blank" rel="noopener">@isabelizimm</a></p> </li> </ul> dbplyr 2.2.0 https://www.tidyverse.org/blog/2022/06/dbplyr-2-2-0/ Mon, 06 Jun 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/06/dbplyr-2-2-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re chuffed to announce the release of <a href="https://dbplyr.tidyverse.org" target="_blank" rel="noopener">dbplyr</a> 2.2.0. 
dbplyr is a database backend for dplyr that allows you to use a remote database as if it was a collection of local data frames: you write ordinary dplyr code and dbplyr translates it to SQL for you.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dbplyr"</span><span class='o'>)</span></code></pre> </div> <p>This blog post will discuss some of the biggest improvements to SQL translations, introduce <a href="https://dbplyr.tidyverse.org/reference/copy_inline.html" target="_blank" rel="noopener"><code>copy_inline()</code></a>, and discuss support for dplyr&rsquo;s <code>row_</code> functions. You can see a full list of changes in the <a href="https://github.com/tidyverse/dbplyr/releases/tag/v2.2.0" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dbplyr.tidyverse.org/'>dbplyr</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span>, warn.conflicts <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></code></pre> </div> <h2 id="sql-improvements">SQL improvements <a href="#sql-improvements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 
3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This release brings with it a host of useful improvements to SQL generation. Firstly, dbplyr uses <code>*</code> where possible. This is particularly nice when you have a table with many names:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>lf</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/tbl_lazy.html'>lazy_frame</a></span><span class='o'>(</span><span class='o'>!</span><span class='o'>!</span><span class='o'>!</span><span class='nf'><a href='https://rdrr.io/r/stats/setNames.html'>setNames</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>as.list</a></span><span class='o'>(</span><span class='m'>1</span><span class='o'>:</span><span class='m'>26</span><span class='o'>)</span>, <span class='nv'>letters</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>lf</span> <span class='c'>#&gt; &lt;SQL&gt;</span> <span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> *</span> <span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `df`</span></code></pre> </div> <p>If you&rsquo;re familiar with dbplyr&rsquo;s old SQL output, you&rsquo;ll also notice that the output receives some basic syntax highlighting and much improved line breaks and indenting.</p> <p>The use of <code>*</code> is particularly nice when you have a subquery. 
Previously the generated SQL would have repeated the column names <code>a</code> to <code>z</code> twice, once for each subquery.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>lf</span> |&gt; <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>x2 <span class='o'>=</span> <span class='nv'>x</span> <span class='o'>+</span> <span class='m'>1</span>, x3 <span class='o'>=</span> <span class='nv'>x2</span> <span class='o'>+</span> <span class='m'>1</span><span class='o'>)</span> <span class='c'>#&gt; &lt;SQL&gt;</span> <span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> *, `x2` + 1.0<span style='color: #0000BB;'> AS </span>`x3`</span> <span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> (</span> <span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> *, `x` + 1.0<span style='color: #0000BB;'> AS </span>`x2`</span> <span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `df`</span> <span class='c'>#&gt; ) `q01`</span></code></pre> </div> <p> <a href="https://dplyr.tidyverse.org/reference/explain.html" target="_blank" rel="noopener"><code>show_query()</code></a>, <a href="https://dplyr.tidyverse.org/reference/compute.html" target="_blank" rel="noopener"><code>compute()</code></a> and <a href="https://dplyr.tidyverse.org/reference/compute.html" target="_blank" rel="noopener"><code>collect()</code></a> have experimental support for common table expressions (CTEs), available by setting the <code>cte = TRUE</code> argument. 
CTEs are the database equivalent of the pipe; they allow you to write subqueries in the order in which they&rsquo;re evaluated, rather than the opposite.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>lf</span> |&gt; <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>x2 <span class='o'>=</span> <span class='nv'>x</span> <span class='o'>+</span> <span class='m'>1</span>, x3 <span class='o'>=</span> <span class='nv'>x2</span> <span class='o'>+</span> <span class='m'>1</span><span class='o'>)</span> |&gt; <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span>cte <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span> <span class='c'>#&gt; &lt;SQL&gt;</span> <span class='c'>#&gt; <span style='color: #0000BB;'>WITH </span>`q01`<span style='color: #0000BB;'> AS</span> (</span> <span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> *, `x` + 1.0<span style='color: #0000BB;'> AS </span>`x2`</span> <span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `df`</span> <span class='c'>#&gt; )</span> <span class='c'>#&gt; <span style='color: #0000BB;'>SELECT</span> *, `x2` + 1.0<span style='color: #0000BB;'> AS </span>`x3`</span> <span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> `q01`</span></code></pre> </div> <p>We&rsquo;ve also added support for translating <a href="https://rdrr.io/r/base/cut.html" target="_blank" rel="noopener"><code>cut()</code></a>: this is a very useful base R function that&rsquo;s fiddly to express in SQL:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>lf</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/tbl_lazy.html'>lazy_frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span 
class='m'>1</span><span class='o'>)</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/translate_sql.html'>translate_sql</a></span><span class='o'>(</span> <span class='nf'><a href='https://rdrr.io/r/base/cut.html'>cut</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>25</span>, <span class='m'>50</span>, <span class='m'>100</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> <span class='c'>#&gt; &lt;SQL&gt; CASE</span> <span class='c'>#&gt; WHEN (`x` &lt;= 0.0) THEN NULL</span> <span class='c'>#&gt; WHEN (`x` &lt;= 25.0) THEN '(0,25]'</span> <span class='c'>#&gt; WHEN (`x` &lt;= 50.0) THEN '(25,50]'</span> <span class='c'>#&gt; WHEN (`x` &lt;= 100.0) THEN '(50,100]'</span> <span class='c'>#&gt; WHEN (`x` &gt; 100.0) THEN NULL</span> <span class='c'>#&gt; END</span> <span class='c'># Can provide custom labels</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/translate_sql.html'>translate_sql</a></span><span class='o'>(</span> <span class='nf'><a href='https://rdrr.io/r/base/cut.html'>cut</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>25</span>, <span class='m'>50</span>, <span class='m'>100</span><span class='o'>)</span>, labels <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"small"</span>, <span class='s'>"medium"</span>, <span class='s'>"large"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> <span class='c'>#&gt; &lt;SQL&gt; CASE</span> <span class='c'>#&gt; WHEN (`x` &lt;= 0.0) THEN NULL</span> <span class='c'>#&gt; WHEN (`x` &lt;= 25.0) THEN 'small'</span> <span class='c'>#&gt; WHEN 
(`x` &lt;= 50.0) THEN 'medium'</span> <span class='c'>#&gt; WHEN (`x` &lt;= 100.0) THEN 'large'</span> <span class='c'>#&gt; WHEN (`x` &gt; 100.0) THEN NULL</span> <span class='c'>#&gt; END</span> <span class='c'># And use Inf/-Inf bounds</span> <span class='nf'><a href='https://dbplyr.tidyverse.org/reference/translate_sql.html'>translate_sql</a></span><span class='o'>(</span> <span class='nf'><a href='https://rdrr.io/r/base/cut.html'>cut</a></span><span class='o'>(</span> <span class='nv'>x</span>, breaks <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='o'>-</span><span class='kc'>Inf</span>, <span class='m'>25</span>, <span class='m'>50</span>, <span class='kc'>Inf</span><span class='o'>)</span>, labels <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"small"</span>, <span class='s'>"medium"</span>, <span class='s'>"large"</span><span class='o'>)</span> <span class='o'>)</span> <span class='o'>)</span> <span class='c'>#&gt; &lt;SQL&gt; CASE</span> <span class='c'>#&gt; WHEN (`x` &lt;= 25.0) THEN 'small'</span> <span class='c'>#&gt; WHEN (`x` &lt;= 50.0) THEN 'medium'</span> <span class='c'>#&gt; WHEN (`x` &gt; 50.0) THEN 'large'</span> <span class='c'>#&gt; END</span></code></pre> </div> <p>There are also a whole host of minor translation improvements which you can read about in the <a href="https://github.com/tidyverse/dbplyr/releases/tag/v2.2.0" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="copy_inline"><code>copy_inline()</code> <a href="#copy_inline"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 
3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://dbplyr.tidyverse.org/reference/copy_inline.html" target="_blank" rel="noopener"><code>copy_inline()</code></a> provides a new way to get data out of R and into the database by embedding the data directly in the query. This is a natural complement to <a href="https://dplyr.tidyverse.org/reference/copy_to.html" target="_blank" rel="noopener"><code>copy_to()</code></a> which writes data to a temporary table. <a href="https://dbplyr.tidyverse.org/reference/copy_inline.html" target="_blank" rel="noopener"><code>copy_inline()</code></a> is faster for small datasets and is particularly useful when you don&rsquo;t have the permissions needed to create temporary tables. Here&rsquo;s a very simple example of what the generated SQL will look like for PostgreSQL</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>5</span>, y <span class='o'>=</span> <span class='nv'>letters</span><span class='o'>[</span><span class='m'>1</span><span class='o'>:</span><span class='m'>5</span><span class='o'>]</span><span class='o'>)</span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span><span class='nf'><a href='https://dbplyr.tidyverse.org/reference/copy_inline.html'>copy_inline</a></span><span class='o'>(</span><span class='nf'><a href='https://dbplyr.tidyverse.org/reference/backend-postgres.html'>simulate_postgres</a></span><span class='o'>(</span><span class='o'>)</span>, <span class='nv'>df</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; &lt;SQL&gt;</span> <span class='c'>#&gt; <span 
style='color: #0000BB;'>SELECT</span> CAST(`x` AS INTEGER)<span style='color: #0000BB;'> AS </span>`x`, CAST(`y` AS TEXT)<span style='color: #0000BB;'> AS </span>`y`</span> <span class='c'>#&gt; <span style='color: #0000BB;'>FROM</span> ( <span style='color: #0000BB;'>VALUES</span> (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')) AS drvd(`x`, `y`)</span></code></pre> </div> <h2 id="row-modification">Row modification <a href="#row-modification"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>dplyr 1.0.0 added a family of <a href="https://www.tidyverse.org/blog/2020/05/dplyr-1-0-0-last-minute-additions/#row-mutation" target="_blank" rel="noopener">row modification</a> functions: <a href="https://dplyr.tidyverse.org/reference/rows.html" target="_blank" rel="noopener"><code>rows_insert()</code></a>, <a href="https://dplyr.tidyverse.org/reference/rows.html" target="_blank" rel="noopener"><code>rows_append()</code></a>, <a href="https://dplyr.tidyverse.org/reference/rows.html" target="_blank" rel="noopener"><code>rows_update()</code></a>, <a href="https://dplyr.tidyverse.org/reference/rows.html" target="_blank" rel="noopener"><code>rows_patch()</code></a>, <a href="https://dplyr.tidyverse.org/reference/rows.html" target="_blank" rel="noopener"><code>rows_upsert()</code></a>, and <a href="https://dplyr.tidyverse.org/reference/rows.html" target="_blank" rel="noopener"><code>rows_delete()</code></a>. These functions were inspired by SQL and are now supported by dbplyr.</p> <p>The primary purpose of these functions is to modify the underlying tables. 
Because this is dangerous, you&rsquo;ll need to deliberately opt in to modification by setting <code>in_place = TRUE</code>. Use the default behaviour, <code>in_place = FALSE</code>, to simulate what the result will be.</p> <p>With <code>in_place = TRUE</code>, <a href="https://dplyr.tidyverse.org/reference/rows.html" target="_blank" rel="noopener"><code>rows_insert()</code></a> and <a href="https://dplyr.tidyverse.org/reference/rows.html" target="_blank" rel="noopener"><code>rows_append()</code></a> perform an <code>INSERT</code>, <a href="https://dplyr.tidyverse.org/reference/rows.html" target="_blank" rel="noopener"><code>rows_update()</code></a> and <code>rows_patch()</code> perform an <code>UPDATE</code>, and <a href="https://dplyr.tidyverse.org/reference/rows.html" target="_blank" rel="noopener"><code>rows_delete()</code></a> performs a <code>DELETE</code>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Most of the work in this release was done by dbplyr author <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>: thanks for all your continued hard work!</p> <p>And a big thanks to all 77 other contributors whose comments, code, and discussion helped make a better package: <a href="https://github.com/001ben" target="_blank" rel="noopener">@001ben</a>, <a href="https://github.com/1beb" target="_blank" rel="noopener">@1beb</a>, <a href="https://github.com/Ada-Nick" target="_blank" rel="noopener">@Ada-Nick</a>, <a href="https://github.com/admivsn" target="_blank" 
rel="noopener">@admivsn</a>, <a href="https://github.com/alex-m-ffm" target="_blank" rel="noopener">@alex-m-ffm</a>, <a href="https://github.com/andreassoteriadesmoj" target="_blank" rel="noopener">@andreassoteriadesmoj</a>, <a href="https://github.com/andyquinterom" target="_blank" rel="noopener">@andyquinterom</a>, <a href="https://github.com/apalacio10" target="_blank" rel="noopener">@apalacio10</a>, <a href="https://github.com/apalacio9502" target="_blank" rel="noopener">@apalacio9502</a>, <a href="https://github.com/aris-hastings" target="_blank" rel="noopener">@aris-hastings</a>, <a href="https://github.com/asimumba" target="_blank" rel="noopener">@asimumba</a>, <a href="https://github.com/ben1787" target="_blank" rel="noopener">@ben1787</a>, <a href="https://github.com/boshek" target="_blank" rel="noopener">@boshek</a>, <a href="https://github.com/caljnj" target="_blank" rel="noopener">@caljnj</a>, <a href="https://github.com/carlganz" target="_blank" rel="noopener">@carlganz</a>, <a href="https://github.com/CLRafaelR" target="_blank" rel="noopener">@CLRafaelR</a>, <a href="https://github.com/coponhub" target="_blank" rel="noopener">@coponhub</a>, <a href="https://github.com/cslewis04" target="_blank" rel="noopener">@cslewis04</a>, <a href="https://github.com/dbaston" target="_blank" rel="noopener">@dbaston</a>, <a href="https://github.com/dpprdan" target="_blank" rel="noopener">@dpprdan</a>, <a href="https://github.com/DrFabach" target="_blank" rel="noopener">@DrFabach</a>, <a href="https://github.com/EarlGlynn" target="_blank" rel="noopener">@EarlGlynn</a>, <a href="https://github.com/edonnachie" target="_blank" rel="noopener">@edonnachie</a>, <a href="https://github.com/eipi10" target="_blank" rel="noopener">@eipi10</a>, <a href="https://github.com/eitsupi" target="_blank" rel="noopener">@eitsupi</a>, <a href="https://github.com/fh-afrachioni" target="_blank" rel="noopener">@fh-afrachioni</a>, <a href="https://github.com/fh-kpikhart" target="_blank" 
rel="noopener">@fh-kpikhart</a>, <a href="https://github.com/ggpinto" target="_blank" rel="noopener">@ggpinto</a>, <a href="https://github.com/GuillaumePressiat" target="_blank" rel="noopener">@GuillaumePressiat</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/HarlanH" target="_blank" rel="noopener">@HarlanH</a>, <a href="https://github.com/hdplsa" target="_blank" rel="noopener">@hdplsa</a>, <a href="https://github.com/iangow" target="_blank" rel="noopener">@iangow</a>, <a href="https://github.com/James-G-Hill" target="_blank" rel="noopener">@James-G-Hill</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jiaqizhu-learning" target="_blank" rel="noopener">@jiaqizhu-learning</a>, <a href="https://github.com/jonkeane" target="_blank" rel="noopener">@jonkeane</a>, <a href="https://github.com/jsspurgeon" target="_blank" rel="noopener">@jsspurgeon</a>, <a href="https://github.com/julieinsan" target="_blank" rel="noopener">@julieinsan</a>, <a href="https://github.com/k6adams" target="_blank" rel="noopener">@k6adams</a>, <a href="https://github.com/kelnerrr" target="_blank" rel="noopener">@kelnerrr</a>, <a href="https://github.com/kmishra9" target="_blank" rel="noopener">@kmishra9</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/Leprechault" target="_blank" rel="noopener">@Leprechault</a>, <a href="https://github.com/Liudvikas-vinted" target="_blank" rel="noopener">@Liudvikas-vinted</a>, <a href="https://github.com/LukasWallrich" target="_blank" rel="noopener">@LukasWallrich</a>, <a href="https://github.com/m-sostero" target="_blank" rel="noopener">@m-sostero</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/mattcane" target="_blank" rel="noopener">@mattcane</a>, <a href="https://github.com/mfherman" target="_blank" 
rel="noopener">@mfherman</a>, <a href="https://github.com/mkoohafkan" target="_blank" rel="noopener">@mkoohafkan</a>, <a href="https://github.com/Mosk915" target="_blank" rel="noopener">@Mosk915</a>, <a href="https://github.com/nassuphis" target="_blank" rel="noopener">@nassuphis</a>, <a href="https://github.com/nirski" target="_blank" rel="noopener">@nirski</a>, <a href="https://github.com/nviets" target="_blank" rel="noopener">@nviets</a>, <a href="https://github.com/overmar" target="_blank" rel="noopener">@overmar</a>, <a href="https://github.com/p-schaefer" target="_blank" rel="noopener">@p-schaefer</a>, <a href="https://github.com/plogacev" target="_blank" rel="noopener">@plogacev</a>, <a href="https://github.com/randy3k" target="_blank" rel="noopener">@randy3k</a>, <a href="https://github.com/recleev" target="_blank" rel="noopener">@recleev</a>, <a href="https://github.com/rmcd1024" target="_blank" rel="noopener">@rmcd1024</a>, <a href="https://github.com/rsund" target="_blank" rel="noopener">@rsund</a>, <a href="https://github.com/rvomm" target="_blank" rel="noopener">@rvomm</a>, <a href="https://github.com/samssann" target="_blank" rel="noopener">@samssann</a>, <a href="https://github.com/sfirke" target="_blank" rel="noopener">@sfirke</a>, <a href="https://github.com/Sir-Chibi" target="_blank" rel="noopener">@Sir-Chibi</a>, <a href="https://github.com/sitendug" target="_blank" rel="noopener">@sitendug</a>, <a href="https://github.com/somatusag" target="_blank" rel="noopener">@somatusag</a>, <a href="https://github.com/stephenashton-dhsc" target="_blank" rel="noopener">@stephenashton-dhsc</a>, <a href="https://github.com/swnydick" target="_blank" rel="noopener">@swnydick</a>, <a href="https://github.com/thothal" target="_blank" rel="noopener">@thothal</a>, <a href="https://github.com/torbjorn" target="_blank" rel="noopener">@torbjorn</a>, <a href="https://github.com/tsengj" target="_blank" rel="noopener">@tsengj</a>, <a href="https://github.com/vspinu" 
target="_blank" rel="noopener">@vspinu</a>, <a href="https://github.com/Waftmaster" target="_blank" rel="noopener">@Waftmaster</a>, <a href="https://github.com/williamlai2" target="_blank" rel="noopener">@williamlai2</a>, and <a href="https://github.com/yitao-li" target="_blank" rel="noopener">@yitao-li</a>.</p> GitHub Actions for R developers, v2 https://www.tidyverse.org/blog/2022/06/actions-2-0-0/ Wed, 01 Jun 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/06/actions-2-0-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. 
the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re tickled pink to announce a <code>v2</code> release of our collection of R-related GitHub Actions at <a href="https://github.com/r-lib/actions">https://github.com/r-lib/actions</a>.</p> <p>If you are already using these actions, you might want to take a look at the <a href="https://github.com/r-lib/actions/releases/tag/v2" target="_blank" rel="noopener">full list of changes</a> first.</p> <p>In this post, we&rsquo;ll show how to set up <code>r-lib/actions</code> for your R package or project, and what is new in the <code>v2</code> version.</p> <h2 id="about-rlibactions">About <code>r-lib/actions</code> <a href="#about-rlibactions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://github.com/features/actions" target="_blank" rel="noopener">GitHub Actions</a> is a continuous integration service that allows you to automatically run code whenever you push to GitHub. 
If you&rsquo;re developing a package, this allows you to automate tasks like running <code>R CMD check</code> on multiple platforms or rebuilding your <a href="https://pkgdown.r-lib.org/" target="_blank" rel="noopener">pkgdown</a> website.</p> <p>The <a href="https://github.com/r-lib/actions#readme" target="_blank" rel="noopener"><code>r-lib/actions</code></a> repo has a number of reusable actions that perform common R-related tasks, such as installing R, Rtools, and pandoc, installing dependencies of R packages, and running <code>R CMD check</code>:</p> <ul> <li> <p> <a href="https://github.com/r-lib/actions/tree/v2/setup-r#readme" target="_blank" rel="noopener"><code>setup-r</code></a> installs R and, on Windows, Rtools,</p> </li> <li> <p> <a href="https://github.com/r-lib/actions/tree/v2/setup-pandoc#readme" target="_blank" rel="noopener"><code>setup-pandoc</code></a> installs pandoc,</p> </li> <li> <p> <a href="https://github.com/r-lib/actions/tree/v2/setup-r-dependencies#readme" target="_blank" rel="noopener"><code>setup-r-dependencies</code></a> installs R package dependencies,</p> </li> <li> <p> <a href="https://github.com/r-lib/actions/tree/v2/check-r-package#readme" target="_blank" rel="noopener"><code>check-r-package</code></a> runs <code>R CMD check</code> on an R package.</p> </li> </ul> <p>See the <a href="https://github.com/r-lib/actions#readme" target="_blank" rel="noopener">README</a> for the complete list of actions.</p> <h2 id="setting-up-r-libactions">Setting up <code>r-lib/actions</code> <a href="#setting-up-r-libactions"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The 
<code>r-lib/actions</code> repo has <a href="https://github.com/r-lib/actions/tree/v2-branch/examples#example-workflows" target="_blank" rel="noopener">example workflows</a>, it is best to start with these.</p> <p>You can copy the ones you&rsquo;d like to use to the <code>.github/workflows</code> directory of your R package or project. For an R package you would typically want the <code>test-coverage</code> workflow and one of the <code>check-</code> workflows, depending on how thoroughly you want to check your package across operating systems and R versions. If your package has a pkgdown site then you probably also want the <code>pkgdown</code> workflow.</p> <p>The usethis package has several helper functions to set up GitHub Actions for you: <a href="https://usethis.r-lib.org/reference/github_actions.html" target="_blank" rel="noopener"><code>?usethis::use_github_action</code></a>. You&rsquo;ll need the latest version of usethis, version 2.1.6 for this.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">usethis</span><span class="o">::</span><span class="nf">use_github_action</span><span class="p">(</span><span class="s">&#34;check-standard&#34;</span><span class="p">)</span> <span class="n">usethis</span><span class="o">::</span><span class="nf">use_github_action</span><span class="p">(</span><span class="s">&#34;test-coverage&#34;</span><span class="p">)</span> <span class="n">usethis</span><span class="o">::</span><span class="nf">use_github_action</span><span class="p">(</span><span class="s">&#34;pkgdown&#34;</span><span class="p">)</span> </code></pre></div> <h2 id="which-tag-or-branch-should-i-use">Which tag or branch should I use? 
<a href="#which-tag-or-branch-should-i-use"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>In short, use the <code>v2</code> tag.</p> <p>The <code>v2</code> tag is a <em>sliding</em> tag. It is not fixed to a certain version; instead, we regularly update it with (non-breaking) improvements and fixes. If it is absolutely crucial that your workflow runs the same way every time, use one of the fixed tags, e.g. <code>v2.2.2</code>, which is the most recent one.</p> <p>As of today, usethis v2.1.6 defaults to configuring workflows from the <code>v2</code> tag. But <code>use_github_action()</code> accepts a <code>ref</code> argument, which allows you to specify a different tag (such as <code>v2.2.2</code>) or even a branch name or specific SHA.</p> <h2 id="what-is-new">What is new? 
<a href="#what-is-new"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2> <h3 id="make-a-plan-and-stick-to-it">Make a plan and stick to it <a href="#make-a-plan-and-stick-to-it"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p><code>setup-r-dependencies@v2</code> takes a more principled approach to resolving and installing system and package dependencies:</p> <ol> <li>It looks up all system (on supported Linux distributions) and package dependencies, and works out an installation plan with a set of package versions that are compatible with each other. (If it cannot find such a set, then the action already fails here.)</li> <li>It writes the plan into a <em>lock file</em>. This is a machine-readable (JSON) file that is also printed to the job&rsquo;s log file. This is the blueprint of the installation.</li> <li>It potentially restores a cached set of installed packages. These are often the same exact package versions that are included in the installation plan. 
However, for efficiency, <code>setup-r-dependencies</code> also restores cached versions that are slightly different.</li> <li>On Linux (if the distribution is supported) it installs all system requirements, according to the lock file.</li> <li>It goes over the install plan again, to check that the packages (potentially) restored from the cache are the same as the ones in the plan. If a package is different, then it upgrades (or downgrades) it according to the plan.</li> <li>At the end of the job, it saves the installed packages into the cache.</li> </ol> <p>At the end of the installation you can be sure that exactly the planned packages are installed.</p> <p>See the <code>setup-r-dependencies</code> <a href="https://github.com/r-lib/actions/tree/v2-branch/setup-r-dependencies#readme" target="_blank" rel="noopener">README</a> for more explanation and examples.</p> <h3 id="simpler-workflow-files">Simpler workflow files <a href="#simpler-workflow-files"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>If you update your existing workflows to use the <code>v2</code> actions, also take a look at the new <a href="https://github.com/r-lib/actions/tree/v2/examples" target="_blank" rel="noopener">example workflows</a>. These are typically much simpler than the previously suggested workflows, because we moved some workflow steps into the new actions. E.g. <code>check-r-package</code> always prints testthat output and uploads the check directory as an artifact on failure, so you don&rsquo;t need to do these explicitly in the workflow. 
<code>setup-r-dependencies</code> now prints the session info with all installed packages, so there is no need to do this explicitly.</p> <p>To be clear, &ldquo;updating your GHA workflows to <code>v2</code>&rdquo; generally goes beyond just changing every instance of <code>v1</code> to <code>v2</code>. The example workflows have also evolved, i.e. you really need to update the entire YAML workflow file.</p> <h3 id="snapshots-as-artifacts">Snapshots as artifacts <a href="#snapshots-as-artifacts"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Encoding issues are not uncommon in snapshot tests across platforms. To make these easier to debug, <code>check-r-package@v2</code> will now upload snapshot output as artifacts if you set the <code>upload-snapshots</code> parameter to <code>true</code>:</p> <div class="highlight"><pre class="chroma"><code class="language-yaml" data-lang="yaml"><span class="w"> </span>- <span class="k">uses</span><span class="p">:</span><span class="w"> </span>r-lib/actions/check-r-package@v2<span class="w"> </span><span class="w"> </span><span class="k">with</span><span class="p">:</span><span class="w"> </span><span class="w"> </span><span class="k">upload-snapshots</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span></code></pre></div><p>See the <a href="https://testthat.r-lib.org/articles/snapshotting.html" target="_blank" rel="noopener">Snapshot tests</a> article in the testthat manual for more about testthat snapshots.</p> <h3 id="rtools42-support">Rtools42 support <a href="#rtools42-support"> <svg class="anchor-symbol" 
aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://www.r-project.org/nosvn/winutf8/ucrt3/web/rtools.html" target="_blank" rel="noopener">Rtools42</a> is the new version of the Rtools compiler bundle, which will be the default for latest R 4.2.0. You can now optionally install Rtools42 with the <code>setup-r</code> action. By default <code>setup-r</code> uses <a href="https://cran.r-project.org/bin/windows/Rtools/rtools40.html" target="_blank" rel="noopener">Rtools40</a> because it is pre-installed on the CI machines, and it is fully compatible with Rtools42. To select Rtools42, set the <code>rtools-version</code> parameter to <code>42</code>:</p> <div class="highlight"><pre class="chroma"><code class="language-yaml" data-lang="yaml"><span class="w"> </span>- <span class="k">uses</span><span class="p">:</span><span class="w"> </span>r-lib/actions/setup-r@v2-branch<span class="w"> </span><span class="w"> </span><span class="k">with</span><span class="p">:</span><span class="w"> </span><span class="w"> </span><span class="k">r-version</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;devel&#39;</span><span class="w"> </span><span class="w"> </span><span class="k">rtools-version</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;42&#39;</span><span class="w"> </span></code></pre></div><p>See <a href="https://github.com/r-lib/actions/blob/27ac87278d916382a04662af42392f3c921ee37e/.github/workflows/check-full.yaml" target="_blank" rel="noopener">this example</a> if you want to use <code>rtools-version</code> in a matrix build.</p> <h3 id="other-changes">Other 
changes <a href="#other-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>See the READMEs for more details.</p> <ul> <li> <p><code>setup-r-dependencies</code> now does not always install the latest versions of the dependencies.</p> </li> <li> <p>You can ask <code>setup-r-dependencies</code> to ignore some optional dependencies on older R versions.</p> </li> <li> <p>The Linux system requirements look-up is more robust now: it uses <code>SystemRequirements</code> fields from all local, GitHub or URL remotes, and it also uses the package installation plan, instead of relying only on the dependency trees of CRAN packages.</p> </li> <li> <p><code>setup-r-dependencies</code> and <code>check-r-package</code> now have a <code>working-directory</code> parameter.</p> </li> <li> <p><code>setup-r-dependencies</code> now works on all x86_64 Linux distributions (but only installs system requirements on supported ones, see the README).</p> </li> <li> <p>The example *down (blogdown, pkgdown and bookdown) workflows now build the web site in pull requests as well, but only deploy on push and release events. 
They also have a manual trigger.</p> </li> <li> <p>The example *down workflows now protect against race conditions.</p> </li> </ul> <h2 id="feedback">Feedback <a href="#feedback"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Your feedback is much appreciated. Before reporting a <a href="https://github.com/r-lib/actions/issues/new/choose" target="_blank" rel="noopener">new issue</a>, please check if it was already reported, see the <a href="https://github.com/r-lib/actions/issues" target="_blank" rel="noopener">list of issues</a>, especially the pinned issues (if any) at the top of the issue page.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thanks to everyone who contributed to <code>r-lib/actions</code>: <a href="https://github.com/andrewl776" target="_blank" rel="noopener">@andrewl776</a>, <a href="https://github.com/arisp99" target="_blank" rel="noopener">@arisp99</a>, <a href="https://github.com/assignUser" target="_blank" rel="noopener">@assignUser</a>, <a href="https://github.com/astamm" target="_blank" rel="noopener">@astamm</a>, <a href="https://github.com/bribroder" target="_blank" rel="noopener">@bribroder</a>, <a 
href="https://github.com/duckmayr" target="_blank" rel="noopener">@duckmayr</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/harupy" target="_blank" rel="noopener">@harupy</a>, <a href="https://github.com/ijlyttle" target="_blank" rel="noopener">@ijlyttle</a>, <a href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/jeroen" target="_blank" rel="noopener">@jeroen</a>, <a href="https://github.com/krlmlr" target="_blank" rel="noopener">@krlmlr</a>, <a href="https://github.com/lorenzwalthert" target="_blank" rel="noopener">@lorenzwalthert</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/MikkoVihtakari" target="_blank" rel="noopener">@MikkoVihtakari</a>, <a href="https://github.com/ms609" target="_blank" rel="noopener">@ms609</a>, <a href="https://github.com/pat-s" target="_blank" rel="noopener">@pat-s</a>, <a href="https://github.com/s-u" target="_blank" rel="noopener">@s-u</a>, <a href="https://github.com/slwu89" target="_blank" rel="noopener">@slwu89</a>, <a href="https://github.com/vincentarelbundock" target="_blank" rel="noopener">@vincentarelbundock</a>, and <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>.</p> roxygen2 7.2.0 https://www.tidyverse.org/blog/2022/05/roxygen2-7-2-0/ Fri, 13 May 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/05/roxygen2-7-2-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] 
[`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re tickled pink to announce the release of <a href="https://roxygen2.r-lib.org" target="_blank" rel="noopener">roxygen2</a> 7.2.0. roxygen2 allows you to write specially formatted R comments that generate R documentation files (<code>man/*.Rd</code>) and the <code>NAMESPACE</code> file. roxygen2 is used by over 9,000 CRAN packages.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"roxygen2"</span><span class='o'>)</span></code></pre> </div> <p>There are three big improvements in this release:</p> <ul> <li> <p>The <code>NAMESPACE</code> roclet now preserves all existing non-import directives during its first pass. This will generally eliminate the pair of <code>&quot;NAMESPACE has changed&quot;</code> messages and should reduce the chances that you end up with a sufficiently broken <code>NAMESPACE</code> that you can&rsquo;t re-load and re-document your package.</p> </li> <li> <p><code>@inheritParams</code> now only inherits exact multi-parameter matches. 
For example take <code>my_plot()</code> below:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'>#' @param width,height The dimensions in inches</span> <span class='nv'>my_plot</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>width</span>, <span class='nv'>height</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='o'>&#125;</span></code></pre> </div> <p>Previously, <code>width</code> and <code>height</code> were inherited individually, so this roxygen2 block:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'>#' @inheritParams my_plot</span> <span class='nv'>your_plot</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span>, <span class='nv'>width</span>, <span class='nv'>height</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='o'>&#125;</span> </code></pre> </div> <p>Would be equivalent to:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'>#' @param width The dimensions in inches</span> <span class='c'>#' @param height The dimensions in inches</span> <span class='nv'>your_plot</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span>, <span class='nv'>width</span>, <span class='nv'>height</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='o'>&#125;</span> </code></pre> </div> <p>Now, multi-parameter arguments will be inherited as a whole. This could potentially break your documentation if you (e.g.) only had one of <code>width</code> and <code>height</code> in your function. 
But we&rsquo;ve only seen this problem in a few places in the tidyverse, it was easily fixed, and inherited arguments are generally much improved.</p> </li> <li> <p>We&rsquo;ve done a thorough review of all warning messages to make them more informative and actionable. We&rsquo;ve also fixed a number of bugs that led to invalid Rd files or pointed you to the wrong place.</p> <p>If you have a daily build of RStudio, warnings now include a clickable link that takes you directly to the problem. This technology is under active development across the IDE and the <a href="https://cli.r-lib.org" target="_blank" rel="noopener">cli</a> package and you can expect to see more of it in the future.</p> </li> </ul> <p>You can see a full list of changes in the <a href="https://github.com/r-lib/roxygen2/blob/main/NEWS.md" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to everyone who contributed to this release through their issues, pull requests, and discussions! 
<a href="https://github.com/AlexisDerumigny" target="_blank" rel="noopener">@AlexisDerumigny</a>, <a href="https://github.com/BenWiseman" target="_blank" rel="noopener">@BenWiseman</a>, <a href="https://github.com/billdenney" target="_blank" rel="noopener">@billdenney</a>, <a href="https://github.com/bobjansen" target="_blank" rel="noopener">@bobjansen</a>, <a href="https://github.com/brry" target="_blank" rel="noopener">@brry</a>, <a href="https://github.com/cderv" target="_blank" rel="noopener">@cderv</a>, <a href="https://github.com/cjyetman" target="_blank" rel="noopener">@cjyetman</a>, <a href="https://github.com/courtiol" target="_blank" rel="noopener">@courtiol</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/danielvartan" target="_blank" rel="noopener">@danielvartan</a>, <a href="https://github.com/DarioS" target="_blank" rel="noopener">@DarioS</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dieghernan" target="_blank" rel="noopener">@dieghernan</a>, <a href="https://github.com/dmurdoch" target="_blank" rel="noopener">@dmurdoch</a>, <a href="https://github.com/dwachsmuth" target="_blank" rel="noopener">@dwachsmuth</a>, <a href="https://github.com/flrd" target="_blank" rel="noopener">@flrd</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/JantekM" target="_blank" rel="noopener">@JantekM</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/karoliskoncevicius" target="_blank" rel="noopener">@karoliskoncevicius</a>, <a href="https://github.com/kongdd" target="_blank" rel="noopener">@kongdd</a>, <a 
href="https://github.com/kpagacz" target="_blank" rel="noopener">@kpagacz</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/lorenzwalthert" target="_blank" rel="noopener">@lorenzwalthert</a>, <a href="https://github.com/maelle" target="_blank" rel="noopener">@maelle</a>, <a href="https://github.com/malcolmbarrett" target="_blank" rel="noopener">@malcolmbarrett</a>, <a href="https://github.com/mbojan" target="_blank" rel="noopener">@mbojan</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/MislavSag" target="_blank" rel="noopener">@MislavSag</a>, <a href="https://github.com/mschilli87" target="_blank" rel="noopener">@mschilli87</a>, <a href="https://github.com/Nelson-Gon" target="_blank" rel="noopener">@Nelson-Gon</a>, <a href="https://github.com/netique" target="_blank" rel="noopener">@netique</a>, <a href="https://github.com/pnacht" target="_blank" rel="noopener">@pnacht</a>, <a href="https://github.com/ramiromagno" target="_blank" rel="noopener">@ramiromagno</a>, <a href="https://github.com/romainfrancois" target="_blank" rel="noopener">@romainfrancois</a>, <a href="https://github.com/saicharanp18" target="_blank" rel="noopener">@saicharanp18</a>, <a href="https://github.com/simonsays1980" target="_blank" rel="noopener">@simonsays1980</a>, <a href="https://github.com/ThierryO" target="_blank" rel="noopener">@ThierryO</a>, <a href="https://github.com/wch" target="_blank" rel="noopener">@wch</a>, <a href="https://github.com/wurli" target="_blank" rel="noopener">@wurli</a>, and <a href="https://github.com/yogat3ch" target="_blank" rel="noopener">@yogat3ch</a>.</p> Using case weights with tidymodels https://www.tidyverse.org/blog/2022/05/case-weights/ Thu, 05 May 2022 00:00:00 +0000 
https://www.tidyverse.org/blog/2022/05/case-weights/ <!-- TODO: * [ ] Look over / edit the post's title in the yaml * [ ] Edit (or delete) the description; note this appears in the Twitter card * [ ] Pick category and tags (see existing with `hugodown::tidy_show_meta()`) * [ ] Find photo & update yaml metadata * [ ] Create `thumbnail-sq.jpg`; height and width should be equal * [ ] Create `thumbnail-wd.jpg`; width should be >5x height * [ ] `hugodown::use_tidy_thumbnails()` * [ ] Add intro sentence, e.g. the standard tagline for the package * [ ] `usethis::use_tidy_thanks()` --> <p>We are pleased to announce that tidymodels packages now support the use of case weights. There has been a ton of work and multiple technical hurdles to overcome. The diversity of the types of weights and how they should be used is very complex, but I think that we&rsquo;ve come up with a solution that is fairly straightforward for users.</p> <p>Several packages are affected by these changes and we&rsquo;re keeping them on GitHub until everything is finalized. See the last section for instructions for installing the development versions.</p> <h2 id="what-are-case-weights">What are case weights? 
<a href="#what-are-case-weights"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Case weights are non-negative numbers used to specify how much each observation influences the estimation of a model.</p> <p>If you are new to this term, it is worth reading Thomas Lumley’s excellent post <a href="https://notstatschat.rbind.io/2020/08/04/weights-in-statistics/" target="_blank" rel="noopener"><em>Weights in statistics</em></a> as well as <a href="https://projecteuclid.org/journals/statistical-science/volume-22/issue-2/Struggles-with-Survey-Weighting-and-Regression-Modeling/10.1214/088342306000000691.full" target="_blank" rel="noopener">&ldquo;Struggles with Survey Weighting and Regression Modeling&rdquo;</a>. Although &ldquo;case weights&rdquo; isn&rsquo;t a universally used term, we&rsquo;ll use it to distinguish it from other types of weights, such as class weights in cost-sensitive learning and others.</p> <p>There are different types of case weights whose terminology can be very different across problem domains. Here are some examples:</p> <ul> <li><strong>Frequency weights</strong> are integers that denote how many times a particular row of data has been observed. They help compress redundant rows into a single entry.</li> <li><strong>Importance weights</strong> focus on how much each row of the data set should influence model estimation. 
These can be based on data or arbitrarily set to achieve some goal.</li> <li>When survey respondents have different probabilities of selection, (inverse) <strong>probability weights</strong> can help reduce bias in the results of a data analysis.</li> <li>If a data point has an associated precision, <strong>analytic weighting</strong> helps a model focus on the data points with less uncertainty (such as in meta-analysis).</li> </ul> <p>There are undoubtedly more types of weights in other domains. Quoting <a href="https://projecteuclid.org/journals/statistical-science/volume-22/issue-2/Struggles-with-Survey-Weighting-and-Regression-Modeling/10.1214/088342306000000691.full" target="_blank" rel="noopener">Andrew Gelman</a>:</p> <blockquote> <p>Weighting causes no end of confusion both in applied and theoretical statistics. People just assume because something has one name (&ldquo;weights&rdquo;), it is one thing. So then we get questions like, &ldquo;How do you do weighted regression in Stan,&rdquo; and we have to reply, &ldquo;What is it that you actually want to do?&rdquo;</p> </blockquote> <h2 id="how-are-they-used-in-traditional-modeling">How are they used in traditional modeling? <a href="#how-are-they-used-in-traditional-modeling"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A traditional example is categorical data where a small number of possible categories are observed many times. 
For example, <code>UCBAdmissions</code> contains &ldquo;Aggregate data on applicants to graduate school at Berkeley for the six largest departments in 1973 classified by admission and sex.&rdquo;</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">data</span><span class="p">(</span><span class="s">&#34;UCBAdmissions&#34;</span><span class="p">)</span> <span class="n">UCBAdmissions</span> </code></pre></div><pre><code>## , , Dept = A ## ## Gender ## Admit Male Female ## Admitted 512 89 ## Rejected 313 19 ## ## , , Dept = B ## ## Gender ## Admit Male Female ## Admitted 353 17 ## Rejected 207 8 ## ## , , Dept = C ## ## Gender ## Admit Male Female ## Admitted 120 202 ## Rejected 205 391 ## ## , , Dept = D ## ## Gender ## Admit Male Female ## Admitted 138 131 ## Rejected 279 244 ## ## , , Dept = E ## ## Gender ## Admit Male Female ## Admitted 53 94 ## Rejected 138 299 ## ## , , Dept = F ## ## Gender ## Admit Male Female ## Admitted 22 24 ## Rejected 351 317 </code></pre><p>This is a 3D array, so let&rsquo;s convert it to a rectangular data format:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span> <span class="n">ucb</span> <span class="o">&lt;-</span> <span class="nf">as_tibble</span><span class="p">(</span><span class="n">UCBAdmissions</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">mutate</span><span class="p">(</span><span class="nf">across</span><span class="p">(</span><span class="nf">where</span><span class="p">(</span><span class="n">is.character</span><span class="p">),</span> <span class="o">~</span> <span class="nf">as.factor</span><span class="p">(</span><span class="n">.)</span><span class="p">))</span> <span class="n">ucb</span> </code></pre></div><pre><code>## # A tibble: 24 × 4 ## Admit Gender Dept n ## &lt;fct&gt; &lt;fct&gt; 
&lt;fct&gt; &lt;dbl&gt; ## 1 Admitted Male A 512 ## 2 Rejected Male A 313 ## 3 Admitted Female A 89 ## 4 Rejected Female A 19 ## 5 Admitted Male B 353 ## 6 Rejected Male B 207 ## 7 Admitted Female B 17 ## 8 Rejected Female B 8 ## 9 Admitted Male C 120 ## 10 Rejected Male C 205 ## # … with 14 more rows </code></pre><p>There are 24 possible configurations of the variables but a total of 4526 observations. If we want to model the data in this format, we could use a logistic regression:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">glm_fit</span> <span class="o">&lt;-</span> <span class="nf">glm</span><span class="p">(</span> <span class="n">Admit</span> <span class="o">~</span> <span class="n">Gender</span> <span class="o">+</span> <span class="n">Dept</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">ucb</span><span class="p">,</span> <span class="n">weights</span> <span class="o">=</span> <span class="n">n</span><span class="p">,</span> <span class="n">family</span> <span class="o">=</span> <span class="s">&#34;binomial&#34;</span> <span class="p">)</span> <span class="n">glm_fit</span> </code></pre></div><pre><code>## ## Call: glm(formula = Admit ~ Gender + Dept, family = &quot;binomial&quot;, data = ucb, ## weights = n) ## ## Coefficients: ## (Intercept) GenderMale DeptB DeptC DeptD DeptE ## -0.68192 0.09987 0.04340 1.26260 1.29461 1.73931 ## DeptF ## 3.30648 ## ## Degrees of Freedom: 23 Total (i.e. Null); 17 Residual ## Null Deviance: 6044 ## Residual Deviance: 5187 AIC: 5201 </code></pre><p><em>This is not quite right though</em>. There are 12 combinations of <code>Gender</code> and <code>Dept</code>. How can the model have 23 total degrees of freedom?</p> <p>If we are treating our data as binomial, the traditional method for fitting this model is to convert the data to a format with columns for the number of events and non-events (per covariate pattern). 
Let&rsquo;s convert our data into that format:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">ucb_grouped_data</span> <span class="o">&lt;-</span> <span class="n">ucb</span> <span class="o">%&gt;%</span> <span class="nf">pivot_wider</span><span class="p">(</span> <span class="n">id_cols</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="n">Gender</span><span class="p">,</span> <span class="n">Dept</span><span class="p">),</span> <span class="n">names_from</span> <span class="o">=</span> <span class="n">Admit</span><span class="p">,</span> <span class="n">values_from</span> <span class="o">=</span> <span class="n">n</span><span class="p">,</span> <span class="n">values_fill</span> <span class="o">=</span> <span class="m">0L</span> <span class="p">)</span> <span class="n">ucb_grouped_data</span> </code></pre></div><pre><code>## # A tibble: 12 × 4 ## Gender Dept Admitted Rejected ## &lt;fct&gt; &lt;fct&gt; &lt;dbl&gt; &lt;dbl&gt; ## 1 Male A 512 313 ## 2 Female A 89 19 ## 3 Male B 353 207 ## 4 Female B 17 8 ## 5 Male C 120 205 ## 6 Female C 202 391 ## 7 Male D 138 279 ## 8 Female D 131 244 ## 9 Male E 53 138 ## 10 Female E 94 299 ## 11 Male F 22 351 ## 12 Female F 24 317 </code></pre><p>Now, since there are really only 12 covariate combinations, the appropriate model can be created.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">glm</span><span class="p">(</span> <span class="nf">cbind</span><span class="p">(</span><span class="n">Rejected</span><span class="p">,</span> <span class="n">Admitted</span><span class="p">)</span> <span class="o">~</span> <span class="n">Gender</span> <span class="o">+</span> <span class="n">Dept</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">ucb_grouped_data</span><span class="p">,</span> <span class="n">family</span> <span 
class="o">=</span> <span class="n">binomial</span> <span class="p">)</span> </code></pre></div><pre><code>## ## Call: glm(formula = cbind(Rejected, Admitted) ~ Gender + Dept, family = binomial, ## data = ucb_grouped_data) ## ## Coefficients: ## (Intercept) GenderMale DeptB DeptC DeptD DeptE ## -0.68192 0.09987 0.04340 1.26260 1.29461 1.73931 ## DeptF ## 3.30648 ## ## Degrees of Freedom: 11 Total (i.e. Null); 5 Residual ## Null Deviance: 877.1 ## Residual Deviance: 20.2 AIC: 103.1 </code></pre><p>In both cases the model coefficients are the same but the standard errors and degrees of freedom are only correct for the model with grouped data.</p> <h2 id="why-is-this-so-complicated">Why is this so complicated? <a href="#why-is-this-so-complicated"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Traditionally, weights in base R functions are used to fit the model and to report a few measures of model efficacy. Here, <code>glm()</code> reports the deviance while <code>lm()</code> shows estimates of the RMSE and adjusted-R<sup>2</sup>.</p> <p>Believe it or not, the logistic regression code shown above, which is a typical example of using weights in a classical statistical setting, is much simpler than what we have to contend with in modern data analysis. There are a few things that we do in modern data analysis where correctly using weights is not so straightforward. These include:</p> <ul> <li>Resampling (e.g. bootstrap or cross-validation).</li> <li>Preprocessing methods such as centering and scaling.</li> <li>Additional measures of performance (e.g. 
area under the ROC curve, mean absolute deviations, Cohen&rsquo;s Kappa, and so on).</li> </ul> <p>A framework like tidymodels should enable users to utilize case weights across all phases of their data analysis.</p> <p>Additionally, the type of case weights <strong>and their intent</strong> determine which of these operations should be affected.</p> <p>For example, frequency weights should affect the estimation of the model, the preprocessing steps, and performance estimation. If the predictors require centering, a weighted mean should be used to ensure that the mean of that column is truly zero. Let&rsquo;s say that sensitivity and specificity estimates are required. The 2x2 table of observed and predicted results should have cell counts that reflect the case weights. If they did not, infrequently occurring data points would have as much weight as the rows that have a high prevalence.</p> <p>As a counterexample, importance weights reflect the idea that they should only influence <em>the model fitting procedure</em>. It wouldn&rsquo;t make sense to use a weighted mean to center a predictor; the weight shouldn&rsquo;t influence an unsupervised operation in the same way as model estimation. More critically, any holdout data set used to quantify model efficacy should reflect the data as seen in the wild (without the impact of the weights).</p> <h2 id="how-does-tidymodels-handle-weights">How does tidymodels handle weights? 
<a href="#how-does-tidymodels-handle-weights"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;ve decided to add some additional vector data types that allow users to describe the type of weights. These data types also help tidymodels functions know what the intent of the analysis should be.</p> <p>In parsnip, the functions <code>frequency_weights()</code> and <code>importance_weights()</code> can be used to set the weights:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># For the UC admissions data</span> <span class="n">ucb</span> <span class="o">&lt;-</span> <span class="n">ucb</span> <span class="o">%&gt;%</span> <span class="nf">mutate</span><span class="p">(</span><span class="n">n</span> <span class="o">=</span> <span class="nf">frequency_weights</span><span class="p">(</span><span class="n">n</span><span class="p">))</span> <span class="n">ucb</span><span class="o">$</span><span class="n">n</span> </code></pre></div><pre><code>## &lt;frequency_weights[24]&gt; ## [1] 512 313 89 19 353 207 17 8 120 205 202 391 138 279 131 244 53 138 94 ## [20] 299 22 351 24 317 </code></pre><div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="c1"># For a general vector of importance weights: </span> <span class="nf">importance_weights</span><span class="p">(</span><span class="nf">round</span><span class="p">(</span><span class="nf">runif</span><span class="p">(</span><span class="m">10</span><span class="p">),</span> <span class="m">2</span><span class="p">))</span> </code></pre></div><pre><code>## 
&lt;importance_weights[10]&gt; ## [1] 0.91 0.53 0.72 0.81 0.33 0.11 0.61 0.61 0.20 0.49 </code></pre><p>The class of these objects tells packages like recipes and yardstick whether their values should be used for preprocessing and performance metrics, respectively:</p> <ul> <li> <p>Importance weights only affect the model estimation and <em>supervised</em> recipe steps. They are not used with yardstick functions for calculating measures of model performance.</p> </li> <li> <p>Frequency weights are used for all parts of the preprocessing, model fitting, and performance estimation operations.</p> </li> </ul> <p>Currently, these are the only classes implemented. We are doing a lot of reading on how the analysis of survey data should use case weights and how we can enable this and other data analysis use cases. <a href="https://community.rstudio.com/t/case-weight-blog-post-discussion/136281" target="_blank" rel="noopener">We&rsquo;d love to hear from you</a> if you have expertise in this area.</p> <h2 id="about-resampling">About resampling <a href="#about-resampling"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This is a topic that we are still unsure about. We definitely think that importance weights should not affect how the data are split or resampled.</p> <p>Frequency weights are more complex. 
Suppose we are using 10-fold cross-validation with the logistic regression on the UCB admissions data. Should we:</p> <ul> <li>Place all of a row&rsquo;s case weight into either the modeling or the holdout set?</li> <li>Fractionally split the case weights into both the modeling and holdout data?</li> </ul> <p>For the latter case, suppose a row of data has a case weight of 100 and we use 10-fold cross-validation. We would always put 90 of those 100 into the modeling data set and the other 10 into the holdout. This seems to be consistent with the sampling of the data and is what would happen if there were actually 100 rows in the data (instead of one row with a case weight of 100). However, it does raise questions regarding data leakage by just re-predicting the same data that went into the model.</p> <p>This is also an area where we&rsquo;d like <a href="https://community.rstudio.com/t/case-weight-blog-post-discussion/136281" target="_blank" rel="noopener">community feedback</a>.</p> <h2 id="tidymodels-syntax">Tidymodels syntax <a href="#tidymodels-syntax"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Let&rsquo;s work through an example. We&rsquo;ll use some data simulated with a severe class imbalance. 
The simulation functions are in the <a href="https://modeldata.tidymodels.org/dev/reference/sim_classification.html" target="_blank" rel="noopener">development version of the modeldata package</a>.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">set.seed</span><span class="p">(</span><span class="m">1</span><span class="p">)</span> <span class="n">training_sim</span> <span class="o">&lt;-</span> <span class="nf">sim_classification</span><span class="p">(</span><span class="m">5000</span><span class="p">,</span> <span class="n">intercept</span> <span class="o">=</span> <span class="m">-25</span><span class="p">)</span> <span class="n">training_sim</span> <span class="o">%&gt;%</span> <span class="nf">count</span><span class="p">(</span><span class="n">class</span><span class="p">)</span> </code></pre></div><pre><code>## # A tibble: 2 × 2 ## class n ## &lt;fct&gt; &lt;int&gt; ## 1 class_1 80 ## 2 class_2 4920 </code></pre><p>If we would like to encourage models to more accurately predict the minority class, we can give these samples a much larger weight in the analysis:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">training_sim</span> <span class="o">&lt;-</span> <span class="n">training_sim</span> <span class="o">%&gt;%</span> <span class="nf">mutate</span><span class="p">(</span> <span class="n">case_wts</span> <span class="o">=</span> <span class="nf">ifelse</span><span class="p">(</span><span class="n">class</span> <span class="o">==</span> <span class="s">&#34;class_1&#34;</span><span class="p">,</span> <span class="m">60</span><span class="p">,</span> <span class="m">1</span><span class="p">),</span> <span class="n">case_wts</span> <span class="o">=</span> <span class="nf">importance_weights</span><span class="p">(</span><span class="n">case_wts</span><span class="p">)</span> <span class="p">)</span> </code></pre></div><p>We strongly advise that users set the 
case weight column before any other tidymodels functions are used. This ensures that they are handled correctly in the analyses that follow. In some cases, such as recipes, we prohibit changing the case weight column. Since the intent of the weights is needed, errors could occur if that intent was changed during the analysis.</p> <p>Let&rsquo;s use 10-fold cross-validation to resample the data. This case is unaffected by the presence of weights:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">set.seed</span><span class="p">(</span><span class="m">2</span><span class="p">)</span> <span class="n">sim_folds</span> <span class="o">&lt;-</span> <span class="nf">vfold_cv</span><span class="p">(</span><span class="n">training_sim</span><span class="p">,</span> <span class="n">strata</span> <span class="o">=</span> <span class="n">class</span><span class="p">)</span> </code></pre></div><p>We&rsquo;ll fit a regularized logistic regression model to the data using glmnet:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">lr_spec</span> <span class="o">&lt;-</span> <span class="nf">logistic_reg</span><span class="p">(</span><span class="n">penalty</span> <span class="o">=</span> <span class="nf">tune</span><span class="p">(),</span> <span class="n">mixture</span> <span class="o">=</span> <span class="m">1</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;glmnet&#34;</span><span class="p">)</span> </code></pre></div><p>For this model, we need to ensure that the predictors are in the same units. 
We&rsquo;ll use a recipe to center and scale the data and also add some spline terms for predictors that appear to have a nonlinear relationship with the outcome:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">sim_rec</span> <span class="o">&lt;-</span> <span class="nf">recipe</span><span class="p">(</span><span class="n">class</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">training_sim</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">step_ns</span><span class="p">(</span><span class="nf">starts_with</span><span class="p">(</span><span class="s">&#34;non_linear&#34;</span><span class="p">),</span> <span class="n">deg_free</span> <span class="o">=</span> <span class="m">10</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">step_normalize</span><span class="p">(</span><span class="nf">all_numeric_predictors</span><span class="p">())</span> <span class="n">sim_rec</span> </code></pre></div><pre><code>## Recipe ## ## Inputs: ## ## role #variables ## case_weights 1 ## outcome 1 ## predictor 15 ## ## Operations: ## ## Natural splines on starts_with(&quot;non_linear&quot;) ## Centering and scaling for all_numeric_predictors() </code></pre><p>There are a few things to point out here. The recipe automatically detects the case weights even though they are captured by the dot in the right-hand side of the formula. The recipe automatically sets their role and will error if that column is changed in any way.</p> <p>As mentioned above, any unsupervised steps are unaffected by importance weights, so neither <code>step_ns()</code> nor <code>step_normalize()</code> uses the weights in its calculations.</p> <p>When using case weights, we would like to encourage users to keep their model and preprocessing tool within a workflow. 
The workflows package now has an <code>add_case_weights()</code> function to help here:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">lr_wflow</span> <span class="o">&lt;-</span> <span class="nf">workflow</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">add_model</span><span class="p">(</span><span class="n">lr_spec</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">add_recipe</span><span class="p">(</span><span class="n">sim_rec</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">add_case_weights</span><span class="p">(</span><span class="n">case_wts</span><span class="p">)</span> <span class="n">lr_wflow</span> </code></pre></div><pre><code>## ══ Workflow ═════════════════════════════════════════════════════════════════════════ ## Preprocessor: Recipe ## Model: logistic_reg() ## ## ── Preprocessor ───────────────────────────────────────────────────────────────────── ## 2 Recipe Steps ## ## • step_ns() ## • step_normalize() ## ## ── Case Weights ───────────────────────────────────────────────────────────────────── ## case_wts ## ## ── Model ──────────────────────────────────────────────────────────────────────────── ## Logistic Regression Model Specification (classification) ## ## Main Arguments: ## penalty = tune() ## mixture = 1 ## ## Computational engine: glmnet </code></pre><p>Existing <code>add_*()</code> functions in workflows add objects (instead of data). Rather than specifying case weights in each preprocessor function (e.g. 
<code>add_formula()</code> and so on), this syntax is simpler and works with any type of preprocessor.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">cls_metrics</span> <span class="o">&lt;-</span> <span class="nf">metric_set</span><span class="p">(</span><span class="n">sensitivity</span><span class="p">,</span> <span class="n">specificity</span><span class="p">)</span> <span class="n">grid</span> <span class="o">&lt;-</span> <span class="nf">tibble</span><span class="p">(</span><span class="n">penalty</span> <span class="o">=</span> <span class="m">10</span><span class="nf">^seq</span><span class="p">(</span><span class="m">-3</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="n">length.out</span> <span class="o">=</span> <span class="m">20</span><span class="p">))</span> <span class="nf">set.seed</span><span class="p">(</span><span class="m">3</span><span class="p">)</span> <span class="n">lr_res</span> <span class="o">&lt;-</span> <span class="n">lr_wflow</span> <span class="o">%&gt;%</span> <span class="nf">tune_grid</span><span class="p">(</span><span class="n">resamples</span> <span class="o">=</span> <span class="n">sim_folds</span><span class="p">,</span> <span class="n">grid</span> <span class="o">=</span> <span class="n">grid</span><span class="p">,</span> <span class="n">metrics</span> <span class="o">=</span> <span class="n">cls_metrics</span><span class="p">)</span> <span class="nf">autoplot</span><span class="p">(</span><span class="n">lr_res</span><span class="p">)</span> </code></pre></div><p><img src="figure/sim-tune-1.svg" title="plot of chunk sim-tune" alt="plot of chunk sim-tune" width="100%" /></p> <p>In tidymodels, the default is that the first level of the outcome factor is the event of interest. Since the first level of the outcome has the fewest values, we would expect the sensitivity of the model to suffer. 
These results suggest that the weights are making the model focus on the minority class.</p> <p>For comparison, let&rsquo;s remove the weights and then tune the same parameter values.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">lr_unwt_wflow</span> <span class="o">&lt;-</span> <span class="n">lr_wflow</span> <span class="o">%&gt;%</span> <span class="nf">remove_case_weights</span><span class="p">()</span> <span class="nf">set.seed</span><span class="p">(</span><span class="m">3</span><span class="p">)</span> <span class="n">lr_unwt_res</span> <span class="o">&lt;-</span> <span class="n">lr_unwt_wflow</span> <span class="o">%&gt;%</span> <span class="nf">tune_grid</span><span class="p">(</span><span class="n">resamples</span> <span class="o">=</span> <span class="n">sim_folds</span><span class="p">,</span> <span class="n">grid</span> <span class="o">=</span> <span class="n">grid</span><span class="p">,</span> <span class="n">metrics</span> <span class="o">=</span> <span class="n">cls_metrics</span><span class="p">)</span> </code></pre></div><p>How do the results compare?</p> <p><img src="figure/plot-results-1.svg" title="plot of chunk plot-results" alt="plot of chunk plot-results" width="100%" /></p> <p>The importance weights certainly did their job since the weighted analysis has a better balance of sensitivity and specificity.</p> <h2 id="getting-feedback">Getting feedback <a href="#getting-feedback"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;ve laid the groundwork for using case weights holistically in tidymodels. 
For those of you who use case weights, we&rsquo;d like to know what you think of our approach and answer any questions that you have. We have an <a href="https://community.rstudio.com/t/case-weight-blog-post-discussion/136281" target="_blank" rel="noopener">RStudio Community post</a> queued up to discuss this topic.</p> <p>We&rsquo;ve waited to release packages with case weight support until the main pieces were in place. If you would like to play around with what we&rsquo;ve done, you can load the development versions of the packages using:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">if </span><span class="p">(</span><span class="o">!</span><span class="n">rlang</span><span class="o">::</span><span class="nf">is_installed</span><span class="p">(</span><span class="s">&#34;pak&#34;</span><span class="p">))</span> <span class="p">{</span> <span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;pak&#34;</span><span class="p">)</span> <span class="p">}</span> <span class="n">pkgs</span> <span class="o">&lt;-</span> <span class="nf">c</span><span class="p">(</span><span class="s">&#34;hardhat&#34;</span><span class="p">,</span> <span class="s">&#34;parsnip&#34;</span><span class="p">,</span> <span class="s">&#34;recipes&#34;</span><span class="p">,</span> <span class="s">&#34;modeldata&#34;</span><span class="p">,</span> <span class="s">&#34;tune&#34;</span><span class="p">,</span> <span class="s">&#34;workflows&#34;</span><span class="p">,</span> <span class="s">&#34;yardstick&#34;</span><span class="p">)</span> <span class="n">pkgs</span> <span class="o">&lt;-</span> <span class="nf">paste0</span><span class="p">(</span><span class="s">&#34;tidymodels/&#34;</span><span class="p">,</span> <span class="n">pkgs</span><span class="p">)</span> <span class="n">pak</span><span class="o">::</span><span class="nf">pak</span><span class="p">(</span><span class="n">pkgs</span><span 
class="p">)</span> </code></pre></div><p>If you use any of the parsnip extension packages (e.g. discrim, rules, etc.), make sure to install the development versions of these too.</p> Updates for recipes extension packages https://www.tidyverse.org/blog/2022/05/recipes-update-05-20222/ Tue, 03 May 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/05/recipes-update-05-20222/ <p>We&rsquo;re tickled pink to announce the releases of extension packages that followed the recent release of <a href="https://recipes.tidymodels.org/" target="_blank" rel="noopener">recipes</a> 0.2.0. recipes is a package for preprocessing data before using it in models or visualizations. 
You can think of it as a mash-up of <a href="https://rdrr.io/r/stats/model.matrix.html" target="_blank" rel="noopener"><code>model.matrix()</code></a> and dplyr.</p> <p>You can install these updates from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"embed"</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"themis"</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"textrecipes"</span><span class='o'>)</span></code></pre> </div> <p>The <code>NEWS</code> files are linked here for each package. We will go over some of the bigger changes within and between these packages in this post. 
A lot of the smaller changes were done to make sure that these extension packages are up to the same standard as recipes itself.</p> <ul> <li> <a href="https://themis.tidymodels.org/news/index.html#themis-020" target="_blank" rel="noopener">themis</a></li> <li> <a href="https://textrecipes.tidymodels.org/news/index.html#textrecipes-051" target="_blank" rel="noopener">textrecipes</a></li> <li> <a href="https://embed.tidymodels.org/news/index.html#embed-020" target="_blank" rel="noopener">embed</a></li> </ul> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/recipes'>recipes</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/themis'>themis</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://github.com/tidymodels/textrecipes'>textrecipes</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://embed.tidymodels.org'>embed</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://modeldata.tidymodels.org'>modeldata</a></span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/base/Random.html'>set.seed</a></span><span class='o'>(</span><span class='m'>1234</span><span class='o'>)</span></code></pre> </div> <h2 id="themis">themis <a href="#themis"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> 
<path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A new step <a href="https://themis.tidymodels.org/reference/step_smotenc.html" target="_blank" rel="noopener"><code>step_smotenc()</code></a> was added thanks to <a href="https://github.com/RobertGregg" target="_blank" rel="noopener">Robert Gregg</a>. This step applies the <a href="https://scholar.google.com/scholar?hl=en&amp;as_sdt=0%2C7&amp;q=SMOTENC&#43;&amp;btnG=" target="_blank" rel="noopener">SMOTENC algorithm</a> to synthetically generate observations from minority classes. The SMOTENC method can handle a mix of categorical and numerical predictors, which was not possible using the existing SMOTE method which could only operate on numeric predictors.</p> <p>The <code>hpc_data</code> illustrates this use case neatly. The data set contains characteristics of HPC Unix jobs and how long they took to run (the outcome column is <code>class</code>). The outcome is not that balanced, with some classes having almost 10 times fewer observations than others. 
One way to deal with an imbalance like this is to over-sample the minority observations to mitigate the imbalance.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>hpc_data</span><span class='o'>)</span> <span class='nv'>hpc_data</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/count.html'>count</a></span><span class='o'>(</span><span class='nv'>class</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 2</span></span> <span class='c'>#&gt; class n</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> VF <span style='text-decoration: underline;'>2</span>211</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> F <span style='text-decoration: underline;'>1</span>347</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> M 514</span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> L 259</span></code></pre> </div> <p>Using <a href="https://themis.tidymodels.org/reference/step_smotenc.html" target="_blank" rel="noopener"><code>step_smotenc()</code></a>, with the <code>over_ratio</code> argument, we can make sure that all classes are over-sampled to have no less than half of the observations of the largest class.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>up_rec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>class</span> <span class='o'>~</span> <span 
class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>hpc_data</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://themis.tidymodels.org/reference/step_smotenc.html'>step_smotenc</a></span><span class='o'>(</span><span class='nv'>class</span>, over_ratio <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/prep.html'>prep</a></span><span class='o'>(</span><span class='o'>)</span> <span class='nv'>up_rec</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/bake.html'>bake</a></span><span class='o'>(</span>new_data <span class='o'>=</span> <span class='kc'>NULL</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/count.html'>count</a></span><span class='o'>(</span><span class='nv'>class</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 2</span></span> <span class='c'>#&gt; class n</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> VF <span style='text-decoration: underline;'>2</span>211</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> F <span style='text-decoration: underline;'>1</span>347</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> M <span style='text-decoration: underline;'>1</span>105</span> <span class='c'>#&gt; <span style='color: 
#555555;'>4</span> L <span style='text-decoration: underline;'>1</span>105</span></code></pre> </div> <p>The methods implemented in themis now have <a href="https://themis.tidymodels.org/reference/index.html#methods" target="_blank" rel="noopener">standalone functions</a> to apply these algorithms without having to create a recipe.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://themis.tidymodels.org/reference/smotenc.html'>smotenc</a></span><span class='o'>(</span><span class='nv'>hpc_data</span>, <span class='s'>"class"</span>, over_ratio <span class='o'>=</span> <span class='m'>0.5</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5,768 × 8</span></span> <span class='c'>#&gt; protocol compounds input_fields iterations num_pending hour day class</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'> 1</span> E 997 137 20 0 14 Tue F </span> <span class='c'>#&gt; <span style='color: #555555;'> 2</span> E 97 103 20 0 13.8 Tue VF </span> <span class='c'>#&gt; <span style='color: #555555;'> 3</span> E 101 75 10 0 13.8 Thu VF </span> <span class='c'>#&gt; <span style='color: #555555;'> 4</span> E 93 76 20 0 10.1 Fri VF </span> <span class='c'>#&gt; <span style='color: #555555;'> 5</span> E 100 82 20 0 10.4 Fri VF </span> <span class='c'>#&gt; <span style='color: #555555;'> 
6</span> E 100 82 20 0 16.5 Wed VF </span> <span class='c'>#&gt; <span style='color: #555555;'> 7</span> E 105 88 20 0 16.4 Fri VF </span> <span class='c'>#&gt; <span style='color: #555555;'> 8</span> E 98 95 20 0 16.7 Fri VF </span> <span class='c'>#&gt; <span style='color: #555555;'> 9</span> E 101 91 20 0 16.2 Fri VF </span> <span class='c'>#&gt; <span style='color: #555555;'>10</span> E 95 92 20 0 10.8 Wed VF </span> <span class='c'>#&gt; <span style='color: #555555;'># … with 5,758 more rows</span></span></code></pre> </div> <h2 id="textrecipes">textrecipes <a href="#textrecipes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We added the functions <a href="https://textrecipes.tidymodels.org/reference/all_tokenized.html" target="_blank" rel="noopener"><code>all_tokenized()</code></a> and <a href="https://textrecipes.tidymodels.org/reference/all_tokenized.html" target="_blank" rel="noopener"><code>all_tokenized_predictors()</code></a> to more easily select tokenized columns, similar to the <a href="https://recipes.tidymodels.org/reference/has_role.html" target="_blank" rel="noopener">existing <code>all_numeric()</code> and <code>all_numeric_predictors()</code> selectors in recipes</a>.</p> <p>The most important step in textrecipes is <a href="https://textrecipes.tidymodels.org/reference/step_tokenize.html" target="_blank" rel="noopener"><code>step_tokenize()</code></a>, as you need it to generate tokens that can be modified by other steps. We have found that this function has gotten overloaded with functionality as more and more support for different types of tokenization was added. 
To address this, we have created new specialized tokenization steps; <a href="https://textrecipes.tidymodels.org/reference/step_tokenize.html" target="_blank" rel="noopener"><code>step_tokenize()</code></a> has gotten cousin steps <a href="https://textrecipes.tidymodels.org/reference/step_tokenize_bpe.html" target="_blank" rel="noopener"><code>step_tokenize_bpe()</code></a>, <a href="https://textrecipes.tidymodels.org/reference/step_tokenize_sentencepiece.html" target="_blank" rel="noopener"><code>step_tokenize_sentencepiece()</code></a>, and <a href="https://textrecipes.tidymodels.org/reference/step_tokenize_wordpiece.html" target="_blank" rel="noopener"><code>step_tokenize_wordpiece()</code></a> which wrap <a href="https://CRAN.R-project.org/package=tokenizers.bpe" target="_blank" rel="noopener">tokenizers.bpe</a>, <a href="https://CRAN.R-project.org/package=sentencepiece" target="_blank" rel="noopener">sentencepiece</a>, and <a href="https://CRAN.R-project.org/package=wordpiece" target="_blank" rel="noopener">wordpiece</a> respectively.</p> <p>In addition to being easier to manage code-wise, these new functions also allow for more compact, more readable code with better tab completion.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>tate_text</span><span class='o'>)</span> <span class='c'># Old</span> <span class='nv'>tate_rec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>tate_text</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a 
href='https://textrecipes.tidymodels.org/reference/step_tokenize.html'>step_tokenize</a></span><span class='o'>(</span> <span class='nv'>medium</span>, engine <span class='o'>=</span> <span class='s'>"tokenizers.bpe"</span>, training_options <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>vocab_size <span class='o'>=</span> <span class='m'>1000</span><span class='o'>)</span> <span class='o'>)</span> <span class='c'># New</span> <span class='nv'>tate_rec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='o'>~</span><span class='nv'>.</span>, data <span class='o'>=</span> <span class='nv'>tate_text</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://textrecipes.tidymodels.org/reference/step_tokenize_bpe.html'>step_tokenize_bpe</a></span><span class='o'>(</span><span class='nv'>medium</span>, vocabulary_size <span class='o'>=</span> <span class='m'>1000</span><span class='o'>)</span></code></pre> </div> <h2 id="embed">embed <a href="#embed"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://embed.tidymodels.org/reference/step_feature_hash.html" target="_blank" rel="noopener"><code>step_feature_hash()</code></a> is now soft deprecated in embed in favor of <a href="https://textrecipes.tidymodels.org/reference/step_dummy_hash.html" target="_blank" 
rel="noopener"><code>step_dummy_hash()</code></a> in textrecipes. The embed version uses TensorFlow, which for some use cases is quite a dependency. One thing to keep an eye out for when moving over is that the textrecipes version uses <code>num_terms</code> instead of <code>num_hash</code> to denote the number of columns to output.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/data.html'>data</a></span><span class='o'>(</span><span class='nv'>Sacramento</span><span class='o'>)</span> <span class='c'># Old recipe</span> <span class='nv'>embed_rec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>price</span> <span class='o'>~</span> <span class='nv'>zip</span>, data <span class='o'>=</span> <span class='nv'>Sacramento</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://embed.tidymodels.org/reference/step_feature_hash.html'>step_feature_hash</a></span><span class='o'>(</span><span class='nv'>zip</span>, num_hash <span class='o'>=</span> <span class='m'>64</span><span class='o'>)</span> <span class='c'>#&gt; Loaded Tensorflow version 2.8.0</span> <span class='c'># New recipe</span> <span class='nv'>textrecipes_rec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://recipes.tidymodels.org/reference/recipe.html'>recipe</a></span><span class='o'>(</span><span class='nv'>price</span> <span class='o'>~</span> <span class='nv'>zip</span>, data <span class='o'>=</span> <span class='nv'>Sacramento</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a 
href='https://textrecipes.tidymodels.org/reference/step_dummy_hash.html'>step_dummy_hash</a></span><span class='o'>(</span><span class='nv'>zip</span>, num_terms <span class='o'>=</span> <span class='m'>64</span><span class='o'>)</span></code></pre> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to extend our thanks to all of the contributors who helped make these releases possible!</p> <ul> <li> <p>themis: <a href="https://github.com/coforfe" target="_blank" rel="noopener">@coforfe</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/emilyriederer" target="_blank" rel="noopener">@emilyriederer</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/OGuggenbuehl" target="_blank" rel="noopener">@OGuggenbuehl</a>, and <a href="https://github.com/RobertGregg" target="_blank" rel="noopener">@RobertGregg</a>.</p> </li> <li> <p>textrecipes: <a href="https://github.com/dgrtwo" target="_blank" rel="noopener">@dgrtwo</a>, <a href="https://github.com/DiabbZegpi" target="_blank" rel="noopener">@DiabbZegpi</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/jcragy" target="_blank" rel="noopener">@jcragy</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a 
href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/NLDataScientist" target="_blank" rel="noopener">@NLDataScientist</a>, <a href="https://github.com/raj-hubber" target="_blank" rel="noopener">@raj-hubber</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>embed: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/naveranoc" target="_blank" rel="noopener">@naveranoc</a>, <a href="https://github.com/talegari" target="_blank" rel="noopener">@talegari</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> </ul> haven 2.5.0 https://www.tidyverse.org/blog/2022/04/haven-2-5-0/ Fri, 15 Apr 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/04/haven-2-5-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [ ] Add intro sentence, e.g. the standard tagline for the package * [ ] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re chuffed to announce the release of <a href="https://haven.tidyverse.org" target="_blank" rel="noopener">haven</a> 2.5.0. 
haven allows you to read and write SAS, SPSS, and Stata data formats from R, thanks to the wonderful <a href="https://github.com/WizardMac/ReadStat" target="_blank" rel="noopener">ReadStat</a> C library written by <a href="https://www.evanmiller.org/" target="_blank" rel="noopener">Evan Miller</a>.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"haven"</span><span class='o'>)</span></code></pre> </div> <p>The most important news for this release is that <a href="https://github.com/gorcha" target="_blank" rel="noopener">Danny Smith</a> is now a haven author in recognition of his significant and sustained contributions. He contributed the majority of improvements and bug fixes to this release.</p> <p>Other improvements of note:</p> <ul> <li> <p>You can set custom variable widths when writing by setting the <code>width</code> attribute of the variable.</p> </li> <li> <p>You can create FDA-compliant SAS transport files with haven, thanks to the addition of custom variable width support and some XPT writing related bug fixes.</p> </li> <li> <p><code>write_dta()</code> now supports Stata&rsquo;s <code>StrL</code> variables. 
This means that it&rsquo;s possible to write Stata files containing strings longer than 2045 characters, which was previously a hard upper limit.</p> </li> </ul> <p>You can see a full list of changes in the <a href="https://github.com/tidyverse/haven/blob/main/NEWS.md" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks to all 24 folks who contributed to this release by filing issues or creating pull requests: <a href="https://github.com/aito123" target="_blank" rel="noopener">@aito123</a>, <a href="https://github.com/arnaud-feldmann" target="_blank" rel="noopener">@arnaud-feldmann</a>, <a href="https://github.com/brianstamper" target="_blank" rel="noopener">@brianstamper</a>, <a href="https://github.com/dusadrian" target="_blank" rel="noopener">@dusadrian</a>, <a href="https://github.com/elimillera" target="_blank" rel="noopener">@elimillera</a>, <a href="https://github.com/etiennebacher" target="_blank" rel="noopener">@etiennebacher</a>, <a href="https://github.com/geebioso" target="_blank" rel="noopener">@geebioso</a>, <a href="https://github.com/gorcha" target="_blank" rel="noopener">@gorcha</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/jakoberr" target="_blank" rel="noopener">@jakoberr</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/juansebastianl" target="_blank" rel="noopener">@juansebastianl</a>, <a 
href="https://github.com/khanhhtt" target="_blank" rel="noopener">@khanhhtt</a>, <a href="https://github.com/Luke791" target="_blank" rel="noopener">@Luke791</a>, <a href="https://github.com/manhnguyen48" target="_blank" rel="noopener">@manhnguyen48</a>, <a href="https://github.com/maxecharel" target="_blank" rel="noopener">@maxecharel</a>, <a href="https://github.com/MokeEire" target="_blank" rel="noopener">@MokeEire</a>, <a href="https://github.com/Nate884" target="_blank" rel="noopener">@Nate884</a>, <a href="https://github.com/pskoulgi" target="_blank" rel="noopener">@pskoulgi</a>, <a href="https://github.com/Sama2than" target="_blank" rel="noopener">@Sama2than</a>, <a href="https://github.com/Shaunson26" target="_blank" rel="noopener">@Shaunson26</a>, <a href="https://github.com/sjkiss" target="_blank" rel="noopener">@sjkiss</a>, <a href="https://github.com/szimmer" target="_blank" rel="noopener">@szimmer</a>, and <a href="https://github.com/yangwenghou123" target="_blank" rel="noopener">@yangwenghou123</a>.</p> scales 1.2.0 https://www.tidyverse.org/blog/2022/04/scales-1-2-0/ Wed, 13 Apr 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/04/scales-1-2-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. 
the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re very pleased to announce the release of <a href="https://scales.r-lib.org" target="_blank" rel="noopener">scales</a> 1.2.0. The scales package provides much of the infrastructure that underlies ggplot2&rsquo;s scales, and using it allows you to customize the transformations, breaks, and labels used by ggplot2. You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"scales"</span><span class='o'>)</span></code></pre> </div> <p>This blog post will show off a few new features for labeling numbers, log scales, and currencies. You can see a full list of changes in the <a href="https://github.com/r-lib/scales/blob/main/NEWS.md" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://ggplot2.tidyverse.org'>ggplot2</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://scales.r-lib.org'>scales</a></span><span class='o'>)</span></code></pre> </div> <h2 id="numbers">Numbers <a href="#numbers"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> 
</svg> </a> </h2><p> <a href="https://scales.r-lib.org/reference/label_number.html" target="_blank" rel="noopener"><code>label_number()</code></a> is the workhorse that powers ggplot2&rsquo;s formatting of numbers, including <a href="https://scales.r-lib.org/reference/label_dollar.html" target="_blank" rel="noopener"><code>label_dollar()</code></a> and <a href="https://scales.r-lib.org/reference/label_number.html" target="_blank" rel="noopener"><code>label_comma()</code></a>. This release added a number of useful new features.</p> <p>The most important is a new <code>scale_cut</code> argument that makes it possible to independently scale different parts of the range. This is useful for scales which span multiple orders of magnitude. Take the following two examples which don&rsquo;t get great labels by default:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>df1</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='m'>10</span> <span class='o'>^</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>1000</span>, <span class='m'>2</span>, <span class='m'>9</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>1000</span><span class='o'>)</span> <span class='o'>)</span> <span class='nv'>df2</span> <span class='o'>&lt;-</span> <span class='nv'>df1</span> |&gt; <span class='nf'>dplyr</span><span class='nf'>::</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>x</span> <span class='o'>&lt;=</span> <span class='m'>1.25</span> <span class='o'>*</span> <span class='m'>10</span><span 
class='o'>^</span><span class='m'>6</span><span class='o'>)</span> <span class='nv'>plot1</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>df1</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='kc'>NULL</span>, y <span class='o'>=</span> <span class='kc'>NULL</span><span class='o'>)</span> <span class='nv'>plot1</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_log10</a></span><span class='o'>(</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-3-1.png" title="Scatterplot with x-axis labels 1e+03, 1e+05, 1e+07, and 1e+09." alt="Scatterplot with x-axis labels 1e+03, 1e+05, 1e+07, and 1e+09." 
width="700px" style="display: block; margin: auto;" /></p> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>plot2</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>df2</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_point.html'>geom_point</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='kc'>NULL</span>, y <span class='o'>=</span> <span class='kc'>NULL</span><span class='o'>)</span> <span class='nv'>plot2</span> </code></pre> <p><img src="figs/unnamed-chunk-4-1.png" title="Scatterplot with x-axis labels 0, 250000, 500000, 750000, 1000000, 1250000." alt="Scatterplot with x-axis labels 0, 250000, 500000, 750000, 1000000, 1250000." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>You can use <a href="https://scales.r-lib.org/reference/number.html" target="_blank" rel="noopener"><code>cut_short_scale()</code></a> to show thousands with a K suffix, millions with a M suffix, and billions with a B suffix:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>plot1</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_log10</a></span><span class='o'>(</span> labels <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/label_number.html'>label_number</a></span><span class='o'>(</span>scale_cut <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/number.html'>cut_short_scale</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-5-1.png" title="Scatterplot with x-axis labels 1K, 100K, 10M, 1B." alt="Scatterplot with x-axis labels 1K, 100K, 10M, 1B." 
width="700px" style="display: block; margin: auto;" /></p> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>plot2</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_continuous</a></span><span class='o'>(</span> labels <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/label_number.html'>label_number</a></span><span class='o'>(</span>scale_cut <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/number.html'>cut_short_scale</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-6-1.png" title="Scatterplot with x-axis labels 0, 250K, 500K, 750K, 1.00M, 1.25M" alt="Scatterplot with x-axis labels 0, 250K, 500K, 750K, 1.00M, 1.25M" width="700px" style="display: block; margin: auto;" /></p> </div> <p>(If your country uses 1 billion to mean 1 million million, then you can use <a href="https://scales.r-lib.org/reference/number.html" target="_blank" rel="noopener"><code>cut_long_scale()</code></a> instead of <a href="https://scales.r-lib.org/reference/number.html" target="_blank" rel="noopener"><code>cut_short_scale()</code></a>.)</p> <p>You can use <a href="https://scales.r-lib.org/reference/number.html" target="_blank" rel="noopener"><code>cut_si()</code></a> for SI labels:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>plot1</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_log10</a></span><span class='o'>(</span> labels <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/label_number.html'>label_number</a></span><span class='o'>(</span>scale_cut <span class='o'>=</span> <span class='nf'><a 
href='https://scales.r-lib.org/reference/number.html'>cut_si</a></span><span class='o'>(</span><span class='s'>"g"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-7-1.png" title="Scatterplot with x-axis labels 1 kg, 100 kg, 10 Mg, 1 Gg." alt="Scatterplot with x-axis labels 1 kg, 100 kg, 10 Mg, 1 Gg." width="700px" style="display: block; margin: auto;" /></p> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>plot2</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_continuous</a></span><span class='o'>(</span> labels <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/label_number.html'>label_number</a></span><span class='o'>(</span>scale_cut <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/number.html'>cut_si</a></span><span class='o'>(</span><span class='s'>"Hz"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-8-1.png" title="Scatterplot with x-axis labels 0, 250 KHz, 500 KHz, 750 KHz, 1.00 MHz, 1.25 MHz" alt="Scatterplot with x-axis labels 0, 250 KHz, 500 KHz, 750 KHz, 1.00 MHz, 1.25 MHz" width="700px" style="display: block; margin: auto;" /></p> </div> <p>This replaces <a href="https://scales.r-lib.org/reference/label_number_si.html" target="_blank" rel="noopener"><code>label_number_si()</code></a>, which incorrectly used the <a href="https://en.wikipedia.org/wiki/Long_and_short_scales" target="_blank" rel="noopener">short-scale abbreviations</a> instead of the correct <a href="https://en.wikipedia.org/wiki/Metric_prefix" target="_blank" rel="noopener">SI prefixes</a>.</p> <h2 id="log-labels">Log labels <a href="#log-labels"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 
0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Another way to label log scales, thanks to <a href="https://github.com/davidchall" target="_blank" rel="noopener">David C Hall</a>, is <code>scales::label_log()</code>, which displays the labels in mathematical notation:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>plot1</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_log10</a></span><span class='o'>(</span> labels <span class='o'>=</span> <span class='nf'>scales</span><span class='nf'>::</span><span class='nf'><a href='https://scales.r-lib.org/reference/label_log.html'>label_log</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-9-1.png" title="Scatterplot with x-axis labels in mathematical notation: 10^3, 10^5, 10^7, 10^9." alt="Scatterplot with x-axis labels in mathematical notation: 10^3, 10^5, 10^7, 10^9." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>You can use the <code>base</code> argument if you need a different base for the logarithm:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>plot1</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_x_continuous</a></span><span class='o'>(</span> trans <span class='o'>=</span> <span class='nf'>scales</span><span class='nf'>::</span><span class='nf'><a href='https://scales.r-lib.org/reference/log_trans.html'>log_trans</a></span><span class='o'>(</span><span class='m'>2</span><span class='o'>)</span>, labels <span class='o'>=</span> <span class='nf'>scales</span><span class='nf'>::</span><span class='nf'><a href='https://scales.r-lib.org/reference/label_log.html'>label_log</a></span><span class='o'>(</span><span class='m'>2</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-10-1.png" title="Scatterplot with x-axis labels in mathematical notation: 2^11, 2^17, 2^23, 2^29." alt="Scatterplot with x-axis labels in mathematical notation: 2^11, 2^17, 2^23, 2^29." width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="currency">Currency <a href="#currency"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Finally, <a href="https://scales.r-lib.org/reference/label_dollar.html" target="_blank" rel="noopener"><code>label_dollar()</code></a> receives a couple of small improvements. 
The <code>prefix</code> is now placed before the negative sign, rather than after it, yielding, for example, the correct <code>-$1</code> instead of <code>$-1</code>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>df3</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span> date <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/as.Date.html'>as.Date</a></span><span class='o'>(</span><span class='s'>"2022-01-01"</span><span class='o'>)</span> <span class='o'>+</span> <span class='m'>1</span><span class='o'>:</span><span class='m'>1e3</span>, balance <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/cumsum.html'>cumsum</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>1e3</span>, <span class='o'>-</span><span class='m'>1e3</span>, <span class='m'>1e3</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> <span class='nv'>plot3</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/ggplot.html'>ggplot</a></span><span class='o'>(</span><span class='nv'>df3</span>, <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/aes.html'>aes</a></span><span class='o'>(</span><span class='nv'>date</span>, <span class='nv'>balance</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/geom_path.html'>geom_line</a></span><span class='o'>(</span><span class='o'>)</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/labs.html'>labs</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='kc'>NULL</span>, y <span class='o'>=</span> <span class='kc'>NULL</span><span
class='o'>)</span> <span class='nv'>plot3</span> </code></pre> <p><img src="figs/unnamed-chunk-11-1.png" title="Line with y-axis labels in mathematical notation: 0, -10000, -20000, -30000, -40000." alt="Line with y-axis labels in mathematical notation: 0, -10000, -20000, -30000, -40000." width="700px" style="display: block; margin: auto;" /></p> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>plot3</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_y_continuous</a></span><span class='o'>(</span> labels <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/label_dollar.html'>label_dollar</a></span><span class='o'>(</span>scale_cut <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/number.html'>cut_short_scale</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-12-1.png" title="Line with y-axis labels in mathematical notation: $0, -$10K, -$20K, -$30K, -$40K." alt="Line with y-axis labels in mathematical notation: $0, -$10K, -$20K, -$30K, -$40K." 
width="700px" style="display: block; margin: auto;" /></p> </div> <p>It also no longer uses its own <code>negative_parens</code> argument, but instead inherits the new <code>style_negative</code> argument from <a href="https://scales.r-lib.org/reference/label_number.html" target="_blank" rel="noopener"><code>label_number()</code></a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>plot3</span> <span class='o'>+</span> <span class='nf'><a href='https://ggplot2.tidyverse.org/reference/scale_continuous.html'>scale_y_continuous</a></span><span class='o'>(</span> labels <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/label_dollar.html'>label_dollar</a></span><span class='o'>(</span> scale_cut <span class='o'>=</span> <span class='nf'><a href='https://scales.r-lib.org/reference/number.html'>cut_short_scale</a></span><span class='o'>(</span><span class='o'>)</span>, style_negative <span class='o'>=</span> <span class='s'>"parens"</span> <span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-13-1.png" title="Line with y-axis labels in mathematical notation: $0, ($10K), ($20K), ($30K), ($40K)." alt="Line with y-axis labels in mathematical notation: $0, ($10K), ($20K), ($30K), ($40K)." 
width="700px" style="display: block; margin: auto;" /></p> </div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>A big thanks goes to <a href="https://github.com/davidchall" target="_blank" rel="noopener">David C Hall</a>, who contributed to the majority of new features in this version. 40 others contributed by asking questions, identifying bugs, and suggesting patches: <a href="https://github.com/aalucaci" target="_blank" rel="noopener">@aalucaci</a>, <a href="https://github.com/adamkemberling" target="_blank" rel="noopener">@adamkemberling</a>, <a href="https://github.com/akonkel-aek" target="_blank" rel="noopener">@akonkel-aek</a>, <a href="https://github.com/billdenney" target="_blank" rel="noopener">@billdenney</a>, <a href="https://github.com/brunocarlin" target="_blank" rel="noopener">@brunocarlin</a>, <a href="https://github.com/campbead" target="_blank" rel="noopener">@campbead</a>, <a href="https://github.com/cawthm" target="_blank" rel="noopener">@cawthm</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/davidhodge931" target="_blank" rel="noopener">@davidhodge931</a>, <a href="https://github.com/davidski" target="_blank" rel="noopener">@davidski</a>, <a href="https://github.com/dkahle" target="_blank" rel="noopener">@dkahle</a>, <a href="https://github.com/donboyd5" target="_blank" rel="noopener">@donboyd5</a>, <a href="https://github.com/dpseidel" target="_blank" rel="noopener">@dpseidel</a>, <a href="https://github.com/ds-jim" 
target="_blank" rel="noopener">@ds-jim</a>, <a href="https://github.com/EBukin" target="_blank" rel="noopener">@EBukin</a>, <a href="https://github.com/elong0527" target="_blank" rel="noopener">@elong0527</a>, <a href="https://github.com/eutwt" target="_blank" rel="noopener">@eutwt</a>, <a href="https://github.com/ewenme" target="_blank" rel="noopener">@ewenme</a>, <a href="https://github.com/fontikar" target="_blank" rel="noopener">@fontikar</a>, <a href="https://github.com/frederikziebell" target="_blank" rel="noopener">@frederikziebell</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/IndrajeetPatil" target="_blank" rel="noopener">@IndrajeetPatil</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/karawoo" target="_blank" rel="noopener">@karawoo</a>, <a href="https://github.com/mfherman" target="_blank" rel="noopener">@mfherman</a>, <a href="https://github.com/mikmart" target="_blank" rel="noopener">@mikmart</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/mjskay" target="_blank" rel="noopener">@mjskay</a>, <a href="https://github.com/nicolaspayette" target="_blank" rel="noopener">@nicolaspayette</a>, <a href="https://github.com/NunoSempere" target="_blank" rel="noopener">@NunoSempere</a>, <a href="https://github.com/SimonDedman" target="_blank" rel="noopener">@SimonDedman</a>, <a href="https://github.com/sjackman" target="_blank" rel="noopener">@sjackman</a>, <a href="https://github.com/stragu" target="_blank" rel="noopener">@stragu</a>, <a href="https://github.com/teunbrand" target="_blank" rel="noopener">@teunbrand</a>, <a href="https://github.com/thomasp85" target="_blank" rel="noopener">@thomasp85</a>, <a href="https://github.com/TonyLadson" target="_blank" rel="noopener">@TonyLadson</a>, <a href="https://github.com/tuoheyd" target="_blank" 
rel="noopener">@tuoheyd</a>, <a href="https://github.com/vinhtantran" target="_blank" rel="noopener">@vinhtantran</a>, <a href="https://github.com/vsocrates" target="_blank" rel="noopener">@vsocrates</a>, and <a href="https://github.com/yutannihilation" target="_blank" rel="noopener">@yutannihilation</a>.</p> Q1 2022 tidymodels digest https://www.tidyverse.org/blog/2022/04/tidymodels-2022-q1/ Fri, 01 Apr 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/04/tidymodels-2022-q1/ <p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span> <span class="c1">#&gt; ── Attaching packages ──────────────────────────── tidymodels 0.2.0 ──</span> <span class="c1">#&gt; ✓ broom 0.7.12 ✓ rsample 0.1.1 </span> <span class="c1">#&gt; ✓ dials 0.1.0 ✓ tibble 3.1.6 </span> <span class="c1">#&gt; ✓ dplyr 1.0.8 ✓ tidyr 1.2.0 </span> <span class="c1">#&gt; ✓ infer 1.0.0 ✓ tune 0.2.0 </span> <span class="c1">#&gt; ✓ modeldata 0.1.1 ✓ workflows 0.2.6 </span> <span class="c1">#&gt; ✓ parsnip 0.2.1 ✓ workflowsets 0.2.1 </span> <span class="c1">#&gt; ✓ purrr 0.3.4 ✓ yardstick 0.0.9 </span> <span class="c1">#&gt; ✓ recipes 0.2.0</span>
<span class="c1">#&gt; ── Conflicts ─────────────────────────────── tidymodels_conflicts() ──</span> <span class="c1">#&gt; x purrr::discard() masks scales::discard()</span> <span class="c1">#&gt; x dplyr::filter() masks stats::filter()</span> <span class="c1">#&gt; x dplyr::lag() masks stats::lag()</span> <span class="c1">#&gt; x recipes::step() masks stats::step()</span> <span class="c1">#&gt; • Dig deeper into tidy modeling with R at https://www.tmwr.org</span> </code></pre></div><p>Since the beginning of last year, we have been publishing <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">quarterly updates</a> here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. The purpose of these regular posts is to share useful new features and any updates you may have missed. You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a> to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused, like these from the past month or so:</p> <ul> <li> <a href="https://www.tidyverse.org/blog/2022/02/recipes-0-2-0/" target="_blank" rel="noopener">recipes</a></li> <li> <a href="https://www.tidyverse.org/blog/2022/03/usemodels-0-2-0/" target="_blank" rel="noopener">usemodels</a></li> <li> <a href="https://www.tidyverse.org/blog/2022/03/parsnip-roundup-2022/" target="_blank" rel="noopener">parsnip and its extension packages</a></li> </ul> <p>Since <a href="https://www.tidyverse.org/blog/2021/12/tidymodels-2021-q4/" target="_blank" rel="noopener">our last roundup post</a>, there have been 21 CRAN releases of tidymodels packages. 
You can install these updates from CRAN with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span> <span class="s">&#34;baguette&#34;</span><span class="p">,</span> <span class="s">&#34;broom&#34;</span><span class="p">,</span> <span class="s">&#34;brulee&#34;</span><span class="p">,</span> <span class="s">&#34;dials&#34;</span><span class="p">,</span> <span class="s">&#34;discrim&#34;</span><span class="p">,</span> <span class="s">&#34;finetune&#34;</span><span class="p">,</span> <span class="s">&#34;hardhat&#34;</span><span class="p">,</span> <span class="s">&#34;multilevelmod&#34;</span><span class="p">,</span> <span class="s">&#34;parsnip&#34;</span><span class="p">,</span> <span class="s">&#34;plsmod&#34;</span><span class="p">,</span> <span class="s">&#34;poissonreg&#34;</span><span class="p">,</span> <span class="s">&#34;recipes&#34;</span><span class="p">,</span> <span class="s">&#34;rules&#34;</span><span class="p">,</span> <span class="s">&#34;stacks&#34;</span><span class="p">,</span> <span class="s">&#34;textrecipes&#34;</span><span class="p">,</span> <span class="s">&#34;tune&#34;</span><span class="p">,</span> <span class="s">&#34;tidymodels&#34;</span><span class="p">,</span> <span class="s">&#34;usemodels&#34;</span><span class="p">,</span> <span class="s">&#34;vetiver&#34;</span><span class="p">,</span> <span class="s">&#34;workflows&#34;</span><span class="p">,</span> <span class="s">&#34;workflowsets&#34;</span> <span class="p">))</span> </code></pre></div><p>The <code>NEWS</code> files are linked here for each package; you&rsquo;ll notice that there are a lot! 
We know it may be bothersome to keep up with all these changes, so we want to draw your attention to our recent blog posts above and also highlight a few more useful updates in today&rsquo;s blog post.</p> <ul> <li> <a href="https://baguette.tidymodels.org/news/index.html#baguette-020" target="_blank" rel="noopener">baguette</a></li> <li> <a href="https://broom.tidymodels.org/news/index.html#broom-0711" target="_blank" rel="noopener">broom</a></li> <li> <a href="https://tidymodels.github.io/brulee/news/index.html#brulee-010" target="_blank" rel="noopener">brulee</a></li> <li> <a href="https://dials.tidymodels.org/news/index.html#dials-010" target="_blank" rel="noopener">dials</a></li> <li> <a href="https://finetune.tidymodels.org/news/index.html#finetune-020" target="_blank" rel="noopener">finetune</a></li> <li> <a href="https://hardhat.tidymodels.org/news/index.html#hardhat-020" target="_blank" rel="noopener">hardhat</a></li> <li> <a href="https://github.com/tidymodels/multilevelmod/blob/main/NEWS.md" target="_blank" rel="noopener">multilevelmod</a></li> <li> <a href="https://parsnip.tidymodels.org/news/index.html#parsnip-021" target="_blank" rel="noopener">parsnip</a></li> <li> <a href="https://plsmod.tidymodels.org/news/index.html#plsmod-012" target="_blank" rel="noopener">plsmod</a></li> <li> <a href="https://poissonreg.tidymodels.org/news/index.html#poissonreg-020" target="_blank" rel="noopener">poissonreg</a></li> <li> <a href="https://github.com/tidymodels/recipes/blob/HEAD/NEWS.md#recipes-020" target="_blank" rel="noopener">recipes</a></li> <li> <a href="https://rules.tidymodels.org/news/index.html#rules-020" target="_blank" rel="noopener">rules</a></li> <li> <a href="https://stacks.tidymodels.org/news/index.html#stacks-022" target="_blank" rel="noopener">stacks</a></li> <li> <a
href="https://textrecipes.tidymodels.org/news/index.html#textrecipes-050" target="_blank" rel="noopener">textrecipes</a></li> <li> <a href="https://tune.tidymodels.org/news/index.html#tune-020" target="_blank" rel="noopener">tune</a></li> <li>the <a href="https://tidymodels.tidymodels.org/news/index.html#tidymodels-020" target="_blank" rel="noopener">tidymodels</a> metapackage itself</li> <li> <a href="https://usemodels.tidymodels.org/news/index.html#usemodels-020" target="_blank" rel="noopener">usemodels</a></li> <li> <a href="https://vetiver.tidymodels.org/news/index.html#vetiver-012" target="_blank" rel="noopener">vetiver</a></li> <li> <a href="https://workflows.tidymodels.org/news/index.html#workflows-025" target="_blank" rel="noopener">workflows</a></li> <li> <a href="https://workflowsets.tidymodels.org/news/index.html#workflowsets-021" target="_blank" rel="noopener">workflowsets</a></li> </ul> <p>We&rsquo;re really excited about <a href="https://tidymodels.github.io/brulee/" target="_blank" rel="noopener">brulee</a> and <a href="https://vetiver.tidymodels.org/" target="_blank" rel="noopener">vetiver</a> but will share more in upcoming blog posts.</p> <h2 id="feature-hashing">Feature hashing <a href="#feature-hashing"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The newest <a href="https://textrecipes.tidymodels.org/" target="_blank" rel="noopener">textrecipes</a> release provides support for feature hashing, a feature engineering approach that can be helpful when working with high cardinality categorical data or text. 
A hashing function takes an input of variable size and maps it to an output of fixed size. Hashing functions are commonly used in cryptography and databases, and we can create a hash in R using <code>rlang::hash()</code>:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">textrecipes</span><span class="p">)</span> <span class="nf">data</span><span class="p">(</span><span class="n">Sacramento</span><span class="p">)</span> <span class="nf">set.seed</span><span class="p">(</span><span class="m">123</span><span class="p">)</span> <span class="n">sac_split</span> <span class="o">&lt;-</span> <span class="nf">initial_split</span><span class="p">(</span><span class="n">Sacramento</span><span class="p">,</span> <span class="n">strata</span> <span class="o">=</span> <span class="n">price</span><span class="p">)</span> <span class="n">sac_train</span> <span class="o">&lt;-</span> <span class="nf">training</span><span class="p">(</span><span class="n">sac_split</span><span class="p">)</span> <span class="n">sac_test</span> <span class="o">&lt;-</span> <span class="nf">testing</span><span class="p">(</span><span class="n">sac_split</span><span class="p">)</span> <span class="nf">tibble</span><span class="p">(</span><span class="n">sac_train</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">mutate</span><span class="p">(</span><span class="n">zip_hash</span> <span class="o">=</span> <span class="nf">map_chr</span><span class="p">(</span><span class="n">zip</span><span class="p">,</span> <span class="n">rlang</span><span class="o">::</span><span class="n">hash</span><span class="p">))</span> <span class="o">%&gt;%</span> <span class="nf">select</span><span class="p">(</span><span class="n">zip</span><span class="p">,</span> <span class="n">zip_hash</span><span class="p">)</span> <span class="c1">#&gt; # A tibble: 698 × 2</span> <span 
class="c1">#&gt; zip zip_hash </span> <span class="c1">#&gt; &lt;fct&gt; &lt;chr&gt; </span> <span class="c1">#&gt; 1 z95838 32cbb7d319c97f062be64075c2ae6c07</span> <span class="c1">#&gt; 2 z95815 55d08d816f0d2e9ec16af15239826e91</span> <span class="c1">#&gt; 3 z95824 235b72b9a37a6154552498eb3f90e9e3</span> <span class="c1">#&gt; 4 z95841 d973597ab5cc48a0dfe54b84a91249e1</span> <span class="c1">#&gt; 5 z95842 c44537f2eecd51707b19e69027228a85</span> <span class="c1">#&gt; 6 z95820 e1b86cbed49c029f9fa25bba94ede11e</span> <span class="c1">#&gt; 7 z95670 60ee71387789bb8c58748e4632089cc4</span> <span class="c1">#&gt; 8 z95838 32cbb7d319c97f062be64075c2ae6c07</span> <span class="c1">#&gt; 9 z95815 55d08d816f0d2e9ec16af15239826e91</span> <span class="c1">#&gt; 10 z95822 8e212bdf9650ef39a1634e6e18529834</span> <span class="c1">#&gt; # … with 688 more rows</span> </code></pre></div><p>The variable <code>zip</code> in this data on home sales in Sacramento, CA is of <a href="https://en.wikipedia.org/wiki/Cardinality_%28SQL_statements%29" target="_blank" rel="noopener">&ldquo;high cardinality&rdquo;</a> (as ZIP codes often are) with 67 unique values. When we <code>hash()</code> the ZIP code, we get out, well, a hash value, and we will always get the same hash value for the same input (as you can see for ZIP code 95838 here). 
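</p> <p>As a rough sketch of both ideas, the following checks that <code>rlang::hash()</code> is deterministic and then folds each hash into one of a small, fixed number of buckets. This is only an illustration and not how textrecipes implements feature hashing: the helper <code>bucket_of()</code>, the use of the first seven hex digits, and the choice of 16 buckets are all assumptions made for this example.</p>

```r
# Hashing the same input twice gives the same value.
h1 <- rlang::hash("z95838")
h2 <- rlang::hash("z95838")
identical(h1, h2)

# Hypothetical helper (not part of rlang or textrecipes): fold each hex hash
# into one of `n_buckets` buckets, using only its first 7 hex digits so the
# value fits in a regular R integer.
bucket_of <- function(x, n_buckets = 16L) {
  hex <- vapply(x, function(value) rlang::hash(value), character(1),
                USE.NAMES = FALSE)
  strtoi(substr(hex, 1, 7), base = 16L) %% n_buckets + 1L
}

zips <- c("z95838", "z95815", "z95824", "z95838")
buckets <- bucket_of(zips)
buckets
```

<p>Repeated inputs always land in the same bucket, while distinct inputs may collide; that trade-off is what buys the fixed output size.</p> <p>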
We can choose the fixed size of our hashed output to reduce the number of possible values to whatever we want; it turns out this works well in a lot of situations.</p> <p>Let&rsquo;s use a hashing algorithm like this one (with an output size of 16) to create binary indicator variables for this high cardinality <code>zip</code>:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">hash_rec</span> <span class="o">&lt;-</span> <span class="nf">recipe</span><span class="p">(</span><span class="n">price</span> <span class="o">~</span> <span class="n">zip</span> <span class="o">+</span> <span class="n">beds</span> <span class="o">+</span> <span class="n">baths</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">sac_train</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">step_dummy_hash</span><span class="p">(</span><span class="n">zip</span><span class="p">,</span> <span class="n">signed</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span> <span class="n">num_terms</span> <span class="o">=</span> <span class="m">16L</span><span class="p">)</span> <span class="nf">prep</span><span class="p">(</span><span class="n">hash_rec</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">bake</span><span class="p">(</span><span class="n">new_data</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">)</span> <span class="c1">#&gt; # A tibble: 698 × 19</span> <span class="c1">#&gt; dummyhash_zip_01 dummyhash_zip_02 dummyhash_zip_03 dummyhash_zip_04</span> <span class="c1">#&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;</span> <span class="c1">#&gt; 1 0 0 0 0</span> <span class="c1">#&gt; 2 0 1 0 0</span> <span class="c1">#&gt; 3 0 0 1 0</span> <span class="c1">#&gt; 4 1 0 0 0</span> <span class="c1">#&gt; 5 0 0 0 0</span> <span class="c1">#&gt; 6 0 0 0 0</span> <span 
class="c1">#&gt; 7 0 1 0 0</span> <span class="c1">#&gt; 8 0 0 0 0</span> <span class="c1">#&gt; 9 0 1 0 0</span> <span class="c1">#&gt; 10 0 0 0 0</span> <span class="c1">#&gt; # … with 688 more rows, and 15 more variables:</span> <span class="c1">#&gt; # dummyhash_zip_05 &lt;dbl&gt;, dummyhash_zip_06 &lt;dbl&gt;,</span> <span class="c1">#&gt; # dummyhash_zip_07 &lt;dbl&gt;, dummyhash_zip_08 &lt;dbl&gt;,</span> <span class="c1">#&gt; # dummyhash_zip_09 &lt;dbl&gt;, dummyhash_zip_10 &lt;dbl&gt;,</span> <span class="c1">#&gt; # dummyhash_zip_11 &lt;dbl&gt;, dummyhash_zip_12 &lt;dbl&gt;,</span> <span class="c1">#&gt; # dummyhash_zip_13 &lt;dbl&gt;, dummyhash_zip_14 &lt;dbl&gt;,</span> <span class="c1">#&gt; # dummyhash_zip_15 &lt;dbl&gt;, dummyhash_zip_16 &lt;dbl&gt;, beds &lt;int&gt;, …</span> </code></pre></div><p>We now have 16 columns for <code>zip</code> (along with the other predictors and the outcome), instead of the more than 60 we would have had with regular dummy variables.</p> <p>For more on feature hashing, including its benefits (fast and low memory!)
and downsides (not directly interpretable!), check out <a href="https://smltar.com/mlregression.html#case-study-feature-hashing" target="_blank" rel="noopener">Section 6.7 of <em>Supervised Machine Learning for Text Analysis with R</em></a> and/or <a href="https://www.tmwr.org/categorical.html#feature-hashing" target="_blank" rel="noopener">Section 17.4 of <em>Tidy Modeling with R</em></a>.</p> <h2 id="more-customization-for-workflow-sets">More customization for workflow sets <a href="#more-customization-for-workflow-sets"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Last year about this time, we introduced <a href="https://www.tidyverse.org/blog/2021/03/workflowsets-0-0-1/" target="_blank" rel="noopener">workflowsets</a>, a new package for creating, handling, and tuning multiple workflows at once. See <a href="https://www.tmwr.org/workflows.html#workflow-sets-intro" target="_blank" rel="noopener">Section 7.5</a> and especially <a href="https://www.tmwr.org/workflow-sets.html" target="_blank" rel="noopener">Chapter 15</a> of <em>Tidy Modeling with R</em> for more on workflow sets. In the latest release of <a href="https://workflowsets.tidymodels.org/" target="_blank" rel="noopener">workflowsets</a>, we provide finer control of customization for the workflows you create with workflowsets. 
First you can create a standard workflow set by crossing a set of models with a set of preprocessors (let&rsquo;s just use the feature hashing recipe we already created):</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">glmnet_spec</span> <span class="o">&lt;-</span> <span class="nf">linear_reg</span><span class="p">(</span><span class="n">penalty</span> <span class="o">=</span> <span class="nf">tune</span><span class="p">(),</span> <span class="n">mixture</span> <span class="o">=</span> <span class="nf">tune</span><span class="p">())</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;glmnet&#34;</span><span class="p">)</span> <span class="n">mars_spec</span> <span class="o">&lt;-</span> <span class="nf">mars</span><span class="p">(</span><span class="n">prod_degree</span> <span class="o">=</span> <span class="nf">tune</span><span class="p">())</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;earth&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">set_mode</span><span class="p">(</span><span class="s">&#34;regression&#34;</span><span class="p">)</span> <span class="n">old_set</span> <span class="o">&lt;-</span> <span class="nf">workflow_set</span><span class="p">(</span> <span class="n">preproc</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">hash</span> <span class="o">=</span> <span class="n">hash_rec</span><span class="p">),</span> <span class="n">models</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">MARS</span> <span class="o">=</span> <span class="n">mars_spec</span><span class="p">,</span> <span class="n">glmnet</span> <span class="o">=</span> <span class="n">glmnet_spec</span><span class="p">)</span> <span class="p">)</span> <span 
class="n">old_set</span> <span class="c1">#&gt; # A workflow set/tibble: 2 × 4</span> <span class="c1">#&gt; wflow_id info option result </span> <span class="c1">#&gt; &lt;chr&gt; &lt;list&gt; &lt;list&gt; &lt;list&gt; </span> <span class="c1">#&gt; 1 hash_MARS &lt;tibble [1 × 4]&gt; &lt;opts[0]&gt; &lt;list [0]&gt;</span> <span class="c1">#&gt; 2 hash_glmnet &lt;tibble [1 × 4]&gt; &lt;opts[0]&gt; &lt;list [0]&gt;</span> </code></pre></div><p>The <code>option</code> column is a placeholder for any arguments to use when we <em>evaluate</em> the workflow; the possibilities here are any argument to functions like <a href="https://tune.tidymodels.org/reference/tune_grid.html" target="_blank" rel="noopener"><code>tune_grid()</code></a> or <a href="https://tune.tidymodels.org/reference/fit_resamples.html" target="_blank" rel="noopener"><code>fit_resamples()</code></a>. But what about arguments that belong not to the workflow as a whole, but to a recipe or a parsnip model? In the new release, we added support for customizing those kinds of arguments via <code>update_workflow_model()</code> and <code>update_workflow_recipe()</code>. 
This lets you, for example, say that you want to use a <a href="https://www.tidyverse.org/blog/2020/11/tidymodels-sparse-support/" target="_blank" rel="noopener">sparse blueprint</a> for fitting:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">sparse_bp</span> <span class="o">&lt;-</span> <span class="n">hardhat</span><span class="o">::</span><span class="nf">default_recipe_blueprint</span><span class="p">(</span><span class="n">composition</span> <span class="o">=</span> <span class="s">&#34;dgCMatrix&#34;</span><span class="p">)</span> <span class="n">new_set</span> <span class="o">&lt;-</span> <span class="n">old_set</span> <span class="o">%&gt;%</span> <span class="nf">update_workflow_recipe</span><span class="p">(</span><span class="s">&#34;hash_glmnet&#34;</span><span class="p">,</span> <span class="n">hash_rec</span><span class="p">,</span> <span class="n">blueprint</span> <span class="o">=</span> <span class="n">sparse_bp</span><span class="p">)</span> </code></pre></div><p>Now we can tune this workflow set, with the sparse blueprint for the glmnet model, over a set of resampling folds.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">set.seed</span><span class="p">(</span><span class="m">123</span><span class="p">)</span> <span class="n">folds</span> <span class="o">&lt;-</span> <span class="nf">vfold_cv</span><span class="p">(</span><span class="n">sac_train</span><span class="p">,</span> <span class="n">strata</span> <span class="o">=</span> <span class="n">price</span><span class="p">)</span> <span class="n">new_set</span> <span class="o">%&gt;%</span> <span class="nf">workflow_map</span><span class="p">(</span><span class="n">resamples</span> <span class="o">=</span> <span class="n">folds</span><span class="p">,</span> <span class="n">grid</span> <span class="o">=</span> <span class="m">5</span><span class="p">,</span> <span 
class="n">verbose</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span> <span class="c1">#&gt; i 1 of 2 tuning: hash_MARS</span> <span class="c1">#&gt; ✓ 1 of 2 tuning: hash_MARS (2.2s)</span> <span class="c1">#&gt; i 2 of 2 tuning: hash_glmnet</span> <span class="c1">#&gt; ✓ 2 of 2 tuning: hash_glmnet (3.9s)</span> <span class="c1">#&gt; # A workflow set/tibble: 2 × 4</span> <span class="c1">#&gt; wflow_id info option result </span> <span class="c1">#&gt; &lt;chr&gt; &lt;list&gt; &lt;list&gt; &lt;list&gt; </span> <span class="c1">#&gt; 1 hash_MARS &lt;tibble [1 × 4]&gt; &lt;opts[2]&gt; &lt;tune[+]&gt;</span> <span class="c1">#&gt; 2 hash_glmnet &lt;tibble [1 × 4]&gt; &lt;opts[2]&gt; &lt;tune[+]&gt;</span> </code></pre></div> <h2 id="new-parameter-objects-and-parameter-handling">New parameter objects and parameter handling <a href="#new-parameter-objects-and-parameter-handling"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Even if you are a regular tidymodels user, you may not have thought much about <a href="https://dials.tidymodels.org/" target="_blank" rel="noopener">dials</a>. This is an infrastructure package that is used to create and manage model hyperparameters. In the latest release of dials, we provide a handful of new parameters for various models and feature engineering approaches. Several of them are <a href="https://parsnip.tidymodels.org/reference/bart.html" target="_blank" rel="noopener">for the new <code>parsnip::bart()</code></a> model 
(Bayesian additive regression trees):</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">prior_outcome_range</span><span class="p">()</span> <span class="c1">#&gt; Prior for Outcome Range (quantitative)</span> <span class="c1">#&gt; Range: (0, 5]</span> <span class="nf">prior_terminal_node_coef</span><span class="p">()</span> <span class="c1">#&gt; Terminal Node Prior Coefficient (quantitative)</span> <span class="c1">#&gt; Range: (0, 1]</span> <span class="nf">prior_terminal_node_expo</span><span class="p">()</span> <span class="c1">#&gt; Terminal Node Prior Exponent (quantitative)</span> <span class="c1">#&gt; Range: [0, 3]</span> </code></pre></div><p>This version of dials, along with the new hardhat release, also provides new functions for extracting single parameters and parameter sets from modeling objects.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">recipe</span><span class="p">(</span><span class="n">price</span> <span class="o">~</span> <span class="n">zip</span> <span class="o">+</span> <span class="n">beds</span> <span class="o">+</span> <span class="n">baths</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">sac_train</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">step_dummy_hash</span><span class="p">(</span><span class="n">zip</span><span class="p">,</span> <span class="n">signed</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span> <span class="n">num_terms</span> <span class="o">=</span> <span class="nf">tune</span><span class="p">())</span> <span class="o">%&gt;%</span> <span class="nf">extract_parameter_set_dials</span><span class="p">()</span> <span class="c1">#&gt; Collection of 1 parameters for tuning</span> <span class="c1">#&gt; </span> <span class="c1">#&gt; identifier type object</span> <span class="c1">#&gt; 
num_terms num_terms nparam[+]</span> </code></pre></div><p>You can also extract a single parameter by name:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">mars_spec</span> <span class="o">%&gt;%</span> <span class="nf">extract_parameter_dials</span><span class="p">(</span><span class="s">&#34;prod_degree&#34;</span><span class="p">)</span> <span class="c1">#&gt; Degree of Interaction (quantitative)</span> <span class="c1">#&gt; Range: [1, 2]</span> <span class="n">glmnet_spec</span> <span class="o">%&gt;%</span> <span class="nf">extract_parameter_dials</span><span class="p">(</span><span class="s">&#34;penalty&#34;</span><span class="p">)</span> <span class="c1">#&gt; Amount of Regularization (quantitative)</span> <span class="c1">#&gt; Transformer: log-10 </span> <span class="c1">#&gt; Range (transformed scale): [-10, 0]</span> </code></pre></div> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We’d like to extend our thanks to all of the contributors who helped make these releases during Q1 possible!</p> <ul> <li> <p>baguette: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a> and <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>.</p> </li> <li> <p>broom: <a href="https://github.com/cgoo4" target="_blank" rel="noopener">@cgoo4</a>, <a href="https://github.com/colinbrislawn" target="_blank" rel="noopener">@colinbrislawn</a>, <a href="https://github.com/DanChaltiel" target="_blank" 
rel="noopener">@DanChaltiel</a>, <a href="https://github.com/ddsjoberg" target="_blank" rel="noopener">@ddsjoberg</a>, <a href="https://github.com/fschaffner" target="_blank" rel="noopener">@fschaffner</a>, <a href="https://github.com/grantmcdermott" target="_blank" rel="noopener">@grantmcdermott</a>, <a href="https://github.com/hughjonesd" target="_blank" rel="noopener">@hughjonesd</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/Marc-Girondot" target="_blank" rel="noopener">@Marc-Girondot</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/mlaviolet" target="_blank" rel="noopener">@mlaviolet</a>, <a href="https://github.com/oliverbothe" target="_blank" rel="noopener">@oliverbothe</a>, <a href="https://github.com/PursuitOfDataScience" target="_blank" rel="noopener">@PursuitOfDataScience</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, and <a href="https://github.com/vincentarelbundock" target="_blank" rel="noopener">@vincentarelbundock</a>.</p> </li> <li> <p>brulee: <a href="https://github.com/dfalbel" target="_blank" rel="noopener">@dfalbel</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>dials: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/py9mrg" target="_blank" rel="noopener">@py9mrg</a>.</p> </li> <li> <p>discrim: <a href="https://github.com/deschen1" target="_blank" rel="noopener">@deschen1</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, 
<a href="https://github.com/jmarshallnz" target="_blank" rel="noopener">@jmarshallnz</a>, and <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>.</p> </li> <li> <p>finetune: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/Steviey" target="_blank" rel="noopener">@Steviey</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>hardhat: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/ddsjoberg" target="_blank" rel="noopener">@ddsjoberg</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/MasterLuke84" target="_blank" rel="noopener">@MasterLuke84</a>.</p> </li> <li> <p>multilevelmod: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a> and <a href="https://github.com/sitendug" target="_blank" rel="noopener">@sitendug</a>.</p> </li> <li> <p>parsnip: <a href="https://github.com/brunocarlin" target="_blank" rel="noopener">@brunocarlin</a>, <a href="https://github.com/dietrichson" target="_blank" rel="noopener">@dietrichson</a>, <a href="https://github.com/edgararuiz" target="_blank" rel="noopener">@edgararuiz</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jmarshallnz" target="_blank" rel="noopener">@jmarshallnz</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/nikhilpathiyil" target="_blank" rel="noopener">@nikhilpathiyil</a>, <a 
href="https://github.com/nvelden" target="_blank" rel="noopener">@nvelden</a>, <a href="https://github.com/t-kalinowski" target="_blank" rel="noopener">@t-kalinowski</a>, <a href="https://github.com/tiagomaie" target="_blank" rel="noopener">@tiagomaie</a>, <a href="https://github.com/tolliam" target="_blank" rel="noopener">@tolliam</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>plsmod: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a> and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>poissonreg: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a> and <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>.</p> </li> <li> <p>recipes: <a href="https://github.com/agwalker82" target="_blank" rel="noopener">@agwalker82</a>, <a href="https://github.com/AndrewKostandy" target="_blank" rel="noopener">@AndrewKostandy</a>, <a href="https://github.com/aridf" target="_blank" rel="noopener">@aridf</a>, <a href="https://github.com/brunocarlin" target="_blank" rel="noopener">@brunocarlin</a>, <a href="https://github.com/DoktorMike" target="_blank" rel="noopener">@DoktorMike</a>, <a href="https://github.com/duccioa" target="_blank" rel="noopener">@duccioa</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/FieteO" target="_blank" rel="noopener">@FieteO</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a 
href="https://github.com/mdsteiner" target="_blank" rel="noopener">@mdsteiner</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/spsanderson" target="_blank" rel="noopener">@spsanderson</a>, <a href="https://github.com/themichjam" target="_blank" rel="noopener">@themichjam</a>, <a href="https://github.com/tmastny" target="_blank" rel="noopener">@tmastny</a>, <a href="https://github.com/tomazweiss" target="_blank" rel="noopener">@tomazweiss</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/walrossker" target="_blank" rel="noopener">@walrossker</a>, and <a href="https://github.com/zenggyu" target="_blank" rel="noopener">@zenggyu</a>.</p> </li> <li> <p>rules: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/wdkeyzer" target="_blank" rel="noopener">@wdkeyzer</a>.</p> </li> <li> <p>stacks: <a href="https://github.com/amcmahon17" target="_blank" rel="noopener">@amcmahon17</a>, <a href="https://github.com/py9mrg" target="_blank" rel="noopener">@py9mrg</a>, <a href="https://github.com/Saarialho" target="_blank" rel="noopener">@Saarialho</a>, <a href="https://github.com/siegfried" target="_blank" rel="noopener">@siegfried</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/StuieT85" target="_blank" rel="noopener">@StuieT85</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/williamshell" target="_blank" rel="noopener">@williamshell</a>.</p> </li> <li> <p>textrecipes: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, 
and <a href="https://github.com/NLDataScientist" target="_blank" rel="noopener">@NLDataScientist</a>.</p> </li> <li> <p>tune: <a href="https://github.com/abichat" target="_blank" rel="noopener">@abichat</a>, <a href="https://github.com/AndrewKostandy" target="_blank" rel="noopener">@AndrewKostandy</a>, <a href="https://github.com/dax44" target="_blank" rel="noopener">@dax44</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/felxcon" target="_blank" rel="noopener">@felxcon</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/juanydlh" target="_blank" rel="noopener">@juanydlh</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/mdancho84" target="_blank" rel="noopener">@mdancho84</a>, <a href="https://github.com/py9mrg" target="_blank" rel="noopener">@py9mrg</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/walrossker" target="_blank" rel="noopener">@walrossker</a>, <a href="https://github.com/williamshell" target="_blank" rel="noopener">@williamshell</a>, and <a href="https://github.com/wtbxsjy" target="_blank" rel="noopener">@wtbxsjy</a>.</p> </li> <li> <p>tidymodels: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/exsell-jc" target="_blank" rel="noopener">@exsell-jc</a>, <a href="https://github.com/hardin47" target="_blank" rel="noopener">@hardin47</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/PursuitOfDataScience" target="_blank" rel="noopener">@PursuitOfDataScience</a>, <a href="https://github.com/RaymondBalise" target="_blank" rel="noopener">@RaymondBalise</a>, 
<a href="https://github.com/scottlyden" target="_blank" rel="noopener">@scottlyden</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>usemodels: <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a> and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> </li> <li> <p>vetiver: <a href="https://github.com/atheriel" target="_blank" rel="noopener">@atheriel</a> and <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>.</p> </li> <li> <p>workflows: <a href="https://github.com/CarstenLange" target="_blank" rel="noopener">@CarstenLange</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dpprdan" target="_blank" rel="noopener">@dpprdan</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>.</p> </li> <li> <p>workflowsets: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dvanic" target="_blank" rel="noopener">@dvanic</a>, <a href="https://github.com/gdmcdonald" target="_blank" rel="noopener">@gdmcdonald</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, and <a href="https://github.com/wdefreitas" target="_blank" rel="noopener">@wdefreitas</a>.</p> </li> </ul> readxl 1.4.0 https://www.tidyverse.org/blog/2022/03/readxl-1-4-0/ Mon, 28 Mar 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/03/readxl-1-4-0/ <p>We&rsquo;re pleased to announce the release of <a href="https://readxl.tidyverse.org" target="_blank" rel="noopener">readxl</a> 1.4.0. 
The readxl package makes it easy to get tabular data out of Excel files and into R with code, not mouse clicks. It supports both the legacy <code>.xls</code> format and the modern XML-based <code>.xlsx</code> format. readxl is designed to be easy to install (so: no external dependencies) and to cope with many of the less savory features of Excel files created by humans and 3rd party applications.</p> <p>The easiest way to install the latest version from CRAN is to install the whole tidyverse.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidyverse"</span><span class='o'>)</span></code></pre> </div> <p>Alternatively, install just readxl from CRAN:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"readxl"</span><span class='o'>)</span></code></pre> </div> <p>Regardless, you will still need to attach readxl explicitly. It is not a core tidyverse package, i.e. readxl is NOT attached via <a href="https://tidyverse.tidyverse.org" target="_blank" rel="noopener"><code>library(tidyverse)</code></a>. Instead, do this in your script:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://readxl.tidyverse.org'>readxl</a></span><span class='o'>)</span></code></pre> </div> <p>This release has practically no changes that should be noticeable by the typical user. However, internally, there have been extensive updates that set the stage for future user-facing improvements. 
Therefore, this post will be quite short and the main point is to encourage readxl users to kick the tires. We set out to upgrade the foundation to support building new features and we&rsquo;d love to hear about any unintended regressions.</p> <p>You can see a full list of changes in the <a href="https://readxl.tidyverse.org/news/index.html" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="updated-libxls">Updated libxls <a href="#updated-libxls"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>readxl now embeds libxls v1.6.2 (the previous release embedded v1.5.0). The libxls project is maintained by Evan Miller and is hosted at <a href="https://github.com/libxls/libxls">https://github.com/libxls/libxls</a>, where you can read more in its <a href="https://github.com/libxls/libxls/releases" target="_blank" rel="noopener">release notes</a>. 
These accumulated releases fix a number of edge cases, allowing readxl to read even more weird and wonderful <code>.xls</code> files.</p> <h2 id="switch-from-rcpp-to-cpp11">Switch from Rcpp to cpp11 <a href="#switch-from-rcpp-to-cpp11"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thanks to Shelby Bearrows, readxl now uses <a href="https://cpp11.r-lib.org" target="_blank" rel="noopener">cpp11</a>. Shelby is a new member of the tidyverse team and she <a href="https://www.tidyverse.org/blog/2021/09/updating-to-cpp11/" target="_blank" rel="noopener">blogged about this project</a> during her 2021 summer internship.</p> <h2 id="other-small-improvements-and-whats-next">Other small improvements and what&rsquo;s next <a href="#other-small-improvements-and-whats-next"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>&ldquo;Date or Not Date&rdquo;: readxl&rsquo;s understanding of number formats has gotten more sophisticated (thanks <a href="https://github.com/nacnudus" target="_blank" rel="noopener">@nacnudus</a> and <a href="https://github.com/reviewher" target="_blank" rel="noopener">@reviewher</a>!). Non-datetime formats that incorporate colours or currencies should no longer be confused with datetime formats. 
We anticipate this will result in more accurate guessing of cell and column types.</p> <p>What&rsquo;s coming next? I won&rsquo;t go so far as to promise that 2022 is the year of readxl 😉. But I can say that top priorities include equipping readxl with better problem reporting and column specification, making its interface feel more similar to that of <a href="https://readr.tidyverse.org" target="_blank" rel="noopener">readr</a> and <a href="https://vroom.r-lib.org" target="_blank" rel="noopener">vroom</a>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thanks to the 103 people who have contributed to readxl since we last blogged about it (upon the release of version 1.2.0 in December 2018) by reporting bugs and suggesting new features: <a href="https://github.com/abcdef123ghi" target="_blank" rel="noopener">@abcdef123ghi</a>, <a href="https://github.com/acvelozo" target="_blank" rel="noopener">@acvelozo</a>, <a href="https://github.com/ahbon123" target="_blank" rel="noopener">@ahbon123</a>, <a href="https://github.com/ajit555" target="_blank" rel="noopener">@ajit555</a>, <a href="https://github.com/artinmg" target="_blank" rel="noopener">@artinmg</a>, <a href="https://github.com/aswansyahputra" target="_blank" rel="noopener">@aswansyahputra</a>, <a href="https://github.com/averiperny" target="_blank" rel="noopener">@averiperny</a>, <a href="https://github.com/batpigandme" target="_blank" rel="noopener">@batpigandme</a>, <a href="https://github.com/ben1787" target="_blank" rel="noopener">@ben1787</a>, <a 
href="https://github.com/benmatthewsed" target="_blank" rel="noopener">@benmatthewsed</a>, <a href="https://github.com/benwatsoncpa" target="_blank" rel="noopener">@benwatsoncpa</a>, <a href="https://github.com/benzipperer" target="_blank" rel="noopener">@benzipperer</a>, <a href="https://github.com/bhive01" target="_blank" rel="noopener">@bhive01</a>, <a href="https://github.com/bjorn81" target="_blank" rel="noopener">@bjorn81</a>, <a href="https://github.com/boshek" target="_blank" rel="noopener">@boshek</a>, <a href="https://github.com/brkbrc" target="_blank" rel="noopener">@brkbrc</a>, <a href="https://github.com/Brunox13" target="_blank" rel="noopener">@Brunox13</a>, <a href="https://github.com/cderv" target="_blank" rel="noopener">@cderv</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/ddekadt" target="_blank" rel="noopener">@ddekadt</a>, <a href="https://github.com/dkgaraujo" target="_blank" rel="noopener">@dkgaraujo</a>, <a href="https://github.com/donnekgit" target="_blank" rel="noopener">@donnekgit</a>, <a href="https://github.com/druedin" target="_blank" rel="noopener">@druedin</a>, <a href="https://github.com/dxbhans" target="_blank" rel="noopener">@dxbhans</a>, <a href="https://github.com/elephann" target="_blank" rel="noopener">@elephann</a>, <a href="https://github.com/eringrand" target="_blank" rel="noopener">@eringrand</a>, <a href="https://github.com/estern95" target="_blank" rel="noopener">@estern95</a>, <a href="https://github.com/fary90" target="_blank" rel="noopener">@fary90</a>, <a href="https://github.com/fermumen" target="_blank" rel="noopener">@fermumen</a>, <a href="https://github.com/fndemarqui" target="_blank" rel="noopener">@fndemarqui</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, <a href="https://github.com/gbganalyst" target="_blank" rel="noopener">@gbganalyst</a>, <a href="https://github.com/ghost" 
target="_blank" rel="noopener">@ghost</a>, <a href="https://github.com/hadley" target="_blank" rel="noopener">@hadley</a>, <a href="https://github.com/hammao" target="_blank" rel="noopener">@hammao</a>, <a href="https://github.com/hannes101" target="_blank" rel="noopener">@hannes101</a>, <a href="https://github.com/hddao" target="_blank" rel="noopener">@hddao</a>, <a href="https://github.com/hidekoji" target="_blank" rel="noopener">@hidekoji</a>, <a href="https://github.com/HughParsonage" target="_blank" rel="noopener">@HughParsonage</a>, <a href="https://github.com/idontgetoutmuch" target="_blank" rel="noopener">@idontgetoutmuch</a>, <a href="https://github.com/j-sirgo" target="_blank" rel="noopener">@j-sirgo</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jeromyanglim" target="_blank" rel="noopener">@jeromyanglim</a>, <a href="https://github.com/jimhester" target="_blank" rel="noopener">@jimhester</a>, <a href="https://github.com/jmcurran" target="_blank" rel="noopener">@jmcurran</a>, <a href="https://github.com/josh-m-sharpe" target="_blank" rel="noopener">@josh-m-sharpe</a>, <a href="https://github.com/jwhendy" target="_blank" rel="noopener">@jwhendy</a>, <a href="https://github.com/jzadra" target="_blank" rel="noopener">@jzadra</a>, <a href="https://github.com/kfhk" target="_blank" rel="noopener">@kfhk</a>, <a href="https://github.com/kiernann" target="_blank" rel="noopener">@kiernann</a>, <a href="https://github.com/ksetdekov" target="_blank" rel="noopener">@ksetdekov</a>, <a href="https://github.com/kwebihaf-github" target="_blank" rel="noopener">@kwebihaf-github</a>, <a href="https://github.com/llrs" target="_blank" rel="noopener">@llrs</a>, <a href="https://github.com/loureynolds" target="_blank" rel="noopener">@loureynolds</a>, <a href="https://github.com/lucasmation" target="_blank" rel="noopener">@lucasmation</a>, <a href="https://github.com/lucifersFall1n1" target="_blank" 
rel="noopener">@lucifersFall1n1</a>, <a href="https://github.com/luisvalenzuelar" target="_blank" rel="noopener">@luisvalenzuelar</a>, <a href="https://github.com/matthiasgomolka" target="_blank" rel="noopener">@matthiasgomolka</a>, <a href="https://github.com/MeoWoo6" target="_blank" rel="noopener">@MeoWoo6</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/mine-cetinkaya-rundel" target="_blank" rel="noopener">@mine-cetinkaya-rundel</a>, <a href="https://github.com/misea" target="_blank" rel="noopener">@misea</a>, <a href="https://github.com/mkoohafkan" target="_blank" rel="noopener">@mkoohafkan</a>, <a href="https://github.com/moodymudskipper" target="_blank" rel="noopener">@moodymudskipper</a>, <a href="https://github.com/msgoussi" target="_blank" rel="noopener">@msgoussi</a>, <a href="https://github.com/nacnudus" target="_blank" rel="noopener">@nacnudus</a>, <a href="https://github.com/narayanana" target="_blank" rel="noopener">@narayanana</a>, <a href="https://github.com/nfultz" target="_blank" rel="noopener">@nfultz</a>, <a href="https://github.com/nickschurch" target="_blank" rel="noopener">@nickschurch</a>, <a href="https://github.com/nlneas1" target="_blank" rel="noopener">@nlneas1</a>, <a href="https://github.com/nqkhanh2209" target="_blank" rel="noopener">@nqkhanh2209</a>, <a href="https://github.com/ntsigilis" target="_blank" rel="noopener">@ntsigilis</a>, <a href="https://github.com/pitakakariki" target="_blank" rel="noopener">@pitakakariki</a>, <a href="https://github.com/pmallot" target="_blank" rel="noopener">@pmallot</a>, <a href="https://github.com/qdread" target="_blank" rel="noopener">@qdread</a>, <a href="https://github.com/queleanalytics" target="_blank" rel="noopener">@queleanalytics</a>, <a href="https://github.com/ramay" target="_blank" rel="noopener">@ramay</a>, <a href="https://github.com/ramiromagno" target="_blank" rel="noopener">@ramiromagno</a>, <a 
href="https://github.com/Rindrics" target="_blank" rel="noopener">@Rindrics</a>, <a href="https://github.com/rsbivand" target="_blank" rel="noopener">@rsbivand</a>, <a href="https://github.com/rwbaer" target="_blank" rel="noopener">@rwbaer</a>, <a href="https://github.com/saanasum" target="_blank" rel="noopener">@saanasum</a>, <a href="https://github.com/sbearrows" target="_blank" rel="noopener">@sbearrows</a>, <a href="https://github.com/Sbirch556" target="_blank" rel="noopener">@Sbirch556</a>, <a href="https://github.com/seanchrismurphy" target="_blank" rel="noopener">@seanchrismurphy</a>, <a href="https://github.com/Shicheng-Guo" target="_blank" rel="noopener">@Shicheng-Guo</a>, <a href="https://github.com/Sibojang9" target="_blank" rel="noopener">@Sibojang9</a>, <a href="https://github.com/simowaves" target="_blank" rel="noopener">@simowaves</a>, <a href="https://github.com/smsaladi" target="_blank" rel="noopener">@smsaladi</a>, <a href="https://github.com/songc-93" target="_blank" rel="noopener">@songc-93</a>, <a href="https://github.com/SteveDeitz" target="_blank" rel="noopener">@SteveDeitz</a>, <a href="https://github.com/struckma" target="_blank" rel="noopener">@struckma</a>, <a href="https://github.com/sureshvigneshbe" target="_blank" rel="noopener">@sureshvigneshbe</a>, <a href="https://github.com/tfulge" target="_blank" rel="noopener">@tfulge</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/ucb" target="_blank" rel="noopener">@ucb</a>, <a href="https://github.com/vchouraki" target="_blank" rel="noopener">@vchouraki</a>, <a href="https://github.com/wanttobenatural" target="_blank" rel="noopener">@wanttobenatural</a>, <a href="https://github.com/wgrundlingh" target="_blank" rel="noopener">@wgrundlingh</a>, <a href="https://github.com/WilDoane" target="_blank" rel="noopener">@WilDoane</a>, <a href="https://github.com/zerogetsamgow" target="_blank" rel="noopener">@zerogetsamgow</a>, <a 
href="https://github.com/zhangbs92" target="_blank" rel="noopener">@zhangbs92</a>, and <a href="https://github.com/zx8754" target="_blank" rel="noopener">@zx8754</a>.</p> Updates for parsnip packages https://www.tidyverse.org/blog/2022/03/parsnip-roundup-2022/ Thu, 24 Mar 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/03/parsnip-roundup-2022/ <p>We&rsquo;re delighted to announce the release of <a href="https://parsnip.tidymodels.org/" target="_blank" rel="noopener">parsnip</a> 0.2.1. parsnip is a unified modeling interface for tidymodels.</p> <p>This release of parsnip precipitated releases of our parsnip extension packages: baguette, discrim, plsmod, poissonreg, and rules. It also allowed us to release an additional package called multilevelmod (see the section below). We&rsquo;ve kept CRAN busy!</p> <p>You can see a full list of recent parsnip changes in the <a href="https://parsnip.tidymodels.org/news/index.html" target="_blank" rel="noopener">release notes</a>. 
You can install the entire set from CRAN with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;parsnip&#34;</span><span class="p">)</span> <span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;baguette&#34;</span><span class="p">)</span> <span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;discrim&#34;</span><span class="p">)</span> <span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;multilevelmod&#34;</span><span class="p">)</span> <span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;plsmod&#34;</span><span class="p">)</span> <span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;poissonreg&#34;</span><span class="p">)</span> <span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;rules&#34;</span><span class="p">)</span> </code></pre></div><p>Let&rsquo;s look at a summary of the changes, which are almost entirely in parsnip, before looking at multilevelmod.</p> <h2 id="major-changes-to-parsnip">Major changes to parsnip <a href="#major-changes-to-parsnip"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>There are a lot of improvements in this version of parsnip. 
The main changes are described below.</p> <h3 id="more-documentation-improvements">More documentation improvements <a href="#more-documentation-improvements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>A <a href="https://www.tidyverse.org/blog/2021/07/tidymodels-july-2021/#better-model-documentation" target="_blank" rel="noopener">previous version of parsnip</a> added a nice feature where the help page for each model showed the engines that are available. One confusing aspect of this was that the list depended on which packages were loaded. It also didn&rsquo;t tell users what engines are <em>possible</em>.</p> <p>Now, parsnip shows all of the known engines and labels those that require extension packages. 
Here&rsquo;s a screenshot of what you get with <code>?linear_reg</code>:</p> <p><img src="engines.png" title="plot of chunk engines" alt="plot of chunk engines" width="80%" style="display: block; margin: auto;" /></p> <p>This will not change within a version of parsnip; we&rsquo;ll update each list with each release.</p> <h3 id="bart">BART <a href="#bart"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>We&rsquo;ve added a model function for the excellent Bayesian Additive Regression Trees (BART) approach and an engine for the <a href="https://github.com/vdorie/dbarts" target="_blank" rel="noopener">dbarts</a> package. The model is an ensemble of trees that is assembled using Bayesian estimation methods. 
It typically has very good predictive performance and is also able to generate estimates of the predictive posterior variance and prediction intervals.</p> <p>A good overview of this model is: <em>Bayesian Additive Regression Trees: A Review and Look Forward</em> (<a href="https://par.nsf.gov/servlets/purl/10181031" target="_blank" rel="noopener">pdf</a>).</p> <h3 id="new-engines">New engines <a href="#new-engines"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Within parsnip, a <code>&quot;glm&quot;</code> engine was added for linear regression. An engine value of <code>&quot;brulee&quot;</code> was added for linear, logistic, and multinomial regression as well as for neural networks. 
The brulee package is new and fits models using torch (look for a blog post soon on this package).</p> <p>As discussed below, the multilevelmod package adds a lot more engines for linear(ish) models, such as <a href="https://parsnip.tidymodels.org/reference/details_linear_reg_gee.html" target="_blank" rel="noopener"><code>&quot;gee&quot;</code></a>, <a href="https://parsnip.tidymodels.org/reference/details_linear_reg_gls.html" target="_blank" rel="noopener"><code>&quot;gls&quot;</code></a>, <a href="https://parsnip.tidymodels.org/reference/details_linear_reg_lme.html" target="_blank" rel="noopener"><code>&quot;lme&quot;</code></a>, <a href="https://parsnip.tidymodels.org/reference/details_linear_reg_lmer.html" target="_blank" rel="noopener"><code>&quot;lmer&quot;</code></a>, and <a href="https://parsnip.tidymodels.org/reference/details_linear_reg_stan_glmer.html" target="_blank" rel="noopener"><code>&quot;stan_glmer&quot;</code></a>. There are similar engines for logistic and Poisson regression.</p> <h2 id="multilevelmod">multilevelmod <a href="#multilevelmod"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This package has been simmering for a while on GitHub. 
Its engines are useful for fitting a variety of models that go by a litany of different names: mixed effects models, random coefficient models, variance component models, hierarchical linear models, and so on.</p> <p>One aspect of these models is that they mostly work with the formula method, which specifies both the model terms and also which of these are &ldquo;random effects&rdquo;.</p> <p>As an example, let&rsquo;s look at the measurement system analysis (MSA) data in the package. In these data, 56 separate items were measured twice using a laboratory test. The lab would like to understand how noisy their data are and if different samples can be distinguished from one another. Here&rsquo;s a plot of the data:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span> <span class="nf">library</span><span class="p">(</span><span class="n">parsnip</span><span class="p">)</span> <span class="nf">library</span><span class="p">(</span><span class="n">multilevelmod</span><span class="p">)</span> <span class="nf">data</span><span class="p">(</span><span class="n">msa_data</span><span class="p">)</span> <span class="n">msa_data</span> <span class="o">%&gt;%</span> <span class="nf">ggplot</span><span class="p">()</span> <span class="o">+</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="nf">reorder</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">value</span><span class="p">),</span> <span class="n">y</span> <span class="o">=</span> <span class="n">value</span><span class="p">,</span> <span class="n">col</span> <span class="o">=</span> <span class="n">replicate</span><span class="p">,</span> <span class="n">pch</span> <span class="o">=</span> <span class="n">replicate</span><span class="p">)</span> <span 
class="o">+</span> <span class="nf">geom_point</span><span class="p">(</span><span class="n">alpha</span> <span class="o">=</span> <span class="m">1</span><span class="o">/</span><span class="m">2</span><span class="p">,</span> <span class="n">cex</span> <span class="o">=</span> <span class="m">3</span><span class="p">)</span> <span class="o">+</span> <span class="nf">labs</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="s">&#34;lab result&#34;</span><span class="p">)</span> <span class="o">+</span> <span class="nf">theme_bw</span><span class="p">()</span> <span class="o">+</span> <span class="nf">theme</span><span class="p">(</span> <span class="n">axis.text.x</span> <span class="o">=</span> <span class="nf">element_text</span><span class="p">(</span><span class="n">angle</span> <span class="o">=</span> <span class="m">90</span><span class="p">),</span> <span class="n">legend.position</span> <span class="o">=</span> <span class="s">&#34;top&#34;</span> <span class="p">)</span> </code></pre></div><p><img src="figure/data-plot-1.svg" title="plot of chunk data-plot" alt="plot of chunk data-plot" style="display: block; margin: auto;" /></p> <p>With this data set, the goal is to estimate how much of the variation in the lab test is due to the different samples (as it should be since they are different) or measurement noise. The latter term could be associated with day-to-day differences, people-to-people differences etc. It might also be irreducible noise. In any case, we&rsquo;d like to get estimates of these two sources of variation.</p> <p>A straightforward way to estimate this is to use a repeated measurements model that considers the samples to be randomly selected from a population that are independent from one another. We can add a random intercept term that is different for each sample. 
From this, the sample-to-sample variance can be computed.</p> <p>There are a lot of packages that can do this but we&rsquo;ll use the lme4 package:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">msa_model</span> <span class="o">&lt;-</span> <span class="nf">linear_reg</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">set_engine</span><span class="p">(</span><span class="s">&#34;lmer&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="c1"># The formula has (1|id) which means that each sample (=id) should</span> <span class="c1"># have a different intercept (=1)</span> <span class="nf">fit</span><span class="p">(</span><span class="n">value</span> <span class="o">~</span> <span class="p">(</span><span class="m">1</span><span class="o">|</span><span class="n">id</span><span class="p">),</span> <span class="n">data</span> <span class="o">=</span> <span class="n">msa_data</span><span class="p">)</span> <span class="n">msa_model</span> </code></pre></div><pre><code>## parsnip model object ## ## Linear mixed model fit by REML ['lmerMod'] ## Formula: value ~ (1 | id) ## Data: data ## REML criterion at convergence: 163.0314 ## Random effects: ## Groups Name Std.Dev. 
## id (Intercept) 0.6397 ## Residual 0.2618 ## Number of obs: 112, groups: id, 56 ## Fixed Effects: ## (Intercept) ## 0.8778 </code></pre><p>We can see from this output that the sample-to-sample variance is <code>0.6397^2 = 0.40921</code>, which as a percentage of the total variance is:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="m">0.6397</span> <span class="n">^</span> <span class="m">2</span> <span class="o">/</span> <span class="p">(</span><span class="m">0.6397</span> <span class="n">^</span> <span class="m">2</span> <span class="o">+</span> <span class="m">0.2618</span> <span class="n">^</span> <span class="m">2</span><span class="p">)</span> <span class="o">*</span> <span class="m">100</span> </code></pre></div><pre><code>## [1] 85.6539 </code></pre><p>Pretty good!</p> <p>There is a lot more that can be done with these models in terms of prediction and inference. If you would like to learn more about multilevelmod, take a look at the <a href="https://multilevelmod.tidymodels.org/articles/multilevelmod.html" target="_blank" rel="noopener">Get Started</a> vignette.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to thank all of the contributors to these packages since their last releases: <a href="https://github.com/asshah4" target="_blank" rel="noopener">@asshah4</a>, <a href="https://github.com/batpigandme" target="_blank" rel="noopener">@batpigandme</a>, <a href="https://github.com/bshor" target="_blank" rel="noopener">@bshor</a>, <a 
href="https://github.com/cimentadaj" target="_blank" rel="noopener">@cimentadaj</a>, <a href="https://github.com/daaronr" target="_blank" rel="noopener">@daaronr</a>, <a href="https://github.com/davestr2" target="_blank" rel="noopener">@davestr2</a>, <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/deschen1" target="_blank" rel="noopener">@deschen1</a>, <a href="https://github.com/dfalbel" target="_blank" rel="noopener">@dfalbel</a>, <a href="https://github.com/dietrichson" target="_blank" rel="noopener">@dietrichson</a>, <a href="https://github.com/edgararuiz" target="_blank" rel="noopener">@edgararuiz</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/fabrice-rossi" target="_blank" rel="noopener">@fabrice-rossi</a>, <a href="https://github.com/frequena" target="_blank" rel="noopener">@frequena</a>, <a href="https://github.com/ghost" target="_blank" rel="noopener">@ghost</a>, <a href="https://github.com/gmcmacran" target="_blank" rel="noopener">@gmcmacran</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/JB304245" target="_blank" rel="noopener">@JB304245</a>, <a href="https://github.com/Jeffrothschild" target="_blank" rel="noopener">@Jeffrothschild</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/jonthegeek" target="_blank" rel="noopener">@jonthegeek</a>, <a href="https://github.com/josefortou" target="_blank" rel="noopener">@josefortou</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/kcarnold" target="_blank" rel="noopener">@kcarnold</a>, <a href="https://github.com/maspotts" target="_blank" rel="noopener">@maspotts</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a 
href="https://github.com/meenakshi-kushwaha" target="_blank" rel="noopener">@meenakshi-kushwaha</a>, <a href="https://github.com/miepstei" target="_blank" rel="noopener">@miepstei</a>, <a href="https://github.com/mmp3" target="_blank" rel="noopener">@mmp3</a>, <a href="https://github.com/NickCH-K" target="_blank" rel="noopener">@NickCH-K</a>, <a href="https://github.com/nikhilpathiyil" target="_blank" rel="noopener">@nikhilpathiyil</a>, <a href="https://github.com/nvelden" target="_blank" rel="noopener">@nvelden</a>, <a href="https://github.com/p-lemercier" target="_blank" rel="noopener">@p-lemercier</a>, <a href="https://github.com/psads-git" target="_blank" rel="noopener">@psads-git</a>, <a href="https://github.com/RaymondBalise" target="_blank" rel="noopener">@RaymondBalise</a>, <a href="https://github.com/rmflight" target="_blank" rel="noopener">@rmflight</a>, <a href="https://github.com/saadaslam" target="_blank" rel="noopener">@saadaslam</a>, <a href="https://github.com/Shafi2016" target="_blank" rel="noopener">@Shafi2016</a>, <a href="https://github.com/shuckle16" target="_blank" rel="noopener">@shuckle16</a>, <a href="https://github.com/sitendug" target="_blank" rel="noopener">@sitendug</a>, <a href="https://github.com/ssh352" target="_blank" rel="noopener">@ssh352</a>, <a href="https://github.com/stephenhillphd" target="_blank" rel="noopener">@stephenhillphd</a>, <a href="https://github.com/stevenpawley" target="_blank" rel="noopener">@stevenpawley</a>, <a href="https://github.com/Steviey" target="_blank" rel="noopener">@Steviey</a>, <a href="https://github.com/t-kalinowski" target="_blank" rel="noopener">@t-kalinowski</a>, <a href="https://github.com/t-neumann" target="_blank" rel="noopener">@t-neumann</a>, <a href="https://github.com/tiagomaie" target="_blank" rel="noopener">@tiagomaie</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/tsengj" target="_blank" rel="noopener">@tsengj</a>, <a 
href="https://github.com/ttrodrigz" target="_blank" rel="noopener">@ttrodrigz</a>, <a href="https://github.com/wdkeyzer" target="_blank" rel="noopener">@wdkeyzer</a>, <a href="https://github.com/yitao-li" target="_blank" rel="noopener">@yitao-li</a>, <a href="https://github.com/zenggyu" target="_blank" rel="noopener">@zenggyu</a></p> usemodels 0.2.0 https://www.tidyverse.org/blog/2022/03/usemodels-0-2-0/ Wed, 23 Mar 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/03/usemodels-0-2-0/ <p>We&rsquo;re chuffed to announce the release of <a href="https://usemodels.tidymodels.org/" target="_blank" rel="noopener">usemodels</a> 0.2.0. The usemodels package enables users to generate tidymodels code for fitting and tuning models. Given a) a formula and b) a data set, the <code>use_*()</code> functions (such as <code>use_glmnet()</code> and <code>use_xgboost()</code>) create code to fit that specific model to that data, including appropriate preprocessing.</p> <p>You can install it from CRAN with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;usemodels&#34;</span><span class="p">)</span> </code></pre></div><p>This blog post describes some new features. 
You can see a full list of changes in the <a href="https://usemodels.tidymodels.org/news/index.html" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">usemodels</span><span class="p">)</span> </code></pre></div> <h2 id="clipboard-access">Clipboard access <a href="#clipboard-access"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Each of the <code>use_*()</code> functions now has a <code>clipboard</code> feature that will send the new code to the clipboard, instead of writing to the console window.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">use_cubist</span><span class="p">(</span><span class="n">mpg</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">mtcars</span><span class="p">,</span> <span class="n">clipboard</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span> </code></pre></div><pre><code>## ✓ code is on the clipboard. 
</code></pre> <h2 id="new-models">New models <a href="#new-models"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>As requested in GitHub issues, support for <a href="https://www.rulequest.com/see5-unix.html" target="_blank" rel="noopener">C5.0</a> and <a href="https://en.wikipedia.org/wiki/Support-vector_machine" target="_blank" rel="noopener">SVM</a> models was added. SVM models require centering and scaling of the predictors, so the usemodel function provides this automatically:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">data</span><span class="p">(</span><span class="n">two_class_dat</span><span class="p">,</span> <span class="n">package</span> <span class="o">=</span> <span class="s">&#34;modeldata&#34;</span><span class="p">)</span> <span class="nf">use_kernlab_svm_rbf</span><span class="p">(</span><span class="n">Class</span> <span class="o">~</span> <span class="n">.,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">two_class_dat</span><span class="p">)</span> </code></pre></div><pre><code>## kernlab_recipe &lt;- ## recipe(formula = Class ~ ., data = two_class_dat) %&gt;% ## step_zv(all_predictors()) %&gt;% ## step_normalize(all_numeric_predictors()) ## ## kernlab_spec &lt;- ## svm_rbf(cost = tune(), rbf_sigma = tune()) %&gt;% ## set_mode(&quot;classification&quot;) ## ## kernlab_workflow &lt;- ## workflow() %&gt;% ## add_recipe(kernlab_recipe) %&gt;% ## add_model(kernlab_spec) ## ## set.seed(81161) ## kernlab_tune &lt;- ## tune_grid(kernlab_workflow, resamples = stop(&quot;add your rsample 
object&quot;), grid = stop(&quot;add number of candidate points&quot;)) </code></pre><p>Let us know if there are other features that would be interesting for the package on its GitHub <a href="https://github.com/tidymodels/usemodels/issues" target="_blank" rel="noopener">issues page</a>.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thanks to all the people who contributed to usemodels since <a href="https://www.tidyverse.org/blog/2020/09/usemodels-0-0-1/" target="_blank" rel="noopener">our last blog post</a>: <a href="https://github.com/amazongodman" target="_blank" rel="noopener">@amazongodman</a>, <a href="https://github.com/brshallo" target="_blank" rel="noopener">@brshallo</a>, <a href="https://github.com/bryceroney" target="_blank" rel="noopener">@bryceroney</a>, <a href="https://github.com/czeildi" target="_blank" rel="noopener">@czeildi</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jennybc" target="_blank" rel="noopener">@jennybc</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/larry77" target="_blank" rel="noopener">@larry77</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>.</p> ragg, svglite, and the new graphics features https://www.tidyverse.org/blog/2022/02/new-graphic-features/ Fri, 25 Feb 2022 00:00:00 +0000 
https://www.tidyverse.org/blog/2022/02/new-graphic-features/ <p>The release of <a href="https://ragg.r-lib.org" target="_blank" rel="noopener">ragg 1.2</a> and <a href="https://svglite.r-lib.org" target="_blank" rel="noopener">svglite 2.1</a> brought support for some exciting new graphics engine features, including gradients and patterns, which were <a href="https://developer.r-project.org/Blog/public/2020/07/15/new-features-in-the-r-graphics-engine/" target="_blank" rel="noopener">added in R 4.1</a> by Paul Murrell from R Core. 
This post will dive into these new features, as well as discuss what the future might hold for the R graphics engine.</p> <p>If you want to follow along on your own computer, you can install the latest versions of ragg and svglite from CRAN</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"ragg"</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"svglite"</span><span class='o'>)</span></code></pre> </div> <p>This post will rarely make any specific call-outs to ragg or svglite, as these are simply the packages that facilitate what is now possible with R graphics.</p> <h2 id="what-is-the-graphics-engine">What is the graphics engine? <a href="#what-is-the-graphics-engine"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>You might wonder what is meant by the <em>R graphics engine</em>. It&rsquo;s pretty deep in the R graphics stack, so as a user, you are unlikely to ever engage with it directly. But, since I somehow caught your attention, we might as well indulge in the finer points of the graphics implementation.</p> <p>While you may mainly be familiar with ggplot2 and perhaps a variety of graphics devices (e.g.  
<a href="https://rdrr.io/r/grDevices/pdf.html" target="_blank" rel="noopener"><code>pdf()</code></a> or <a href="https://rdrr.io/r/grDevices/png.html" target="_blank" rel="noopener"><code>png()</code></a>), they sit at opposite ends of a fairly elaborate graphics pipeline. ggplot2 is a high(er) level plotting package that allows you to express your data-visualization intent through a structured grammar. Graphics devices such as ragg and svglite are low-level packages that translate simple graphics instructions into a given file format. In between these two poles we have two additional abstractions that help translate between the extremes. In very broad terms, the R graphics stack looks like this:</p> <p><img src="pipeline.png" alt="An overview of the different steps in the R graphics pipeline. A graphics package is built on top of a graphic system. All graphic systems call into the same shared graphics engine, which then relays graphic instructions to the active graphic device." title="graphics pipeline"></p> <h3 id="graphic-systems">Graphic systems <a href="#graphic-systems"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>R currently sports two different systems, one colloquially known as <em>base</em> graphics (implemented in the graphics package), and one called <em>grid</em> graphics (implemented in the grid package). If you call <a href="https://rdrr.io/r/graphics/plot.default.html" target="_blank" rel="noopener"><code>plot()</code></a> you are most likely to end up using the base graphics system, while grid is used as the basis for e.g. ggplot2. 
The two systems are largely silos, though effort has been made to allow users to embed base graphics into grid graphics. In RStudio we are mainly invested in the grid graphics system since it powers ggplot2, but by and large, this is all an implementation detail that the user shouldn&rsquo;t care too much about. Other graphic systems may appear in the future, and other ways of drawing things on screen or to files also exist outside of the R graphics pipeline (e.g. rgl, which allows you to create 3D graphics using an OpenGL/WebGL interface).</p> <h3 id="the-graphics-engine">The graphics engine <a href="#the-graphics-engine"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>What unites base and grid graphics and sets them apart from e.g. rgl is that they both call into the same low-level C API provided by R, the <strong>graphics engine</strong>. The graphics engine is responsible for communicating with the graphics devices while also providing selected utility functionality common to both base and grid graphics. It is because of this abstraction that creating graphics in R is largely decoupled from how they are output, be it on screen, in a file, or directly to a printer.</p> <p>While it sounds nice and neat when it is all laid out like this, the current structure and division of responsibility have grown organically over many years, and the boundaries between the graphic systems, the graphics engine, and the graphic devices are blurry. 
Still, the design is much more mature than what we see in other languages, and as graphics/data-viz developers in R we are pretty spoiled compared to our peers in other languages &mdash; perhaps without really knowing it.</p> <h2 id="a-fragmented-future">A fragmented future <a href="#a-fragmented-future"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>With the division of responsibility described above, there are many points in the pipeline that may impose limitations in functionality. The graphic system might not provide the higher-level API to use functionality in the graphics engine, or a graphic device might not provide support for instructions coming in from the graphics engine. While this has mainly been a hypothetical situation prior to R 4.1, it is now the new reality. The new features in the graphics engine were implemented along with high-level support in grid, and low-level support in the pdf device along with the cairo-based devices. This leaves base plot in the dark, and also excludes a range of built-in devices, including the default devices on Windows and macOS. At this point, where high-level support from e.g. ggplot2 is still not present, it might not be a big problem, as you will probably use these features quite deliberately and know their limitations in support. In the future, however, this could lead to surprises.</p> <p>As users, this fragmentation is most apparent in the choice of graphic device. After all, you don&rsquo;t expect the graphic system to be capable of something outside of its API, simply because new features were announced for the graphics device. 
However, if a graphic device lacks support it will simply not use the new features, and you may end up surprised at what it renders.</p> <p>When it comes to graphic systems, you can expect that grid will be the first (perhaps only) system that ends up supporting new features in the graphics engine. Part of the reason for that is that the grid API is more powerful in general, and, as new and more complex graphic powers are exposed, it can be easiest to make them fit into the most expressive API. This is definitely the case for the latest batch of new features, but I also expect it to be the case going forward. Just because functionality is exposed in grid doesn&rsquo;t mean that it can easily be handled in e.g. ggplot2. I&rsquo;ll address what the new features may mean for the future of ggplot2 at the end of the post.</p> <p>For graphic <em>devices</em> the waters are muddier. Not all devices are under active development, and such devices are unlikely to add support for new features. Further, it may be that a graphics device writes to a format, or uses a library, that does not support a new feature provided by the graphics engine. The bottom line is that we can expect increased fragmentation of the graphics devices in R in terms of which will be up to spec with the latest graphics engine features. It appears that the cairo-based devices along with the pdf device from grDevices can be expected to stay current.
On our end, we will do our best to make sure that our graphic-device packages (currently ragg and svglite) stay on top of any new additions to the graphics engine.</p> <h2 id="the-new-features">The new features <a href="#the-new-features"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>OK, so we&rsquo;ve talked a lot about some new features without ever going into detail about what they are. If you&rsquo;ve never felt constrained by the capabilities of the graphics in R, you may be forgiven for thinking that this is all a big fuss over nothing. You may be right, but new capabilities will often allow the ecosystem to evolve in new and unexpected ways to the benefit of all.</p> <h3 id="gradients">Gradients <a href="#gradients"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>While gradients have been a part of R graphics for a while, they have always relied on some hack, most often cutting the line or polygon into smaller bits and coloring these with colors sampled from a gradient. However, now gradients are supported at the device level, meaning that the pixel color is calculated based on a gradient function. This means that the gradient is pixel-perfect at any resolution, and if you are writing to a vector format (e.g.
svg), you can reduce the file size by not having to write the coordinates for a chopped-up polygon to support the gradient. For now, the functionality is limited to fills. So, if you want to draw a gradient line, you still have to cut it up into small segments.</p> <p>Gradients can be created with the <a href="https://rdrr.io/r/grid/patterns.html" target="_blank" rel="noopener"><code>linearGradient()</code></a> and <a href="https://rdrr.io/r/grid/patterns.html" target="_blank" rel="noopener"><code>radialGradient()</code></a> which can be assigned to the fill of a grob:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'>grid</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>grid.circle</a></span><span class='o'>(</span> gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span> fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>linearGradient</a></span><span class='o'>(</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"firebrick"</span>, <span class='s'>"steelblue"</span>, <span class='s'>"forestgreen"</span><span class='o'>)</span>, stops <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0</span>, <span class='m'>0.7</span>, <span class='m'>1</span><span class='o'>)</span> <span class='o'>)</span>, col <span class='o'>=</span> <span class='kc'>NA</span> <span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-3-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' 
data-lang='r'><span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>grid.circle</a></span><span class='o'>(</span> gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span> fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>radialGradient</a></span><span class='o'>(</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"white"</span>, <span class='s'>"steelblue"</span><span class='o'>)</span>, cx1 <span class='o'>=</span> <span class='m'>0.8</span>, cy1 <span class='o'>=</span> <span class='m'>0.8</span> <span class='o'>)</span>, col <span class='o'>=</span> <span class='kc'>NA</span> <span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-4-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>At the basic level, both constructors take a vector of colors. Optionally, you can provide a vector of stops that define where along the span of the gradient each color is placed. Each gradient type also lets you specify where in the graphic the gradient starts and ends. For a linear gradient, you provide the x and y position of the start and end of the gradient. For a radial gradient, you provide the center and radius of the start and end circle.
Lastly, you can also tell it how it should behave outside of the given range using the <code>extend</code> argument:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>grid.circle</a></span><span class='o'>(</span> gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span> fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>linearGradient</a></span><span class='o'>(</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"firebrick"</span>, <span class='s'>"steelblue"</span><span class='o'>)</span>, x1 <span class='o'>=</span> <span class='m'>0</span>, y1 <span class='o'>=</span> <span class='m'>0</span>, x2 <span class='o'>=</span> <span class='m'>0.5</span>, y2 <span class='o'>=</span> <span class='m'>0</span>, extend <span class='o'>=</span> <span class='s'>"repeat"</span> <span class='o'>)</span>, col <span class='o'>=</span> <span class='kc'>NA</span> <span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-5-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>grid.circle</a></span><span class='o'>(</span> gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span> fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>linearGradient</a></span><span class='o'>(</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"firebrick"</span>, <span class='s'>"steelblue"</span><span class='o'>)</span>, x1 <span class='o'>=</span> 
<span class='m'>0</span>, y1 <span class='o'>=</span> <span class='m'>0</span>, x2 <span class='o'>=</span> <span class='m'>0.5</span>, y2 <span class='o'>=</span> <span class='m'>0</span>, extend <span class='o'>=</span> <span class='s'>"pad"</span> <span class='o'>)</span>, col <span class='o'>=</span> <span class='kc'>NA</span> <span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-6-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>One thing to note is that the extent of the gradient is given relative to the bounding box of the grob being drawn. We could move the circle above around and the gradient would follow along with it.</p> <h3 id="patterns">Patterns <a href="#patterns"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Like gradients, patterns are a new type of fill made possible in grid through the new features in the graphic engine. Patterns are crazy powerful in that they can be <em>any</em> grob you can imagine. 
The grob itself can consist of other grobs and these grobs could have patterned fill as well (or gradient fills for that matter).</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>gradient_rec</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.rect.html'>rectGrob</a></span><span class='o'>(</span> width <span class='o'>=</span> <span class='m'>0.1</span>, height <span class='o'>=</span> <span class='m'>0.1</span>, gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>linearGradient</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"firebrick"</span>, <span class='s'>"steelblue"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.draw.html'>grid.draw</a></span><span class='o'>(</span><span class='nv'>gradient_rec</span><span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-7-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>grid.circle</a></span><span class='o'>(</span> gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>pattern</a></span><span class='o'>(</span><span class='nv'>gradient_rec</span>, width <span class='o'>=</span> <span class='m'>0.15</span>, height <span class='o'>=</span> <span class='m'>0.15</span>, extend <span class='o'>=</span> <span 
class='s'>"reflect"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-8-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Understanding the sizing of the grob used for the pattern can take some getting used to. Basically, the pattern is drawn relative to the grob. The <code>width</code> and <code>height</code> arguments in the <a href="https://rdrr.io/r/grid/patterns.html" target="_blank" rel="noopener"><code>pattern()</code></a> call are then used to define a region of the grob that will be used as a pattern. Thus, you cannot scale the pattern grob using the <code>width</code> and <code>height</code> arguments in <a href="https://rdrr.io/r/grid/patterns.html" target="_blank" rel="noopener"><code>pattern()</code></a>. This can be seen below:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>grid.circle</a></span><span class='o'>(</span> gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>pattern</a></span><span class='o'>(</span><span class='nv'>gradient_rec</span>, width <span class='o'>=</span> <span class='m'>0.35</span>, height <span class='o'>=</span> <span class='m'>0.35</span>, extend <span class='o'>=</span> <span class='s'>"reflect"</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-9-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>As we see, we are just defining a larger (empty) region from our rect grob, effectively adding more space between each rectangle, rather than creating larger rectangles.
This also means that the pattern scales with the grob:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>pat</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>pattern</a></span><span class='o'>(</span> <span class='nv'>gradient_rec</span>, width <span class='o'>=</span> <span class='m'>0.15</span>, height <span class='o'>=</span> <span class='m'>0.15</span>, extend <span class='o'>=</span> <span class='s'>"reflect"</span> <span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>grid.circle</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='m'>0.25</span>, r <span class='o'>=</span> <span class='m'>0.25</span>, gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nv'>pat</span><span class='o'>)</span> <span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>grid.circle</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='m'>0.75</span>, r <span class='o'>=</span> <span class='m'>0.5</span>, gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nv'>pat</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-10-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>In order to ensure the same scale of pattern is used across separate grobs, be sure to use absolute units when defining the pattern grob as well as the region:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>gradient_rec</span> <span class='o'>&lt;-</span> <span class='nf'><a 
href='https://rdrr.io/r/grid/grid.rect.html'>rectGrob</a></span><span class='o'>(</span> width <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='s'>"cm"</span><span class='o'>)</span>, height <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='s'>"cm"</span><span class='o'>)</span>, gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>linearGradient</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"firebrick"</span>, <span class='s'>"steelblue"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>)</span> <span class='nv'>pat</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/patterns.html'>pattern</a></span><span class='o'>(</span> <span class='nv'>gradient_rec</span>, width <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>1.5</span>, <span class='s'>"cm"</span><span class='o'>)</span>, height <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>1.5</span>, <span class='s'>"cm"</span><span class='o'>)</span>, extend <span class='o'>=</span> <span class='s'>"reflect"</span> <span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>grid.circle</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='m'>0.25</span>, r <span class='o'>=</span> <span class='m'>0.25</span>, gp <span 
class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nv'>pat</span><span class='o'>)</span> <span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>grid.circle</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='m'>0.75</span>, r <span class='o'>=</span> <span class='m'>0.5</span>, gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nv'>pat</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-11-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>As you can see, patterns can take some getting used to, but this is mainly because the API covers such a large span of functionality in terms of sizing, etc.</p> <h3 id="arbitrary-clipping-paths">Arbitrary clipping paths <a href="#arbitrary-clipping-paths"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Clipping is an integral part of graphics. You set up a region in your canvas and only this region will get drawn to. Up until now, the graphics engine has supported clipping, but only of rectangular, axis-aligned regions. Now, however, any grob can be used to define a clipping region. 
This is done at the viewport level, where the <code>clip</code> argument now can take a grob in addition to the standard <code>&quot;on&quot;</code>/<code>&quot;off&quot;</code> values.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>clip_path</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.text.html'>textGrob</a></span><span class='o'>(</span><span class='s'>"Clipping"</span>, gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fontface <span class='o'>=</span> <span class='s'>"bold"</span>, fontsize <span class='o'>=</span> <span class='m'>100</span><span class='o'>)</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.points.html'>grid.points</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>5000</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>5000</span><span class='o'>)</span>, default.units <span class='o'>=</span> <span class='s'>'npc'</span>, vp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/viewport.html'>viewport</a></span><span class='o'>(</span>clip <span class='o'>=</span> <span class='nv'>clip_path</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-12-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>Clipping is not only possible with single grobs. 
By combining grobs in a gList, you can make the clipping region arbitrarily complex:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>circle</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>circleGrob</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.2</span>, <span class='m'>0.8</span><span class='o'>)</span>, r <span class='o'>=</span> <span class='m'>0.3</span><span class='o'>)</span> <span class='nv'>rect</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.rect.html'>rectGrob</a></span><span class='o'>(</span> width <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>0.7</span>, <span class='s'>'snpc'</span><span class='o'>)</span>, height <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>0.7</span>, <span class='s'>'snpc'</span><span class='o'>)</span>, vp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/viewport.html'>viewport</a></span><span class='o'>(</span>angle <span class='o'>=</span> <span class='m'>45</span><span class='o'>)</span> <span class='o'>)</span> <span class='nv'>clip_path</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.grob.html'>gTree</a></span><span class='o'>(</span>children <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.grob.html'>gList</a></span><span class='o'>(</span><span class='nv'>circle</span>, <span class='nv'>rect</span><span class='o'>)</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.points.html'>grid.points</a></span><span class='o'>(</span> x <span
class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>5000</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/Uniform.html'>runif</a></span><span class='o'>(</span><span class='m'>5000</span><span class='o'>)</span>, default.units <span class='o'>=</span> <span class='s'>'npc'</span>, vp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/viewport.html'>viewport</a></span><span class='o'>(</span>clip <span class='o'>=</span> <span class='nv'>clip_path</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-13-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>The examples above seem quite contrived and decoupled from data visualization, but there are of course real-world uses, e.g. clipping a 2D density estimate to the shape of a country or clipping data points inside a circular canvas for polar plots.</p> <p>The user interface for clipping paths is easy enough to understand, but it should be noted that there may be slight differences between devices as to which grob types can be used. Most notably, the use of text grobs for defining clipping paths is not something that will work for every device (but does work in ragg).</p> <h3 id="alpha-masks">Alpha masks <a href="#alpha-masks"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The last feature added to the graphics engine in this round is the ability of viewports to have an alpha mask assigned.
When a mask is present, the grobs being drawn will apply the opacity of the mask. Note that this is different than a luminosity mask, which uses the lightness of the mask as the alpha value. A mask can be any grob you want, or a collection of multiples:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>circle</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.circle.html'>circleGrob</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.2</span>, <span class='m'>0.8</span><span class='o'>)</span>, r <span class='o'>=</span> <span class='m'>0.3</span> <span class='o'>)</span> <span class='nv'>rect</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.rect.html'>rectGrob</a></span><span class='o'>(</span> width <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>0.7</span>, <span class='s'>'snpc'</span><span class='o'>)</span>, height <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/unit.html'>unit</a></span><span class='o'>(</span><span class='m'>0.7</span>, <span class='s'>'snpc'</span><span class='o'>)</span>, vp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/viewport.html'>viewport</a></span><span class='o'>(</span>angle <span class='o'>=</span> <span class='m'>45</span><span class='o'>)</span> <span class='o'>)</span> <span class='nv'>mask</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.grob.html'>gTree</a></span><span class='o'>(</span>children <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.grob.html'>gList</a></span><span class='o'>(</span><span class='nv'>circle</span>, <span class='nv'>rect</span><span 
class='o'>)</span>, gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='s'>"#00000066"</span>, col <span class='o'>=</span> <span class='kc'>NA</span><span class='o'>)</span><span class='o'>)</span> <span class='nf'><a href='https://rdrr.io/r/grid/grid.rect.html'>grid.rect</a></span><span class='o'>(</span> x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.25</span>, <span class='m'>0.25</span>, <span class='m'>0.75</span>, <span class='m'>0.75</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>0.25</span>, <span class='m'>0.75</span>, <span class='m'>0.75</span>, <span class='m'>0.25</span><span class='o'>)</span>, width <span class='o'>=</span> <span class='m'>0.5</span>, height <span class='o'>=</span> <span class='m'>0.5</span>, gp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/gpar.html'>gpar</a></span><span class='o'>(</span>fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"steelblue"</span>, <span class='s'>"firebrick"</span><span class='o'>)</span><span class='o'>)</span>, vp <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/grid/viewport.html'>viewport</a></span><span class='o'>(</span>mask <span class='o'>=</span> <span class='nv'>mask</span><span class='o'>)</span> <span class='o'>)</span> </code></pre> <p><img src="figs/unnamed-chunk-14-1.png" width="700px" style="display: block; margin: auto;" /></p> </div> <p>As we see above, the areas in the mask where nothing is drawn have an opacity of 0, meaning that whatever is being drawn by the rectangle grob in these areas will be invisible. 
We also see that opacity is compounded by overlaying shapes, as the areas covered both by the circle and the square in the mask have a higher opacity.</p> <h2 id="future-features">Future features <a href="#future-features"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>It is, of course, dangerous to promise what the future brings. However, I do know of a few features adjacent to what has been discussed above that might make sense to know about.</p> <p>When it comes to clipping paths, there is currently no way to describe how multiple shapes are combined, since the fill rule is implicitly &ldquo;winding&rdquo; and you have no control over the direction in which the graphic device traces circles and rectangles. Work is already being done to let you control this from grid, so it will become easier to e.g. punch out holes in a clipping path by overlaying two grobs.</p> <p>As noted in the discussion about masks, only alpha masks are currently possible. However, producing an exact transparency through compounded shapes can be tough because of the way opacity combines.
In the future there will also be support for luminosity masks, and this should greatly improve the user experience of this feature in my opinion.</p> <p>Still, the main takeaway from all of the above is that the graphics engine is once again a living, breathing code-base with big user-facing features on the horizon.</p> <h2 id="the-ggplot2-implications">The ggplot2 implications <a href="#the-ggplot2-implications"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>While ggplot2 uses grid underneath its grammar API, these features are generally not directly available in ggplot2. This is because most of these features are not directly applicable to the current API. Both gradients and patterns are obvious candidates for extensions of the ggplot2 API. But, for now, the grid API doesn&rsquo;t support a vector of patterns/gradients. Once this limitation is removed (it is in the works), we will need to figure out how scaling of these more flexible fill types should work. The starting point is, of course, to allow mapping from one data-value to a predefined pattern/gradient, but it would be interesting to think about how to map data-values to features of the pattern/gradient, e.g. have the gradient defined by two or more columns that all map to different colors.
Some of this work and exploration is already happening in <a href="https://coolbutuseless.github.io/package/ggpattern" target="_blank" rel="noopener">ggpattern</a>, which could form the basis of future ggplot2 support.</p> <p>As for path clipping, we could imagine that geoms could take a clipping grob, but it is not obvious how this grob should be constructed in a manner consistent with the grammar. The same goes for masks. Maybe most of this work should be relegated to <a href="https://ggfx.data-imaginist.com" target="_blank" rel="noopener">ggfx</a> which has an extended API that seems better suited to masks and arbitrary clipping.</p> <h2 id="acknowledgement">Acknowledgement <a href="#acknowledgement"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>I&rsquo;d like to extend a huge thanks to Paul Murrell for continuing to support and improve the graphics API in R, and for his willingness to answer questions during the implementation of the new features in ragg and svglite. 
The new graphics engine features were joint work with Paul Murrell, partly sponsored by RStudio.</p> recipes 0.2.0 https://www.tidyverse.org/blog/2022/02/recipes-0-2-0/ Tue, 22 Feb 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/02/recipes-0-2-0/ <p>We&rsquo;re very excited to announce the release of <a href="https://recipes.tidymodels.org/" target="_blank" rel="noopener">recipes</a> 0.2.0. recipes is a package for preprocessing data before using it in models or visualizations. You can think of it as a mash-up of <code>model.matrix()</code> and dplyr.</p> <p>You can install it from CRAN with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="s">&#34;recipes&#34;</span><span class="p">)</span> </code></pre></div><p>This blog post will describe the highlights of what&rsquo;s new. 
You can see a full list of changes in the <a href="https://github.com/tidymodels/recipes/blob/main/NEWS.md" target="_blank" rel="noopener">release notes</a>.</p> <h2 id="new-steps">New Steps <a href="#new-steps"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p><code>step_nnmf_sparse()</code> was added to produce features using non-negative matrix factorization (via the <a href="https://github.com/zdebruine/RcppML" target="_blank" rel="noopener">RcppML</a> package). This will supersede the existing <code>step_nnmf()</code> since that step was difficult to support and use. The new step allows for a sparse representation via regularization and, from our initial testing, is <strong>much faster</strong> than the original NNMF step.</p> <p>The new step <code>step_dummy_extract()</code> helps create indicator variables from text data, especially those with multiple choice values. 
For example, if a row of a variable had a value of <code>&quot;red,black,brown&quot;</code>, the step can separate these values and make all of the required binary dummy variables.</p> <p>Here&rsquo;s a real example from <a href="https://www.kaggle.com/c/sliced-s01e08-KJSEks" target="_blank" rel="noopener">Episode 8 of <em>Sliced</em></a> where a column of data from Spotify had the artist(s) of a song:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">recipes</span><span class="p">)</span> <span class="n">spotify</span> <span class="o">&lt;-</span> <span class="n">tibble</span><span class="o">::</span><span class="nf">tribble</span><span class="p">(</span> <span class="o">~</span> <span class="n">artists</span><span class="p">,</span> <span class="s">&#34;[&#39;Genesis&#39;]&#34;</span><span class="p">,</span> <span class="s">&#34;[&#39;Billie Holiday&#39;, &#39;Teddy Wilson&#39;]&#34;</span><span class="p">,</span> <span class="s">&#34;[&#39;Jimmy Barnes&#39;, &#39;INXS&#39;]&#34;</span> <span class="p">)</span> <span class="nf">recipe</span><span class="p">(</span><span class="o">~</span> <span class="n">artists</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">spotify</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">step_dummy_extract</span><span class="p">(</span><span class="n">artists</span><span class="p">,</span> <span class="n">pattern</span> <span class="o">=</span> <span class="s">&#34;(?&lt;=&#39;)[^&#39;,]+(?=&#39;)&#34;</span><span class="p">)</span> <span class="o">%&gt;%</span> <span class="nf">prep</span><span class="p">()</span> <span class="o">%&gt;%</span> <span class="nf">bake</span><span class="p">(</span><span class="n">new_data</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">)</span> <span class="o">%&gt;%</span> <span 
class="nf">glimpse</span><span class="p">()</span> </code></pre></div><pre><code>## Rows: 3 ## Columns: 6 ## $ artists_Billie.Holiday &lt;dbl&gt; 0, 1, 0 ## $ artists_Genesis &lt;dbl&gt; 1, 0, 0 ## $ artists_INXS &lt;dbl&gt; 0, 0, 1 ## $ artists_Jimmy.Barnes &lt;dbl&gt; 0, 0, 1 ## $ artists_Teddy.Wilson &lt;dbl&gt; 0, 1, 0 ## $ artists_other &lt;dbl&gt; 0, 0, 0 </code></pre><p>Note that this step produces an &ldquo;other&rdquo; column and has arguments similar to <code>step_other()</code> and <code>step_dummy_multi_choice()</code>.</p> <p><code>step_percentile()</code> is a new step function after it had previously only been an example in the developer documentation. It can determine the empirical distribution of a variable using the training set, then convert any value to the percentile of this distribution.</p> <p>Finally, a new filtering function (<code>step_filter_missing()</code>) can filter out columns that have too many missing values (for some definition of &ldquo;too many&rdquo;).</p> <h2 id="other-notable-new-features">Other notable new features <a href="#other-notable-new-features"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p><code>step_zv()</code> now has a <code>group</code> argument. This can be helpful for models such as naive Bayes or quadratic discriminant analysis where the predictors must have at least two unique values <em>within each class</em>.</p> <p>All recipe steps now officially support empty selections to be more aligned with dplyr and other packages that use tidyselect. 
For example, if a previous step removed all of the columns needed for a later step, the recipe does not fail when it is estimated (with the exception of <code>step_mutate()</code>). The documentation in <code>?selections</code> has been updated with advice for writing selectors when filtering steps are used.</p> <p>There are new <code>extract_parameter_set_dials()</code> and <code>extract_parameter_dials()</code> methods to extract parameter sets and single parameters from a recipe. Since this is related to tuning parameters, the tune package should be loaded before they are used.</p> <h2 id="breaking-changes">Breaking changes <a href="#breaking-changes"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Changes in <code>step_ica()</code> and <code>step_kpca*()</code> will now cause recipe objects from previous versions to error when applied to new data. 
You will need to update these recipes with the current version to be able to use them.</p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We&rsquo;d like to thank everyone that has contributed since the last release: <a href="https://github.com/agwalker82" target="_blank" rel="noopener">@agwalker82</a>, <a href="https://github.com/albert-ying" target="_blank" rel="noopener">@albert-ying</a>, <a href="https://github.com/AshesITR" target="_blank" rel="noopener">@AshesITR</a>, <a href="https://github.com/ddsjoberg" target="_blank" rel="noopener">@ddsjoberg</a>, <a href="https://github.com/DoktorMike" target="_blank" rel="noopener">@DoktorMike</a>, <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/emmansh" target="_blank" rel="noopener">@emmansh</a>, <a href="https://github.com/hermandr" target="_blank" rel="noopener">@hermandr</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, <a href="https://github.com/jacekkotowski" target="_blank" rel="noopener">@jacekkotowski</a>, <a href="https://github.com/JensPMB" target="_blank" rel="noopener">@JensPMB</a>, <a href="https://github.com/jkennel" target="_blank" rel="noopener">@jkennel</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/lg1000" target="_blank" rel="noopener">@lg1000</a>, <a href="https://github.com/lionel-" target="_blank" rel="noopener">@lionel-</a>, <a href="https://github.com/markjrieke" 
target="_blank" rel="noopener">@markjrieke</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/MichaelChirico" target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/ninohardt" target="_blank" rel="noopener">@ninohardt</a>, <a href="https://github.com/SewerynGrodny" target="_blank" rel="noopener">@SewerynGrodny</a>, <a href="https://github.com/SimonCoulombe" target="_blank" rel="noopener">@SimonCoulombe</a>, <a href="https://github.com/spsanderson" target="_blank" rel="noopener">@spsanderson</a>, <a href="https://github.com/tedmoorman" target="_blank" rel="noopener">@tedmoorman</a>, <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a>, <a href="https://github.com/tsengj" target="_blank" rel="noopener">@tsengj</a>, <a href="https://github.com/walrossker" target="_blank" rel="noopener">@walrossker</a>, <a href="https://github.com/williamshell" target="_blank" rel="noopener">@williamshell</a>, and <a href="https://github.com/xiaoxi-david" target="_blank" rel="noopener">@xiaoxi-david</a>.</p> Upgrading to testthat edition 3 https://www.tidyverse.org/blog/2022/02/upkeep-testthat-3/ Tue, 22 Feb 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/02/upkeep-testthat-3/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [ ] ~Add intro sentence, e.g. 
the standard tagline for the package~ * [ ] ~[`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html)~ --> <p>As the collection of packages in the tidyverse grows, maintenance becomes increasingly important, and Hadley made this the topic of his <a href="https://www.rstudio.com/resources/rstudioglobal-2021/maintaining-the-house-the-tidyverse-built/" target="_blank" rel="noopener">keynote at rstudio::global 2021</a>.</p> <p>In this blog post, I discuss my process for a recent maintenance task, upgrading package tests to use the third edition of testthat.</p> <h2 id="testthat-3e">testthat 3e <a href="#testthat-3e"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The testthat package introduced the idea of an &ldquo;edition&rdquo; in version 3.0.0:</p> <blockquote> <p>An edition is a bundle of behaviours that you have to explicitly choose to use, allowing us to make otherwise backward incompatible changes.</p> </blockquote> <p>If you haven&rsquo;t heard of testthat 3e yet, the <a href="https://testthat.r-lib.org/articles/third-edition.html" target="_blank" rel="noopener">testthat article introducing the 3rd edition</a> is a great place to start. It outlines all the changes this edition brings.</p> <p>While you can continue to use testthat&rsquo;s previous behaviour, it&rsquo;s a good idea to upgrade so that you can make use of handy new features. As some of the changes may break your tests, you might have been putting that off, though. You would not be alone in that! 
Several tidymodels packages still have to make the jump, but I recently upgraded <a href="https://github.com/tidymodels/dials/" target="_blank" rel="noopener">dials</a> and <a href="https://github.com/tidymodels/censored/" target="_blank" rel="noopener">censored</a> to testthat edition 3. Here is what I did and learned along the way.</p> <h3 id="workflow-to-upgrade">Workflow to upgrade <a href="#workflow-to-upgrade"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The testthat article tells you how you can opt in to the new edition, and about major changes: deprecations, how messages and warnings are handled, and how comparisons of objects are made.</p> <p>The main guidance for a workflow is:</p> <ol> <li>Activate edition 3.</li> <li>Remove or replace deprecated functions.</li> <li>If your output got noisy, quiet things down as needed.</li> <li>Think about what it means if things are not &ldquo;all equal&rdquo; anymore.</li> </ol> <h3 id="activation-">Activation 🚀 <a href="#activation-"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>To activate you need to do two things in the DESCRIPTION - or you can let <code>usethis::use_testthat(3)</code> do it for you:</p> <ul> <li>Increase the testthat version to <code>&gt;= 
3.0.0</code>.</li> <li>Set the <code>Config/testthat/edition</code> field to <code>3</code>.</li> </ul> <h3 id="moving-on-from-deprecations-">Moving on from deprecations ✨ <a href="#moving-on-from-deprecations-"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The article on testthat 3e contains a <a href="https://testthat.r-lib.org/articles/third-edition.html#deprecations" target="_blank" rel="noopener">list of deprecated functions</a> together with their successors. You can work your way through it, searching for the deprecated function and then replacing it with the most suitable alternative. The first one in that list is <code>context()</code> as testthat will use the file name instead, ensuring that context and file name are in sync. As such, <code>context()</code> does not have a replacement. 
My first <a href="https://github.com/tidymodels/censored/pull/142" target="_blank" rel="noopener">commit</a> after activating the third edition was to remove all calls to <code>context()</code>, followed by replacing other deprecated functions and arguments.</p> <p><img src="commits.png" alt="A list of commits starting with &ldquo;require testthat 3e, followed by removing context() and other deprecated functions&rdquo;"></p> <h3 id="warnings-and-messages-">Warnings and messages 🤫 <a href="#warnings-and-messages-"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>testthat edition 3 handles warnings and messages differently than edition 2: <code>expect_warning()</code> captures at most one warning, so if your code generates multiple warnings, they will bubble up now. Messages were previously silently ignored, now they also bubble up. That means the output may be a lot noisier after switching to edition 3. If the warnings or messages are important, you should explicitly capture them. Otherwise you can suppress them to clean up the output and make it easier to focus on what&rsquo;s important. 
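</p> <p>As a minimal sketch of both options, assume a hypothetical function <code>noisy_fit()</code> (made up for illustration, not part of dials or censored) that emits one message and two warnings. The first test captures every condition explicitly; the second suppresses them because they are not what is being tested:</p>
<div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r">library(testthat)

# Hypothetical function, for illustration only
noisy_fit &lt;- function() {
  message("fitting...")
  warning("first warning")
  warning("second warning")
  42
}

test_that("important conditions are captured explicitly", {
  # Only needed outside a package that has already activated edition 3
  local_edition(3)

  # Each expect_warning() captures a single warning,
  # so one call is needed per expected warning.
  expect_warning(
    expect_warning(
      expect_message(res &lt;- noisy_fit(), "fitting"),
      "first"
    ),
    "second"
  )
  expect_equal(res, 42)
})

test_that("unimportant conditions are suppressed", {
  local_edition(3)

  res &lt;- suppressMessages(suppressWarnings(noisy_fit()))
  expect_equal(res, 42)
})
</code></pre></div>
<p>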
Again, the testthat article has good examples for how to do either.</p> <h3 id="comparing-things--">Comparing things 🍎 🍊 <a href="#comparing-things--"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>The last big change from edition 2 to edition 3 that I want to mention is what is happening under the hood of <code>expect_equal()</code> and <code>expect_identical()</code>. Edition 3 uses <a href="https://waldo.r-lib.org/reference/compare.html" target="_blank" rel="noopener"><code>waldo::compare()</code></a> while edition 2 uses <a href="https://rdrr.io/r/base/all.equal.html" target="_blank" rel="noopener"><code>all.equal()</code></a>. For the most part, that meant changing the argument name from <code>tol</code> to <code>tolerance</code>, like in my third commit above.</p> <p>I did, however, run into a situation where a test newly failed. Those are the situations where general advice is hard because it depends so much on the context. In my case, I made use of the <code>ignore_function_env</code> and <code>ignore_formula_env</code> arguments to <a href="https://waldo.r-lib.org/reference/compare.html" target="_blank" rel="noopener"><code>waldo::compare()</code></a> to exclude those environments from the comparison. Those are probably useful to know about if you are upgrading a modelling package, but not particularly important otherwise. For dials and censored, that solved most of the cases. 
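</p> <p>As a rough sketch of that approach, the hypothetical helper <code>make_formula()</code> below (not taken from dials or censored) returns the same formula from two different calls, so the two results carry different environments. In edition 3, extra arguments to <code>expect_equal()</code> are passed on to <code>waldo::compare()</code>, so the environments can be left out of the comparison:</p>
<div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r">library(testthat)

# Hypothetical helper: each call creates the formula in a fresh environment
make_formula &lt;- function() {
  y ~ x
}

test_that("formulas are equal when their environments are ignored", {
  # Only needed outside a package that has already activated edition 3
  local_edition(3)

  f1 &lt;- make_formula()
  f2 &lt;- make_formula()

  # ignore_formula_env is forwarded to waldo::compare()
  expect_equal(f1, f2, ignore_formula_env = TRUE)
})
</code></pre></div>
<p>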
In one instance, I ended up tweaking the reference value based on theoretical considerations of the model I was dealing with rather than increasing the tolerance.</p> <p>Those instances may be the most work when upgrading to edition 3, but I did not encounter many of them &ndash; and, when I did, it was valuable to know about the differences (well, those which I didn&rsquo;t choose to ignore).</p> <h2 id="more-testing-made-easier">More testing made easier <a href="#more-testing-made-easier"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>While I was going over all the test files, I also decided to cover a few other aspects.</p> <h3 id="nested-expectations">Nested expectations <a href="#nested-expectations"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>When <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">Davis Vaughan</a> moved other tidymodels packages to testthat 3e, I saw him disentangle nested expectations. 
For example, patterns like</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">expect_warning</span><span class="p">(</span><span class="nf">expect_equal</span><span class="p">(</span><span class="n">one_call</span><span class="p">,</span> <span class="n">another_call</span><span class="p">))</span> </code></pre></div><p>or</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">expect_equal</span><span class="p">(</span><span class="nf">expect_warning</span><span class="p">(</span><span class="n">one_call</span><span class="p">),</span> <span class="nf">expect_warning</span><span class="p">(</span><span class="n">another_call</span><span class="p">))</span> </code></pre></div><p>can be re-written as</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">expect_snapshot</span><span class="p">({</span> <span class="n">object_from_one_call</span> <span class="o">&lt;-</span> <span class="nf">one_call</span><span class="p">()</span> <span class="n">object_from_another_call</span> <span class="o">&lt;-</span> <span class="nf">another_call</span><span class="p">()</span> <span class="p">})</span> <span class="nf">expect_equal</span><span class="p">(</span><span class="n">object_from_one_call</span><span class="p">,</span> <span class="n">object_from_another_call</span><span class="p">)</span> </code></pre></div><p>This separates an expectation about the warnings from the expectation about the value, making it easier to see which part(s) fail. 
Snapshots can also be particularly helpful in situations where you are trying to test for a combination of warnings, messages, and/or errors because they cover them all.</p> <h3 id="self-contained-tests">Self-contained tests <a href="#self-contained-tests"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>I wanted to make the tests more self-contained so that a test could run with a single call to <code>test_that()</code>. Specifically, I didn&rsquo;t want to have to scroll back up to the top of the file to load any necessary package or find the code that creates helper objects.</p> <p>You can avoid the former by prefixing functions with the package they belong to, i.e. using <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>dplyr::mutate()</code></a> instead of <a href="https://dplyr.tidyverse.org" target="_blank" rel="noopener"><code>library(dplyr)</code></a> at the top of the file and later <code>mutate()</code> inside of the expression for <code>test_that()</code>.</p> <p>If creating a helper object is short, I might move the code inside of <code>test_that()</code>. If you create the same helper objects multiple times and don&rsquo;t want to see the code repeatedly, you can move it into a helper function. Files inside the <code>testthat</code> folder of your source code with file names starting with <code>helper</code> are executed before tests are run. 
You could put your helper code there but it is <a href="https://testthat.r-lib.org/reference/test_file.html#special-files" target="_blank" rel="noopener">recommended</a> to put the helper code in your <code>R/</code> folder, for example as <a href="https://testthat.r-lib.org/articles/custom-expectation.html" target="_blank" rel="noopener"><code>test-helpers.R</code></a>.</p> <p>An example helper function is called <code>make_test_model()</code>, which returns a list with training and testing data as well as the fitted model. A test on the prediction method could then look like this:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">test_that</span><span class="p">(</span><span class="s">&#34;prediction returns the correct number of records&#34;</span><span class="p">,</span> <span class="p">{</span> <span class="n">helper_objects</span> <span class="o">&lt;-</span> <span class="nf">make_test_model</span><span class="p">()</span> <span class="n">pred</span> <span class="o">&lt;-</span> <span class="nf">predict</span><span class="p">(</span><span class="n">helper_objects</span><span class="o">$</span><span class="n">model</span><span class="p">,</span> <span class="n">helper_objects</span><span class="o">$</span><span class="n">test_data</span><span class="p">)</span> <span class="nf">expect_equal</span><span class="p">(</span><span class="nf">nrow</span><span class="p">(</span><span class="n">pred</span><span class="p">),</span> <span class="nf">nrow</span><span class="p">(</span><span class="n">helper_objects</span><span class="o">$</span><span class="n">test_data</span><span class="p">))</span> <span class="p">})</span> </code></pre></div><p>Any other data objects needed for testing I moved into <code>tests/testthat/data/</code>.</p> <h3 id="corresponding-files-in-r-and-teststestthat">Corresponding files in <code>R/</code> and <code>tests/testthat/</code> <a href="#corresponding-files-in-r-and-teststestthat"> 
<svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>If a file in <code>R/</code> had a corresponding file in <code>tests/testthat/</code>, I made sure the names matched up, e.g., <code>monstera.R</code> and <code>test-monstera.R</code>.</p> <p>This gives you access to some convenient features of usethis and devtools:</p> <ul> <li>When you have the R file open, it&rsquo;s easy to open the corresponding test file with <a href="https://usethis.r-lib.org/reference/use_r.html" target="_blank" rel="noopener"><code>usethis::use_test()</code></a> - and vice versa with <a href="https://usethis.r-lib.org/reference/use_r.html" target="_blank" rel="noopener"><code>usethis::use_r()</code></a>. No clicking around needed!</li> <li>When you have either file open, you can run the tests with <a href="http://devtools.r-lib.org/reference/test.html" target="_blank" rel="noopener"><code>devtools::test_active_file()</code></a>, and see the test coverage report with <code>devtools::test_coverage_active_file()</code> (which also shows you which lines are actually being tested). Both also have an RStudio addin, which means you can add <a href="https://rstudio.github.io/rstudioaddins/#keyboard-shorcuts" target="_blank" rel="noopener">keyboard shortcuts</a> for them!</li> </ul> <p>And, with that, dials and censored were ready for more snapshot tests in the future!</p> <p>For more guidance on implementing tidy standards, check out <a href="https://usethis.r-lib.org/reference/tidyverse.html" target="_blank" rel="noopener"><code>usethis::use_tidy_upkeep_issue()</code></a>. It creates a GitHub issue with a handy checklist. 
You will be seeing those popping up in our repositories soon when we do some spring cleaning!</p> tidyr 1.2.0 https://www.tidyverse.org/blog/2022/02/tidyr-1-2-0/ Tue, 01 Feb 2022 00:00:00 +0000 https://www.tidyverse.org/blog/2022/02/tidyr-1-2-0/ <p>We&rsquo;re chuffed to announce the release of <a href="https://tidyr.tidyverse.org" target="_blank" rel="noopener">tidyr</a> 1.2.0. tidyr provides a set of tools for transforming data frames to and from tidy data, where each variable is a column and each observation is a row. Tidy data is a convention for matching the semantics and structure of your data that makes using the rest of the tidyverse (and many other R packages) much easier.</p> <p>You can install it from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"tidyr"</span><span class='o'>)</span></code></pre> </div> <p>This blog post will go over the main new features, which include four new arguments to <a href="https://tidyr.tidyverse.org/reference/pivot_wider.html" target="_blank" rel="noopener"><code>pivot_wider()</code></a>, the ability to unnest multiple columns at once in <a href="https://tidyr.tidyverse.org/reference/hoist.html" target="_blank" rel="noopener"><code>unnest_wider()</code></a> and <a href="https://tidyr.tidyverse.org/reference/hoist.html" target="_blank" rel="noopener"><code>unnest_longer()</code></a>, an enhanced <a href="https://tidyr.tidyverse.org/reference/complete.html" target="_blank" rel="noopener"><code>complete()</code></a> function, and some updates to our tools for handling missing values.</p> <p>You can see a full list of changes in the <a href="https://github.com/tidyverse/tidyr/blob/main/NEWS.md" target="_blank" rel="noopener">release notes</a>, where you&rsquo;ll also find details on the ~50 bugs that were fixed in this release!</p> <div 
class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyr.tidyverse.org'>tidyr</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span>, warn.conflicts <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span></code></pre> </div> <h2 id="new-author">New author <a href="#new-author"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>First off, we are very excited to welcome <a href="https://github.com/mgirlich" target="_blank" rel="noopener">Maximilian Girlich</a> as a new tidyr author in recognition of his significant and sustained contributions. In particular, he played a large part in speeding up a number of core functions, including: <a href="https://tidyr.tidyverse.org/reference/chop.html" target="_blank" rel="noopener"><code>unchop()</code></a>, <a href="https://tidyr.tidyverse.org/reference/nest.html" target="_blank" rel="noopener"><code>unnest()</code></a>, <a href="https://tidyr.tidyverse.org/reference/hoist.html" target="_blank" rel="noopener"><code>unnest_wider()</code></a>, and <a href="https://tidyr.tidyverse.org/reference/hoist.html" target="_blank" rel="noopener"><code>unnest_longer()</code></a>. 
Additionally, he provided proof-of-concept implementations for a few new features, like the <code>unused_fn</code> argument to <a href="https://tidyr.tidyverse.org/reference/pivot_wider.html" target="_blank" rel="noopener"><code>pivot_wider()</code></a> discussed below.</p> <h2 id="pivoting">Pivoting <a href="#pivoting"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2> <h3 id="value-expansion">Value expansion <a href="#value-expansion"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p> <a href="https://tidyr.tidyverse.org/reference/pivot_wider.html" target="_blank" rel="noopener"><code>pivot_wider()</code></a> has gained two new arguments related to the <em>expansion</em> of values. These arguments are similar to <code>drop = FALSE</code> from <a href="https://tidyr.tidyverse.org/reference/spread.html" target="_blank" rel="noopener"><code>spread()</code></a>, but are a bit more fine-grained. 
As you&rsquo;ll see, these are mostly useful when you have factors in either <code>names_from</code> or <code>id_cols</code> and want to ensure that all of the factor levels are retained.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>weekdays</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Mon"</span>, <span class='s'>"Tue"</span>, <span class='s'>"Wed"</span>, <span class='s'>"Thu"</span>, <span class='s'>"Fri"</span>, <span class='s'>"Sat"</span>, <span class='s'>"Sun"</span><span class='o'>)</span> <span class='nv'>daily</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span> day <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/factor.html'>factor</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Tue"</span>, <span class='s'>"Thu"</span>, <span class='s'>"Fri"</span>, <span class='s'>"Mon"</span><span class='o'>)</span>, levels <span class='o'>=</span> <span class='nv'>weekdays</span><span class='o'>)</span>, value <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>2</span>, <span class='m'>3</span>, <span class='m'>1</span>, <span class='m'>5</span><span class='o'>)</span> <span class='o'>)</span> <span class='nv'>daily</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 2</span></span> <span class='c'>#&gt; day value</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Tue 2</span> <span class='c'>#&gt; <span 
style='color: #555555;'>2</span> Thu 3</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Fri 1</span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Mon 5</span></code></pre> </div> <p>Imagine you&rsquo;d like to pivot the values from <code>day</code> into columns, filling the cells with <code>value</code>. By default, <a href="https://tidyr.tidyverse.org/reference/pivot_wider.html" target="_blank" rel="noopener"><code>pivot_wider()</code></a> only generates columns from the data that is actually there, and will retain the ordering that was present in the data.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_wider.html'>pivot_wider</a></span><span class='o'>(</span><span class='nv'>daily</span>, names_from <span class='o'>=</span> <span class='nv'>day</span>, values_from <span class='o'>=</span> <span class='nv'>value</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 4</span></span> <span class='c'>#&gt; Tue Thu Fri Mon</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 2 3 1 5</span></code></pre> </div> <p>When you know the full set of possible values and have encoded them as factor levels (as we have done here), you might want to retain those levels in the pivot, even if there isn&rsquo;t any data. Additionally, it would probably be nice if they were sorted to match the levels found in the factor. 
The new <code>names_expand</code> argument handles both of these.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_wider.html'>pivot_wider</a></span><span class='o'>(</span><span class='nv'>daily</span>, names_from <span class='o'>=</span> <span class='nv'>day</span>, values_from <span class='o'>=</span> <span class='nv'>value</span>, names_expand <span class='o'>=</span> <span class='kc'>TRUE</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 7</span></span> <span class='c'>#&gt; Mon Tue Wed Thu Fri Sat Sun</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 5 2 <span style='color: #BB0000;'>NA</span> 3 1 <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span></span></code></pre> </div> <p>A related problem can occur when there are implicit missing factor levels in the <code>id_cols</code>. When this happens, there are missing rows (rather than columns) that you&rsquo;d like to explicitly represent. 
To demonstrate, we&rsquo;ll modify <code>daily</code> with a <code>type</code> column, and pivot on that instead, keeping <code>day</code> as an identifier column.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>daily</span> <span class='o'>&lt;-</span> <span class='nv'>daily</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>type <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"B"</span>, <span class='s'>"A"</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>daily</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span> <span class='c'>#&gt; day value type </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Tue 2 A </span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Thu 3 B </span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Fri 1 B </span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Mon 5 A</span></code></pre> </div> <p>In the pivot below, we are missing some rows corresponding to the missing factor levels of <code>day</code>. 
Again, by default <a href="https://tidyr.tidyverse.org/reference/pivot_wider.html" target="_blank" rel="noopener"><code>pivot_wider()</code></a> will only use data that already exists in the <code>id_cols</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_wider.html'>pivot_wider</a></span><span class='o'>(</span> <span class='nv'>daily</span>, names_from <span class='o'>=</span> <span class='nv'>type</span>, values_from <span class='o'>=</span> <span class='nv'>value</span> <span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span> <span class='c'>#&gt; day A B</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Tue 2 <span style='color: #BB0000;'>NA</span></span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Thu <span style='color: #BB0000;'>NA</span> 3</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Fri <span style='color: #BB0000;'>NA</span> 1</span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Mon 5 <span style='color: #BB0000;'>NA</span></span></code></pre> </div> <p>To explicitly expand (and sort) these missing rows, we can use <code>id_expand</code>, which works much the same way as <code>names_expand</code>. 
We will also go ahead and fill the unrepresented values with zeros.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_wider.html'>pivot_wider</a></span><span class='o'>(</span> <span class='nv'>daily</span>, id_expand <span class='o'>=</span> <span class='kc'>TRUE</span>, names_from <span class='o'>=</span> <span class='nv'>type</span>, values_from <span class='o'>=</span> <span class='nv'>value</span>, values_fill <span class='o'>=</span> <span class='m'>0</span> <span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 7 × 3</span></span> <span class='c'>#&gt; day A B</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Mon 5 0</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Tue 2 0</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Wed 0 0</span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Thu 0 3</span> <span class='c'>#&gt; <span style='color: #555555;'>5</span> Fri 0 1</span> <span class='c'>#&gt; <span style='color: #555555;'>6</span> Sat 0 0</span> <span class='c'>#&gt; <span style='color: #555555;'>7</span> Sun 0 0</span></code></pre> </div> <h3 id="varying-names">Varying names <a href="#varying-names"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> 
</h3><p>When you specify multiple <code>values_from</code> columns, the resulting column names that get generated from the combination of <code>names_from</code> values and <code>values_from</code> names default to varying the <code>names_from</code> values <em>fastest</em>. This means that all of the columns related to the first <code>values_from</code> column will be at the front, followed by the columns related to the second <code>values_from</code> column, and so on. For example, if we wanted to flatten <code>daily</code> all the way out to a single row by specifying <code>values_from = c(value, type)</code>, then we would end up with all the columns related to <code>value</code> followed by those related to <code>type</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_wider.html'>pivot_wider</a></span><span class='o'>(</span> <span class='nv'>daily</span>, names_from <span class='o'>=</span> <span class='nv'>day</span>, values_from <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>value</span>, <span class='nv'>type</span><span class='o'>)</span>, names_expand <span class='o'>=</span> <span class='kc'>TRUE</span> <span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 14</span></span> <span class='c'>#&gt; value_Mon value_Tue value_Wed value_Thu value_Fri value_Sat value_Sun type_Mon</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span 
style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 5 2 <span style='color: #BB0000;'>NA</span> 3 1 <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> A </span> <span class='c'>#&gt; <span style='color: #555555;'># … with 6 more variables: type_Tue &lt;chr&gt;, type_Wed &lt;chr&gt;, type_Thu &lt;chr&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># type_Fri &lt;chr&gt;, type_Sat &lt;chr&gt;, type_Sun &lt;chr&gt;</span></span></code></pre> </div> <p>Depending on your data, you might instead want to group all of the columns related to a particular <code>names_from</code> value together. In this example, that would mean grouping all of the columns related to Monday together, followed by Tuesday, Wednesday, etc. You can accomplish this with the new <code>names_vary</code> argument, which allows you to vary the <code>names_from</code> values <em>slowest</em>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_wider.html'>pivot_wider</a></span><span class='o'>(</span> <span class='nv'>daily</span>, names_from <span class='o'>=</span> <span class='nv'>day</span>, values_from <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>value</span>, <span class='nv'>type</span><span class='o'>)</span>, names_expand <span class='o'>=</span> <span class='kc'>TRUE</span>, names_vary <span class='o'>=</span> <span class='s'>"slowest"</span> <span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 14</span></span> <span class='c'>#&gt; value_Mon type_Mon value_Tue type_Tue value_Wed type_Wed value_Thu type_Thu</span> <span class='c'>#&gt; <span style='color: #555555; 
font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 5 A 2 A <span style='color: #BB0000;'>NA</span> <span style='color: #BB0000;'>NA</span> 3 B </span> <span class='c'>#&gt; <span style='color: #555555;'># … with 6 more variables: value_Fri &lt;dbl&gt;, type_Fri &lt;chr&gt;, value_Sat &lt;dbl&gt;,</span></span> <span class='c'>#&gt; <span style='color: #555555;'># type_Sat &lt;chr&gt;, value_Sun &lt;dbl&gt;, type_Sun &lt;chr&gt;</span></span></code></pre> </div> <h3 id="unused-columns">Unused columns <a href="#unused-columns"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>Occasionally you&rsquo;ll find yourself in a situation where you have columns in your data that are unrelated to the pivoting process itself, but you&rsquo;d still like to retain some information about them. 
Consider this data set that records values returned by various systems across multiple counties.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>readouts</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span> county <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"Wake"</span>, <span class='s'>"Wake"</span>, <span class='s'>"Wake"</span>, <span class='s'>"Guilford"</span>, <span class='s'>"Guilford"</span><span class='o'>)</span>, date <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/as.Date.html'>as.Date</a></span><span class='o'>(</span><span class='s'>"2020-01-01"</span><span class='o'>)</span> <span class='o'>+</span> <span class='m'>0</span><span class='o'>:</span><span class='m'>2</span>, <span class='nf'><a href='https://rdrr.io/r/base/as.Date.html'>as.Date</a></span><span class='o'>(</span><span class='s'>"2020-01-03"</span><span class='o'>)</span> <span class='o'>+</span> <span class='m'>0</span><span class='o'>:</span><span class='m'>1</span><span class='o'>)</span>, system <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"B"</span>, <span class='s'>"C"</span>, <span class='s'>"A"</span>, <span class='s'>"C"</span><span class='o'>)</span>, value <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>3.2</span>, <span class='m'>4</span>, <span class='m'>5.5</span>, <span class='m'>2</span>, <span class='m'>1.2</span><span class='o'>)</span> <span class='o'>)</span> <span class='nv'>readouts</span> <span 
class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 4</span></span> <span class='c'>#&gt; county date system value</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Wake 2020-01-01 A 3.2</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Wake 2020-01-02 B 4 </span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Wake 2020-01-03 C 5.5</span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Guilford 2020-01-03 A 2 </span> <span class='c'>#&gt; <span style='color: #555555;'>5</span> Guilford 2020-01-04 C 1.2</span></code></pre> </div> <p>You might want to pivot this into a view containing one row per <code>county</code>, with the <code>system</code> types across the columns. 
You might do something like:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_wider.html'>pivot_wider</a></span><span class='o'>(</span> <span class='nv'>readouts</span>, id_cols <span class='o'>=</span> <span class='nv'>county</span>, names_from <span class='o'>=</span> <span class='nv'>system</span>, values_from <span class='o'>=</span> <span class='nv'>value</span> <span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 4</span></span> <span class='c'>#&gt; county A B C</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Wake 3.2 4 5.5</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Guilford 2 <span style='color: #BB0000;'>NA</span> 1.2</span></code></pre> </div> <p>This worked, but in the process we&rsquo;ve lost all of the information from the <code>date</code> column about when the values were recorded. To fix this, we can use the new <code>unused_fn</code> argument to retain a summary of the unused <code>date</code> column. 
In our case, we&rsquo;ll retain the most recent date a value was recorded across all systems.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_wider.html'>pivot_wider</a></span><span class='o'>(</span> <span class='nv'>readouts</span>, id_cols <span class='o'>=</span> <span class='nv'>county</span>, names_from <span class='o'>=</span> <span class='nv'>system</span>, values_from <span class='o'>=</span> <span class='nv'>value</span>, unused_fn <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>date <span class='o'>=</span> <span class='nv'>max</span><span class='o'>)</span> <span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 5</span></span> <span class='c'>#&gt; county A B C date </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Wake 3.2 4 5.5 2020-01-03</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Guilford 2 <span style='color: #BB0000;'>NA</span> 1.2 2020-01-04</span></code></pre> </div> <p>If you want to retain the unused columns but delay the summarization entirely, you can use <a href="https://rdrr.io/r/base/list.html" target="_blank" rel="noopener"><code>list()</code></a> to wrap up the value into a list column.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_wider.html'>pivot_wider</a></span><span class='o'>(</span> <span 
class='nv'>readouts</span>, id_cols <span class='o'>=</span> <span class='nv'>county</span>, names_from <span class='o'>=</span> <span class='nv'>system</span>, values_from <span class='o'>=</span> <span class='nv'>value</span>, unused_fn <span class='o'>=</span> <span class='nv'>list</span> <span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 5</span></span> <span class='c'>#&gt; county A B C date </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Wake 3.2 4 5.5 <span style='color: #555555;'>&lt;date [3]&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Guilford 2 <span style='color: #BB0000;'>NA</span> 1.2 <span style='color: #555555;'>&lt;date [2]&gt;</span></span></code></pre> </div> <p>Note that for <code>unused_fn</code> to work, you must supply <code>id_cols</code> explicitly, as otherwise all of the remaining columns are assumed to be <code>id_cols</code>.</p> <h3 id="more-informative-errors">More informative errors <a href="#more-informative-errors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h3><p>We&rsquo;ve improved on a number of the error messages throughout tidyr, but the error you get from <a 
href="https://tidyr.tidyverse.org/reference/pivot_wider.html" target="_blank" rel="noopener"><code>pivot_wider()</code></a> when you encounter values that aren&rsquo;t uniquely identified is now especially nice. Let&rsquo;s &ldquo;accidentally&rdquo; add a duplicate row to <code>readouts</code>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>readouts2</span> <span class='o'>&lt;-</span> <span class='nv'>readouts</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/slice.html'>slice</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/seq.html'>seq_len</a></span><span class='o'>(</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/context.html'>n</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span>, <span class='nf'><a href='https://dplyr.tidyverse.org/reference/context.html'>n</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>readouts2</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 6 × 4</span></span> <span class='c'>#&gt; county date system value</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;date&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Wake 2020-01-01 A 3.2</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Wake 2020-01-02 B 4 </span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Wake 2020-01-03 C 5.5</span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Guilford 2020-01-03 A 2 </span> <span class='c'>#&gt; <span style='color: 
#555555;'>5</span> Guilford 2020-01-04 C 1.2</span> <span class='c'>#&gt; <span style='color: #555555;'>6</span> Guilford 2020-01-04 C 1.2</span></code></pre> </div> <p>Pivoting on <code>system</code> warns us that the values from <code>value</code> are not uniquely identified.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://tidyr.tidyverse.org/reference/pivot_wider.html'>pivot_wider</a></span><span class='o'>(</span> <span class='nv'>readouts2</span>, id_cols <span class='o'>=</span> <span class='nv'>county</span>, names_from <span class='o'>=</span> <span class='nv'>system</span>, values_from <span class='o'>=</span> <span class='nv'>value</span> <span class='o'>)</span> <span class='c'>#&gt; Warning: Values from `value` are not uniquely identified; output will contain list-cols.</span> <span class='c'>#&gt; * Use `values_fn = list` to suppress this warning.</span> <span class='c'>#&gt; * Use `values_fn = &#123;summary_fun&#125;` to summarise duplicates.</span> <span class='c'>#&gt; * Use the following dplyr code to identify duplicates.</span> <span class='c'>#&gt; &#123;data&#125; %&gt;%</span> <span class='c'>#&gt; dplyr::group_by(county, system) %&gt;%</span> <span class='c'>#&gt; dplyr::summarise(n = dplyr::n(), .groups = "drop") %&gt;%</span> <span class='c'>#&gt; dplyr::filter(n &gt; 1L)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 4</span></span> <span class='c'>#&gt; county A B C </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Wake <span style='color: #555555;'>&lt;dbl [1]&gt;</span> <span style='color: #555555;'>&lt;dbl 
[1]&gt;</span> <span style='color: #555555;'>&lt;dbl [1]&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Guilford <span style='color: #555555;'>&lt;dbl [1]&gt;</span> <span style='color: #555555;'>&lt;NULL&gt;</span> <span style='color: #555555;'>&lt;dbl [2]&gt;</span></span></code></pre> </div> <p>This provides us with a number of options, but the last one is particularly useful if we weren&rsquo;t expecting duplicates. This prints out a block of dplyr code that you can use to quickly identify duplication issues. Replacing <code>{data}</code> with <code>readouts2</code>, we get:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>readouts2</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>dplyr</span><span class='nf'>::</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>county</span>, <span class='nv'>system</span><span class='o'>)</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>dplyr</span><span class='nf'>::</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/summarise.html'>summarise</a></span><span class='o'>(</span>n <span class='o'>=</span> <span class='nf'>dplyr</span><span class='nf'>::</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/context.html'>n</a></span><span class='o'>(</span><span class='o'>)</span>, .groups <span class='o'>=</span> <span class='s'>"drop"</span><span class='o'>)</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'>dplyr</span><span class='nf'>::</span><span class='nf'><a href='https://dplyr.tidyverse.org/reference/filter.html'>filter</a></span><span class='o'>(</span><span class='nv'>n</span> <span class='o'>&gt;</span> <span 
class='m'>1L</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 1 × 3</span></span> <span class='c'>#&gt; county system n</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Guilford C 2</span></code></pre> </div> <h2 id="unnesting">(Un)nesting <a href="#unnesting"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://tidyr.tidyverse.org/reference/hoist.html" target="_blank" rel="noopener"><code>unnest_longer()</code></a> and <a href="https://tidyr.tidyverse.org/reference/hoist.html" target="_blank" rel="noopener"><code>unnest_wider()</code></a> have both gained the ability to unnest multiple columns at once. 
This is particularly useful with <a href="https://tidyr.tidyverse.org/reference/hoist.html" target="_blank" rel="noopener"><code>unnest_longer()</code></a>, where sequential unnesting would instead result in a Cartesian product, which isn&rsquo;t typically desired.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>1</span><span class='o'>:</span><span class='m'>2</span><span class='o'>)</span>, y <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>1</span><span class='o'>:</span><span class='m'>2</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>df</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 2 × 2</span></span> <span class='c'>#&gt; x y </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;list&gt;</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> <span style='color: #555555;'>&lt;dbl [1]&gt;</span> <span style='color: #555555;'>&lt;dbl [1]&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #555555;'>&lt;int [2]&gt;</span> <span style='color: #555555;'>&lt;int [2]&gt;</span></span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># Sequential unnesting</span> <span class='nv'>df</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span 
class='nf'><a href='https://tidyr.tidyverse.org/reference/hoist.html'>unnest_longer</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/hoist.html'>unnest_longer</a></span><span class='o'>(</span><span class='nv'>y</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 2</span></span> <span class='c'>#&gt; x y</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 1</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 1</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> 1 2</span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> 2 1</span> <span class='c'>#&gt; <span style='color: #555555;'>5</span> 2 2</span> <span class='c'># Joint unnesting</span> <span class='nv'>df</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/hoist.html'>unnest_longer</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nv'>y</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 2</span></span> <span class='c'>#&gt; x y</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 1 1</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> 1 1</span> <span 
class='c'>#&gt; <span style='color: #555555;'>3</span> 2 2</span></code></pre> </div> <h2 id="grids">Grids <a href="#grids"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>When <a href="https://tidyr.tidyverse.org/reference/complete.html" target="_blank" rel="noopener"><code>complete()</code></a>-ing a data frame, it&rsquo;s often useful to immediately fill the newly generated missing values with a value that better represents their intention. For example, with the <code>daily</code> data we could complete on the <code>day</code> factor column and insert zeros for <code>value</code> in any row that wasn&rsquo;t previously represented.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>daily</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span> <span class='c'>#&gt; day value type </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Tue 2 A </span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Thu 3 B </span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Fri 1 B </span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Mon 5 A</span> <span class='nv'>daily</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a 
href='https://tidyr.tidyverse.org/reference/complete.html'>complete</a></span><span class='o'>(</span><span class='nv'>day</span>, fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>value <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 7 × 3</span></span> <span class='c'>#&gt; day value type </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Mon 5 A </span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Tue 2 A </span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Wed 0 <span style='color: #BB0000;'>NA</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Thu 3 B </span> <span class='c'>#&gt; <span style='color: #555555;'>5</span> Fri 1 B </span> <span class='c'>#&gt; <span style='color: #555555;'>6</span> Sat 0 <span style='color: #BB0000;'>NA</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>7</span> Sun 0 <span style='color: #BB0000;'>NA</span></span></code></pre> </div> <p>But what if there were already missing values before completing? 
By default, <a href="https://tidyr.tidyverse.org/reference/complete.html" target="_blank" rel="noopener"><code>complete()</code></a> will still fill those <em>explicit</em> missing values too.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>daily2</span> <span class='o'>&lt;-</span> <span class='nv'>daily</span> <span class='nv'>daily2</span><span class='o'>$</span><span class='nv'>value</span><span class='o'>[</span><span class='nf'><a href='https://rdrr.io/r/base/nrow.html'>nrow</a></span><span class='o'>(</span><span class='nv'>daily2</span><span class='o'>)</span><span class='o'>]</span> <span class='o'>&lt;-</span> <span class='kc'>NA</span> <span class='nv'>daily2</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 3</span></span> <span class='c'>#&gt; day value type </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Tue 2 A </span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Thu 3 B </span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Fri 1 B </span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Mon <span style='color: #BB0000;'>NA</span> A</span> <span class='nv'>daily2</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/complete.html'>complete</a></span><span class='o'>(</span><span class='nv'>day</span>, fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>value <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span 
style='color: #555555;'># A tibble: 7 × 3</span></span> <span class='c'>#&gt; day value type </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Mon 0 A </span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Tue 2 A </span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Wed 0 <span style='color: #BB0000;'>NA</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Thu 3 B </span> <span class='c'>#&gt; <span style='color: #555555;'>5</span> Fri 1 B </span> <span class='c'>#&gt; <span style='color: #555555;'>6</span> Sat 0 <span style='color: #BB0000;'>NA</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>7</span> Sun 0 <span style='color: #BB0000;'>NA</span></span></code></pre> </div> <p>To avoid this, you can now retain pre-existing explicit missing values with the new <code>explicit</code> argument:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>daily2</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/complete.html'>complete</a></span><span class='o'>(</span><span class='nv'>day</span>, fill <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>value <span class='o'>=</span> <span class='m'>0</span><span class='o'>)</span>, explicit <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 7 × 3</span></span> <span class='c'>#&gt; day value type </span> <span class='c'>#&gt; <span style='color: #555555; font-style: 
italic;'>&lt;fct&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span> <span style='color: #555555; font-style: italic;'>&lt;chr&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> Mon <span style='color: #BB0000;'>NA</span> A </span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> Tue 2 A </span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> Wed 0 <span style='color: #BB0000;'>NA</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> Thu 3 B </span> <span class='c'>#&gt; <span style='color: #555555;'>5</span> Fri 1 B </span> <span class='c'>#&gt; <span style='color: #555555;'>6</span> Sat 0 <span style='color: #BB0000;'>NA</span> </span> <span class='c'>#&gt; <span style='color: #555555;'>7</span> Sun 0 <span style='color: #BB0000;'>NA</span></span></code></pre> </div> <h2 id="missing-values">Missing values <a href="#missing-values"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The three core missing values functions, <a href="https://tidyr.tidyverse.org/reference/drop_na.html" target="_blank" rel="noopener"><code>drop_na()</code></a>, <a href="https://tidyr.tidyverse.org/reference/replace_na.html" target="_blank" rel="noopener"><code>replace_na()</code></a>, and <a href="https://tidyr.tidyverse.org/reference/fill.html" target="_blank" rel="noopener"><code>fill()</code></a>, have all been updated to utilize <a href="https://vctrs.r-lib.org" target="_blank" rel="noopener">vctrs</a>. 
This allows them to work properly with a wider variety of types, and makes them safer to use with some of the existing types that they already supported.</p> <p>As an example, <a href="https://tidyr.tidyverse.org/reference/fill.html" target="_blank" rel="noopener"><code>fill()</code></a> now works properly with the Period types from <a href="https://lubridate.tidyverse.org" target="_blank" rel="noopener">lubridate</a>:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://lubridate.tidyverse.org'>lubridate</a></span>, warn.conflicts <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span> <span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://lubridate.tidyverse.org/reference/period.html'>seconds</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='m'>2</span>, <span class='kc'>NA</span>, <span class='m'>4</span>, <span class='kc'>NA</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>df</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/fill.html'>fill</a></span><span class='o'>(</span><span class='nv'>x</span>, .direction <span class='o'>=</span> <span class='s'>"down"</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 5 × 1</span></span> <span class='c'>#&gt; x </span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;Period&gt;</span></span> <span 
class='c'>#&gt; <span style='color: #555555;'>1</span> 1S </span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> 2S </span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> 2S </span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> 4S </span> <span class='c'>#&gt; <span style='color: #555555;'>5</span> 4S</span></code></pre> </div> <p>And it now treats <code>NaN</code> like any other missing value:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='kc'>NaN</span>, <span class='m'>2</span>, <span class='kc'>NA</span>, <span class='m'>3</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>df</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/fill.html'>fill</a></span><span class='o'>(</span><span class='nv'>x</span>, .direction <span class='o'>=</span> <span class='s'>"up"</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 4 × 1</span></span> <span class='c'>#&gt; x</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;dbl&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 2</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> 2</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> 3</span> <span class='c'>#&gt; <span style='color: #555555;'>4</span> 3</span></code></pre> </div> <p>The most drastic improvement in safety comes to <a href="https://tidyr.tidyverse.org/reference/replace_na.html" target="_blank" 
rel="noopener"><code>replace_na()</code></a>. Previously, it relied on <code>[&lt;-</code> to replace missing values, which is much laxer than vctrs about what the replacement value can be. As a result, your column type could change depending on the replacement value.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># Notice that this is an integer column</span> <span class='nv'>df</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://tibble.tidyverse.org/reference/tibble.html'>tibble</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1L</span>, <span class='kc'>NA</span>, <span class='m'>3L</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>df</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 1</span></span> <span class='c'>#&gt; x</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 1</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> <span style='color: #BB0000;'>NA</span></span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> 3</span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># Previous behavior without vctrs:</span> <span class='c'># Integer column changed to character column</span> <span class='nv'>df</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/replace_na.html'>replace_na</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>x <span class='o'>=</span> 
<span class='s'>"missing"</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; # A tibble: 3 × 1</span> <span class='c'>#&gt; x </span> <span class='c'>#&gt; &lt;chr&gt; </span> <span class='c'>#&gt; 1 1 </span> <span class='c'>#&gt; 2 missing</span> <span class='c'>#&gt; 3 3</span> <span class='c'># Integer column changed to double column</span> <span class='nv'>df</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/replace_na.html'>replace_na</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; # A tibble: 3 × 1</span> <span class='c'>#&gt; x</span> <span class='c'>#&gt; &lt;dbl&gt;</span> <span class='c'>#&gt; 1 1</span> <span class='c'>#&gt; 2 1</span> <span class='c'>#&gt; 3 3</span></code></pre> </div> <p>With vctrs, we now ensure that the replacement value is always cast to the type of the column you are replacing in. 
This ensures that the column types remain the same before and after you replace any missing values.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># New behavior with vctrs:</span> <span class='c'># Error, because "missing" can't be converted to an integer</span> <span class='nv'>df</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/replace_na.html'>replace_na</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='s'>"missing"</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; Error: Can't convert `replace$x` &lt;character&gt; to match type of `data$x` &lt;integer&gt;.</span> <span class='c'># Integer column type is retained, and the double value of `1` is</span> <span class='c'># converted to an integer replacement value of `1L`</span> <span class='nv'>df</span> <span class='o'><a href='https://tidyr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/replace_na.html'>replace_na</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>1</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #555555;'># A tibble: 3 × 1</span></span> <span class='c'>#&gt; x</span> <span class='c'>#&gt; <span style='color: #555555; font-style: italic;'>&lt;int&gt;</span></span> <span class='c'>#&gt; <span style='color: #555555;'>1</span> 1</span> <span class='c'>#&gt; <span style='color: #555555;'>2</span> 1</span> <span class='c'>#&gt; <span style='color: #555555;'>3</span> 3</span></code></pre> </div> <h2 id="acknowledgements">Acknowledgements <a 
href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Thanks to the 25 people who contributed to this version of tidyr by discussing ideas and suggesting new features! <a href="https://github.com/aliaamiri" target="_blank" rel="noopener">@aliaamiri</a>, <a href="https://github.com/allenbaron" target="_blank" rel="noopener">@allenbaron</a>, <a href="https://github.com/bersbersbers" target="_blank" rel="noopener">@bersbersbers</a>, <a href="https://github.com/cjburgess" target="_blank" rel="noopener">@cjburgess</a>, <a href="https://github.com/DanChaltiel" target="_blank" rel="noopener">@DanChaltiel</a>, <a href="https://github.com/edzer" target="_blank" rel="noopener">@edzer</a>, <a href="https://github.com/eshom" target="_blank" rel="noopener">@eshom</a>, <a href="https://github.com/gaborcsardi" target="_blank" rel="noopener">@gaborcsardi</a>, <a href="https://github.com/gergness" target="_blank" rel="noopener">@gergness</a>, <a href="https://github.com/ggrothendieck" target="_blank" rel="noopener">@ggrothendieck</a>, <a href="https://github.com/iago-pssjd" target="_blank" rel="noopener">@iago-pssjd</a>, <a href="https://github.com/issactoast" target="_blank" rel="noopener">@issactoast</a>, <a href="https://github.com/joiharalds" target="_blank" rel="noopener">@joiharalds</a>, <a href="https://github.com/LuiNov" target="_blank" rel="noopener">@LuiNov</a>, <a href="https://github.com/LukasWallrich" target="_blank" rel="noopener">@LukasWallrich</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">@mgirlich</a>, <a href="https://github.com/MichaelChirico" 
target="_blank" rel="noopener">@MichaelChirico</a>, <a href="https://github.com/NFA" target="_blank" rel="noopener">@NFA</a>, <a href="https://github.com/olehost" target="_blank" rel="noopener">@olehost</a>, <a href="https://github.com/psads-git" target="_blank" rel="noopener">@psads-git</a>, <a href="https://github.com/psychelzh" target="_blank" rel="noopener">@psychelzh</a>, <a href="https://github.com/ramiromagno" target="_blank" rel="noopener">@ramiromagno</a>, <a href="https://github.com/romainfrancois" target="_blank" rel="noopener">@romainfrancois</a>, <a href="https://github.com/TimTaylor" target="_blank" rel="noopener">@TimTaylor</a>, and <a href="https://github.com/xiangpin" target="_blank" rel="noopener">@xiangpin</a>.</p> New error style coming up in rlang 1.0.0 https://www.tidyverse.org/blog/2021/12/rlang-1-0-0-errors/ Wed, 22 Dec 2021 00:00:00 +0000 <p> <a href="https://rlang.r-lib.org/" target="_blank" rel="noopener">rlang</a> 1.0.0 is getting ready for release and we&rsquo;d like to get your feedback on the new style of error messages featured in this release.</p> <p>The rlang package provides several low-level frameworks, like tidy evaluation, for the tidyverse. 
The 1.0.0 release focuses on one of these frameworks, <strong>rlang errors</strong>. This set of tools to signal and display errors gets a substantial overhaul. The three main changes to rlang errors that we&rsquo;ll review in this blog post are:</p> <ol> <li>Fully committing to the display of errors as bulleted lists</li> <li>Including the erroring function call by default, as in base R</li> <li>Embracing chained errors to represent contextual information</li> </ol> <p>Attach these packages to follow the examples in the blog post:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://rlang.r-lib.org'>rlang</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span><span class='o'>)</span></code></pre> </div> <p>Here is how a typical rlang error looked before:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>add1</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='m'>1</span> <span class='o'>+</span> <span class='nv'>x</span> <span class='nv'>mtcars</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>new <span class='o'>=</span> <span class='nf'><a 
href='https://rdrr.io/r/stats/add1.html'>add1</a></span><span class='o'>(</span><span class='s'>"foo"</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; Error: Problem with `mutate()` column `new`.</span> <span class='c'>#&gt; <span style='color: #0000BB;'>ℹ</span> `new = add1("foo")`.</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> non-numeric argument to binary operator</span> <span class='c'>#&gt; <span style='color: #0000BB;'>ℹ</span> The error occurred in group 1: cyl = 4.</span></code></pre> </div> <p>And here is how the same error looks with the next versions of rlang and dplyr:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>mtcars</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>new <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/add1.html'>add1</a></span><span class='o'>(</span><span class='s'>"foo"</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in </span><span style='font-weight: bold; font-weight: 100;'>`mutate()`:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Problem while computing `new = add1("foo")`.</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> The error occurred in group 1: cyl = 4.</span> <span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in </span><span style='font-weight: bold; font-weight: 
100;'>`1 + x`:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> non-numeric argument to binary operator</span></code></pre> </div> <p>For RStudio users, another change is that the error message no longer appears in red but instead uses terminal colours and boldness to style the different parts of the error message.</p> <p>If you&rsquo;d like to try the new error style on your computer, install the development versions of rlang and dplyr (the latter needs to be adapted to the new error style) from github with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'>pak</span><span class='nf'>::</span><span class='nf'><a href='http://pak.r-lib.org/reference/pkg_install.html'>pkg_install</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"r-lib/rlang"</span>, <span class='s'>"tidyverse/dplyr"</span><span class='o'>)</span><span class='o'>)</span></code></pre> </div> <h2 id="displaying-errors-as-bullet-lists">Displaying errors as bullet lists <a href="#displaying-errors-as-bullet-lists"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p> <a href="https://rlang.r-lib.org/reference/abort.html" target="_blank" rel="noopener"><code>rlang::abort()</code></a> makes it easy to structure error messages as a <strong>bullet list</strong>. We believe that errors should be both informative about the context of the error and easy to skim. 
A bullet list arrangement that lays out important pieces of information line by line provides the right trade-off for this:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rlang.r-lib.org/reference/abort.html'>abort</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span> <span class='s'>"This is the error message."</span>, <span class='s'>"*"</span> <span class='o'>=</span> <span class='s'>"This is a bullet."</span>, <span class='s'>"*"</span> <span class='o'>=</span> <span class='s'>"This is another bullet."</span> <span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'>:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> This is the error message.</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> This is a bullet.</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>•</span> This is another bullet.</span></code></pre> </div> <p>The bullet symbol can be customised to provide a clue about the kind of information contained in the bullet. 
Use &ldquo;ℹ&rdquo; bullets to provide contextual information or hints, and &ldquo;✖&rdquo; bullets to state a problematic input or state.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rlang.r-lib.org/reference/abort.html'>abort</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span> <span class='s'>"This is the error message."</span>, <span class='s'>"x"</span> <span class='o'>=</span> <span class='s'>"Can't do this."</span>, <span class='s'>"i"</span> <span class='o'>=</span> <span class='s'>"You could do that instead."</span> <span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'>:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> This is the error message.</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> Can't do this.</span> <span class='c'>#&gt; <span style='color: #0000BB;'>ℹ</span> You could do that instead.</span></code></pre> </div> <p>Here is a dplyr example of an informative error message structured as a bullet list:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>mtcars</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span>new <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/rep.html'>rep</a></span><span class='o'>(</span><span class='nv'>am</span>, 
<span class='m'>2</span><span class='o'>)</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in </span><span style='font-weight: bold; font-weight: 100;'>`mutate()`:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Problem while computing `new = rep(am, 2)`.</span> <span class='c'>#&gt; <span style='color: #BB0000;'>✖</span> `new` must be size 11 or 1, not 22.</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> The error occurred in group 1: cyl = 4.</span></code></pre> </div> <p>While rlang has featured error bullets for a while already, the 1.0.0 version fully commits to that format. The main error message (the error header in rlang terms) has become a bullet with a leading &ldquo;!&rdquo; sign that makes it easy to skim for error headers in a long R output.</p> <h2 id="displaying-the-erroring-function">Displaying the erroring function <a href="#displaying-the-erroring-function"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>By default, <a href="https://rdrr.io/r/base/stop.html" target="_blank" rel="noopener"><code>base::stop()</code></a> shows the function in which it was called:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>add1</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='kr'>if</span> <span class='o'>(</span><span class='o'>!</span><span class='nf'><a 
href='https://rdrr.io/r/base/numeric.html'>is.numeric</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='kr'><a href='https://rdrr.io/r/base/stop.html'>stop</a></span><span class='o'>(</span><span class='s'>"`x` must be numeric."</span><span class='o'>)</span> <span class='o'>&#125;</span> <span class='nv'>x</span> <span class='o'>+</span> <span class='m'>1</span> <span class='o'>&#125;</span> <span class='nf'><a href='https://rdrr.io/r/stats/add1.html'>add1</a></span><span class='o'>(</span><span class='s'>"foo"</span><span class='o'>)</span> <span class='c'>#&gt; Error in add1("foo"): `x` must be numeric.</span></code></pre> </div> <p>In rlang, we initially decided to turn off that feature because quite often the erroring function is unrelated to the function called by the user. This happens for instance when <a href="https://rdrr.io/r/base/stop.html" target="_blank" rel="noopener"><code>stop()</code></a> or <a href="https://rlang.r-lib.org/reference/abort.html" target="_blank" rel="noopener"><code>abort()</code></a> are called from a helper function:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>add1</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nf'>check_numeric</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='nv'>x</span> <span class='o'>+</span> <span class='m'>1</span> <span class='o'>&#125;</span> <span class='nv'>check_numeric</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='kr'>if</span> <span class='o'>(</span><span class='o'>!</span><span class='nf'><a 
href='https://rdrr.io/r/base/numeric.html'>is.numeric</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='kr'><a href='https://rdrr.io/r/base/stop.html'>stop</a></span><span class='o'>(</span><span class='s'>"`x` must be numeric."</span><span class='o'>)</span> <span class='o'>&#125;</span> <span class='o'>&#125;</span> <span class='nf'><a href='https://rdrr.io/r/stats/add1.html'>add1</a></span><span class='o'>(</span><span class='s'>"foo"</span><span class='o'>)</span> <span class='c'>#&gt; Error in check_numeric(x): `x` must be numeric.</span></code></pre> </div> <p>To avoid distracting users with irrelevant information, <a href="https://rlang.r-lib.org/reference/abort.html" target="_blank" rel="noopener"><code>abort()</code></a> just didn&rsquo;t include a call in the error. However, we were missing out on contextual information that could help users understand the origin of an error without having to look at the backtrace, and that context is particularly important in a long pipeline of function calls.</p> <p>To improve on the situation, we added a <code>call</code> argument to <a href="https://rlang.r-lib.org/reference/abort.html" target="_blank" rel="noopener"><code>abort()</code></a> that makes it easy to throw an error on behalf of another function. 
If you call <a href="https://rlang.r-lib.org/reference/abort.html" target="_blank" rel="noopener"><code>abort()</code></a> from a helper function, pass the caller environment to automatically pick up the corresponding function call:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>check_numeric</span> <span class='o'>&lt;-</span> <span class='kr'>function</span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='kr'>if</span> <span class='o'>(</span><span class='o'>!</span><span class='nf'><a href='https://rdrr.io/r/base/numeric.html'>is.numeric</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>&#123;</span> <span class='nf'><a href='https://rlang.r-lib.org/reference/abort.html'>abort</a></span><span class='o'>(</span><span class='s'>"`x` must be numeric."</span>, call <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/sys.parent.html'>parent.frame</a></span><span class='o'>(</span><span class='o'>)</span><span class='o'>)</span> <span class='o'>&#125;</span> <span class='o'>&#125;</span> <span class='nf'><a href='https://rdrr.io/r/stats/add1.html'>add1</a></span><span class='o'>(</span><span class='s'>"foo"</span><span class='o'>)</span> <span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in </span><span style='font-weight: bold; font-weight: 100;'>`add1()`:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `x` must be numeric.</span></code></pre> </div> <p>We have started to adapt our packages to pass the correct function call to <a href="https://rlang.r-lib.org/reference/abort.html" target="_blank" rel="noopener"><code>abort()</code></a> but there is still a lot of work to do on that front. 
If you find a function call that looks off in an error message, please let us know by filing an issue.</p> <h2 id="chained-errors">Chained errors <a href="#chained-errors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Chained errors are another important feature of rlang 1.0. This feature was somewhat hidden in previous versions because it only impacted the appearance of backtraces. In this release, we have decided to show the whole chain of messages to the user, making error chaining much more useful.</p> <p>One important use case for chaining errors is as scaffolding for displaying contextual information when the user provides computations nested in a particular step, such as a dplyr verb or a ggplot geom.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>mtcars</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/group_by.html'>group_by</a></span><span class='o'>(</span><span class='nv'>cyl</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/mutate.html'>mutate</a></span><span class='o'>(</span> out1 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/add1.html'>add1</a></span><span class='o'>(</span><span class='nv'>am</span><span class='o'>)</span>, out2 <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/stats/add1.html'>add1</a></span><span 
class='o'>(</span><span class='s'>"foo"</span><span class='o'>)</span> <span class='o'>)</span> <span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in </span><span style='font-weight: bold; font-weight: 100;'>`mutate()`:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> Problem while computing `out2 = add1("foo")`.</span> <span class='c'>#&gt; <span style='color: #00BBBB;'>ℹ</span> The error occurred in group 1: cyl = 4.</span> <span class='c'>#&gt; <span style='font-weight: bold;'>Caused by error in </span><span style='font-weight: bold; font-weight: 100;'>`add1()`:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> `x` must be numeric.</span></code></pre> </div> <p>In this example, dplyr combines all three features (bullet lists, the display of erroring functions, and chained errors) to structure the error message in a hierarchy. At the topmost level, the <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> error provides information about the current expression being evaluated, as well as the current group. The chained error then displays the function that errored within <a href="https://dplyr.tidyverse.org/reference/mutate.html" target="_blank" rel="noopener"><code>mutate()</code></a> as well as the full error message.</p> <p>Currently, only the development version of dplyr takes advantage of chained errors. 
We hope to implement them in other tidyverse and tidymodels packages in the coming year to make it easier to detect failing steps in large pipelines.</p> <h2 id="use-rlang-style-errors-globally">Use rlang style errors globally <a href="#use-rlang-style-errors-globally"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Normally, only the errors thrown with <a href="https://rlang.r-lib.org/reference/abort.html" target="_blank" rel="noopener"><code>abort()</code></a> use the new display. Add a call to <code>global_handle()</code> in your <code>.Rprofile</code> to use the rlang style globally, including base errors.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='c'># In .Rprofile</span> <span class='nf'>rlang</span><span class='nf'>::</span><span class='nf'>global_handle</span><span class='o'>(</span><span class='o'>)</span></code></pre> </div> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'> <span class='m'>1</span> <span class='o'>+</span> <span class='s'>"foo"</span> <span class='c'>#&gt; <span style='color: #BBBB00; font-weight: bold;'>Error</span><span style='font-weight: bold;'> in </span><span style='font-weight: bold; font-weight: 100;'>`1 + "foo"`:</span></span> <span class='c'>#&gt; <span style='color: #BBBB00;'>!</span> non-numeric argument to binary operator</span> </code></pre> </div> <h2 id="taking-feedback">Taking feedback <a href="#taking-feedback"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" 
fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Given the scope of the changes, we felt it appropriate to delay the release of rlang 1.0 until late January to get more feedback on the new display of errors. Please reach out on twitter (my handle is <a href="https://twitter.com/_lionelhenry/" target="_blank" rel="noopener">_lionelhenry</a>) or file an <a href="https://github.com/r-lib/rlang" target="_blank" rel="noopener">issue on github</a> if you have any comments.</p> Closing out our year with a Q4 2021 tidymodels update https://www.tidyverse.org/blog/2021/12/tidymodels-2021-q4/ Thu, 16 Dec 2021 00:00:00 +0000 https://www.tidyverse.org/blog/2021/12/tidymodels-2021-q4/ <!-- TODO: * [ ] Look over / edit the post's title in the yaml * [ ] Edit (or delete) the description; note this appears in the Twitter card * [ ] Pick category and tags (see existing with `hugodown::tidy_show_meta()`) * [ ] Find photo & update yaml metadata * [ ] Create `thumbnail-sq.jpg`; height and width should be equal * [ ] Create `thumbnail-wd.jpg`; width should be >5x height * [ ] `hugodown::use_tidy_thumbnails()` * [ ] Add intro sentence, e.g. 
the standard tagline for the package * [ ] `usethis::use_tidy_thanks()` --> <p>The <a href="https://www.tidymodels.org/" target="_blank" rel="noopener">tidymodels</a> framework is a collection of R packages for modeling and machine learning using tidyverse principles.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">tidymodels</span><span class="p">)</span> <span class="c1">#&gt; ── Attaching packages ──────────────────────────── tidymodels 0.1.4 ──</span> <span class="c1">#&gt; ✓ broom 0.7.10 ✓ rsample 0.1.1 </span> <span class="c1">#&gt; ✓ dials 0.0.10 ✓ tibble 3.1.6 </span> <span class="c1">#&gt; ✓ dplyr 1.0.7 ✓ tidyr 1.1.4 </span> <span class="c1">#&gt; ✓ infer 1.0.0 ✓ tune 0.1.6 </span> <span class="c1">#&gt; ✓ modeldata 0.1.1 ✓ workflows 0.2.4 </span> <span class="c1">#&gt; ✓ parsnip 0.1.7 ✓ workflowsets 0.1.0 </span> <span class="c1">#&gt; ✓ purrr 0.3.4 ✓ yardstick 0.0.9 </span> <span class="c1">#&gt; ✓ recipes 0.1.17</span> <span class="c1">#&gt; ── Conflicts ─────────────────────────────── tidymodels_conflicts() ──</span> <span class="c1">#&gt; x purrr::discard() masks scales::discard()</span> <span class="c1">#&gt; x dplyr::filter() masks stats::filter()</span> <span class="c1">#&gt; x dplyr::lag() masks stats::lag()</span> <span class="c1">#&gt; x recipes::step() masks stats::step()</span> <span class="c1">#&gt; • Dig deeper into tidy modeling with R at https://www.tmwr.org</span> </code></pre></div><p>Starting at the beginning of this year, we now publish <a href="https://www.tidyverse.org/categories/roundup/" target="_blank" rel="noopener">regular updates</a> here on the tidyverse blog summarizing what&rsquo;s new in the tidymodels ecosystem. 
You can check out the <a href="https://www.tidyverse.org/tags/tidymodels/" target="_blank" rel="noopener"><code>tidymodels</code> tag</a> to find all tidymodels blog posts here, including our roundup posts as well as those that are more focused. The purpose of these quarterly posts is to share useful new features and any updates you may have missed.</p> <p>Since <a href="https://www.tidyverse.org/blog/2021/09/tidymodels-2021-q3/" target="_blank" rel="noopener">our last roundup post</a>, there have been seven CRAN releases of tidymodels packages. You can install these updates from CRAN with:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">install.packages</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s">&#34;broom&#34;</span><span class="p">,</span> <span class="s">&#34;embed&#34;</span><span class="p">,</span> <span class="s">&#34;rsample&#34;</span><span class="p">,</span> <span class="s">&#34;shinymodels&#34;</span><span class="p">,</span> <span class="s">&#34;tidymodels&#34;</span><span class="p">,</span> <span class="s">&#34;workflows&#34;</span><span class="p">,</span> <span class="s">&#34;yardstick&#34;</span><span class="p">))</span> </code></pre></div><p>The <code>NEWS</code> files are linked here for each package; you&rsquo;ll notice that some of these releases involve small bug fixes or internal changes that are not user-facing. We write code in these smaller, modular packages that we can release frequently both to make models easier to deploy and to keep our software easier to maintain. 
We know it may feel like a lot of moving parts to keep up with if you are a tidymodels user, so we want to highlight a couple of the more useful updates in these releases.</p> <ul> <li> <a href="https://broom.tidymodels.org/news/index.html#broom-0-7-10-2021-10-31" target="_blank" rel="noopener">broom</a></li> <li> <a href="https://embed.tidymodels.org/news/index.html#embed-015" target="_blank" rel="noopener">embed</a></li> <li> <a href="https://rsample.tidymodels.org/news/index.html#rsample-011" target="_blank" rel="noopener">rsample</a></li> <li> <a href="https://shinymodels.tidymodels.org/news/index.html#shinymodels-010" target="_blank" rel="noopener">shinymodels</a></li> <li>the <a href="https://tidymodels.tidymodels.org/news/index.html#tidymodels-0-1-4-2021-10-01" target="_blank" rel="noopener">tidymodels</a> metapackage itself</li> <li> <a href="https://workflows.tidymodels.org/news/index.html#workflows-0-2-4-2021-10-12" target="_blank" rel="noopener">workflows</a></li> <li> <a href="https://yardstick.tidymodels.org/news/index.html#yardstick-0-0-9-2021-11-22" target="_blank" rel="noopener">yardstick</a></li> </ul> <h2 id="tools-for-tidymodels-analyses">Tools for tidymodels analyses <a href="#tools-for-tidymodels-analyses"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>Several of these releases incorporate tools to reduce the overhead for getting started with your tidymodels analysis or for understanding your results more deeply. The new release of the tidymodels metapackage itself provides an R Markdown template. 
To use the tidymodels analysis template from RStudio, go to <code>File -&gt; New File -&gt; R Markdown</code>. This opens a dialog box where you can select one of the available templates:</p> <p><img src="figure/tidymodels-template.png" title="R Markdown template dialog box with three choices, including the tidymodels Model Analysis template" alt="R Markdown template dialog box with three choices, including the tidymodels Model Analysis template" width="70%" /></p> <p>If you are not using RStudio, you&rsquo;ll also need to install <a href="https://pandoc.org" target="_blank" rel="noopener">Pandoc</a>. Then, use the <code>rmarkdown::draft()</code> function to create a new document from the template:</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="n">rmarkdown</span><span class="o">::</span><span class="nf">draft</span><span class="p">(</span> <span class="s">&#34;my_model_analysis.Rmd&#34;</span><span class="p">,</span> <span class="n">template</span> <span class="o">=</span> <span class="s">&#34;model-analysis&#34;</span><span class="p">,</span> <span class="n">package</span> <span class="o">=</span> <span class="s">&#34;tidymodels&#34;</span> <span class="p">)</span> </code></pre></div><p>This template offers an opinionated guide on how to structure a basic modeling analysis from exploratory data analysis through evaluating your models. Your individual modeling analysis may require you to add to, subtract from, or otherwise change this structure, but you can consider this a general framework to start from.</p> <p>This quarter, the package <a href="https://shinymodels.tidymodels.org/" target="_blank" rel="noopener">shinymodels</a> had its first CRAN release. 
This package was the focus of our tidymodels summer intern <a href="https://www.shishamad.com/posts/how-i-got-my-rstudio-internship/" target="_blank" rel="noopener">Shisham Adhikari</a> in 2021, and it provides support for launching a Shiny app to interactively explore tuning or resampling results.</p> <p><img src="https://raw.githubusercontent.com/tidymodels/shinymodels/main/man/figures/example.png" alt="Screenshot of shinymodels app exploring a regression model"></p> <h2 id="make-your-own-rsample-split-objects">Make your own rsample split objects <a href="#make-your-own-rsample-split-objects"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The data resampling infrastructure provided by the <a href="https://rsample.tidymodels.org/" target="_blank" rel="noopener">rsample</a> package has always worked well when you start off with a dataset to split into training and testing. However, we heard from users that in some situations they have training and testing sets determined by other processes, or need to create their data split using more complex conditions. The latest release of rsample provides more fluent and flexible support for custom <code>rsplit</code> creation that sets you up for the rest of your tidymodels analysis. 
For example, you can create a split object from two dataframes.</p> <div class="highlight"><pre class="chroma"><code class="language-r" data-lang="r"><span class="nf">library</span><span class="p">(</span><span class="n">gapminder</span><span class="p">)</span> <span class="n">year_split</span> <span class="o">&lt;-</span> <span class="nf">make_splits</span><span class="p">(</span> <span class="n">gapminder</span> <span class="o">%&gt;%</span> <span class="nf">filter</span><span class="p">(</span><span class="n">year</span> <span class="o">&lt;=</span> <span class="m">2000</span><span class="p">),</span> <span class="n">gapminder</span> <span class="o">%&gt;%</span> <span class="nf">filter</span><span class="p">(</span><span class="n">year</span> <span class="o">&gt;</span> <span class="m">2000</span><span class="p">)</span> <span class="p">)</span> <span class="n">year_split</span> <span class="c1">#&gt; &lt;Analysis/Assess/Total&gt;</span> <span class="c1">#&gt; &lt;1420/284/1704&gt;</span> <span class="nf">testing</span><span class="p">(</span><span class="n">year_split</span><span class="p">)</span> <span class="c1">#&gt; # A tibble: 284 × 6</span> <span class="c1">#&gt; country continent year lifeExp pop gdpPercap</span> <span class="c1">#&gt; &lt;fct&gt; &lt;fct&gt; &lt;int&gt; &lt;dbl&gt; &lt;int&gt; &lt;dbl&gt;</span> <span class="c1">#&gt; 1 Afghanistan Asia 2002 42.1 25268405 727.</span> <span class="c1">#&gt; 2 Afghanistan Asia 2007 43.8 31889923 975.</span> <span class="c1">#&gt; 3 Albania Europe 2002 75.7 3508512 4604.</span> <span class="c1">#&gt; 4 Albania Europe 2007 76.4 3600523 5937.</span> <span class="c1">#&gt; 5 Algeria Africa 2002 71.0 31287142 5288.</span> <span class="c1">#&gt; 6 Algeria Africa 2007 72.3 33333216 6223.</span> <span class="c1">#&gt; 7 Angola Africa 2002 41.0 10866106 2773.</span> <span class="c1">#&gt; 8 Angola Africa 2007 42.7 12420476 4797.</span> <span class="c1">#&gt; 9 Argentina Americas 2002 74.3 38331121 8798.</span> 
<span class="c1">#&gt; 10 Argentina Americas 2007 75.3 40301927 12779.</span> <span class="c1">#&gt; # … with 274 more rows</span> </code></pre></div><p>You can alternatively <a href="https://rsample.tidymodels.org/reference/make_splits.html" target="_blank" rel="noopener">create a split using a list of indices</a>; this <code>make_splits()</code> flexibility is good for when the defaults in <code>initial_split()</code> and <code>initial_time_split()</code> are not appropriate. We also added a <a href="https://rsample.tidymodels.org/reference/validation_split.html" target="_blank" rel="noopener">new function <code>validation_time_split()</code></a> to create a single validation resample, much like <code>validation_split()</code>, but taking the first <code>prop</code> samples for training.</p> <h2 id="survey-says">Survey says&hellip; <a href="#survey-says"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>This fall, we <a href="https://www.tidyverse.org/blog/2021/10/tidymodels-2022-survey/" target="_blank" rel="noopener">launched our second tidymodels survey</a> to gather community input on our priorities for 2022. Thank you so much to everyone who shared their opinion! 
Over 600 people completed the survey, a significant increase from last year, and the top three requested features overall are:</p> <ul> <li> <p><strong>Supervised feature selection:</strong> This includes basic supervised filtering methods as well as techniques such as recursive feature elimination.</p> </li> <li> <p><strong>Model fairness analysis and metrics:</strong> Techniques to measure if there are biases in model predictions that treat groups or individuals unfairly.</p> </li> <li> <p><strong>Post modeling probability calibration:</strong> Methods to characterize (and correct) probability predictions to make sure that probability estimates reflect the observed event rate(s).</p> </li> </ul> <p>You can also <a href="https://colorado.rstudio.com/rsc/tidymodels-priorities-2022/" target="_blank" rel="noopener">check out our full analysis of the survey results</a>.</p> <p><img src="figure/survey-results.png" alt="Bar chart broken out by role showing that all groups (students, academics, industry, hobbyists) rate supervised feature selection highest"></p> <h2 id="acknowledgements">Acknowledgements <a href="#acknowledgements"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>We’d like to extend our thanks to all of the contributors who helped make these releases during Q4 possible!</p> <ul> <li> <p>broom: <a href="https://github.com/gjones1219" target="_blank" rel="noopener">@gjones1219</a>, <a href="https://github.com/gravesti" target="_blank" rel="noopener">@gravesti</a>, <a href="https://github.com/gregmacfarlane" target="_blank" rel="noopener">@gregmacfarlane</a>, <a 
href="https://github.com/ilapros" target="_blank" rel="noopener">@ilapros</a>, <a href="https://github.com/jamesrrae" target="_blank" rel="noopener">@jamesrrae</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/lcgodoy" target="_blank" rel="noopener">@lcgodoy</a>, <a href="https://github.com/RobBinS83" target="_blank" rel="noopener">@RobBinS83</a>, <a href="https://github.com/simonpcouch" target="_blank" rel="noopener">@simonpcouch</a>, <a href="https://github.com/statzhero" target="_blank" rel="noopener">@statzhero</a>, <a href="https://github.com/vincentarelbundock" target="_blank" rel="noopener">@vincentarelbundock</a>, and <a href="https://github.com/wviechtb" target="_blank" rel="noopener">@wviechtb</a></p> </li> <li> <p>embed: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/jlmelville" target="_blank" rel="noopener">@jlmelville</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a></p> </li> <li> <p>rsample: <a href="https://github.com/EmilHvitfeldt" target="_blank" rel="noopener">@EmilHvitfeldt</a>, <a href="https://github.com/jmgirard" target="_blank" rel="noopener">@jmgirard</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mmp3" target="_blank" rel="noopener">@mmp3</a>, and <a href="https://github.com/Shafi2016" target="_blank" rel="noopener">@Shafi2016</a></p> </li> <li> <p>shinymodels: <a href="https://github.com/romainfrancois" target="_blank" rel="noopener">@romainfrancois</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a></p> </li> <li> <p>tidymodels: <a href="https://github.com/agronomofiorentini" target="_blank" rel="noopener">@agronomofiorentini</a>, <a 
href="https://github.com/AshleyHenry15" target="_blank" rel="noopener">@AshleyHenry15</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a></p> </li> <li> <p>workflows: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/dkgaraujo" target="_blank" rel="noopener">@dkgaraujo</a>, <a href="https://github.com/hfrick" target="_blank" rel="noopener">@hfrick</a>, and <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a></p> </li> <li> <p>yardstick: <a href="https://github.com/DavisVaughan" target="_blank" rel="noopener">@DavisVaughan</a>, <a href="https://github.com/joeycouse" target="_blank" rel="noopener">@joeycouse</a>, <a href="https://github.com/juliasilge" target="_blank" rel="noopener">@juliasilge</a>, <a href="https://github.com/mattwarkentin" target="_blank" rel="noopener">@mattwarkentin</a>, <a href="https://github.com/romainfrancois" target="_blank" rel="noopener">@romainfrancois</a>, and <a href="https://github.com/topepo" target="_blank" rel="noopener">@topepo</a></p> </li> </ul> Re-licensing packages: a retrospective https://www.tidyverse.org/blog/2021/12/relicensing-packages/ Tue, 07 Dec 2021 00:00:00 +0000 https://www.tidyverse.org/blog/2021/12/relicensing-packages/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. 
the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>The tidyverse (including tidymodels and r-lib) includes packages that have been written over the course of 15 years. Unfortunately, this has led to a diversity of licenses<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. It is fundamentally important that software has a license, because without it no one knows how they can use it. While our packages already had open source licenses, when we looked at them holistically, we realised that we used a rather large variety of licenses, including MIT, BSD, GPL (versions 2 and 3), and more! While nothing is wrong with any of these licenses individually, the collective variety makes things confusing, particularly for people or organizations who want to use multiple packages together.</p> <p>To reduce this confusion and make it clear that our packages are an unconditional gift that can be freely used without reciprocal obligation, we embarked on a journey to apply the same license to as many of our packages as possible (we couldn&rsquo;t apply the same license to every package because some packages bundled code with incompatible licenses). No license is perfect, but we had to choose one, and after much discussion we decided on <a href="https://spdx.org/licenses/MIT" target="_blank" rel="noopener">the MIT License</a>. The MIT License is short (171 words), widespread, relatively easy to understand (see lawyer/programmer Kyle E. Mitchell&rsquo;s <a href="https://writing.kemitchell.com/2016/09/21/MIT-License-Line-by-Line.html" target="_blank" rel="noopener">&ldquo;The MIT License, Line by Line&rdquo;</a> for details), and very permissive. 
MIT is not a &ldquo;copyleft&rdquo; or &ldquo;hereditary&rdquo;<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup> license; such licenses (e.g. GPL) require that derivative works<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup> be licensed under the same license and be subject to (aka &ldquo;inherit&rdquo;) all of its restrictions.</p> <p>Once we decided on the MIT license, we needed to check with all the authors of the code to make sure it was OK to relicense it. This involved several steps. We started by reviewing all commits made by non-RStudio contributors<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup> to each package. We then contacted all (~500) contributors whose changes could constitute a copyrightable contribution (i.e. anything other than minor edits, such as typo fixes) via GitHub, e-mail, or (in a limited number of cases) personal communication, requesting their statement of agreement<sup id="fnref:5"><a href="#fn:5" class="footnote-ref" role="doc-noteref">5</a></sup>.</p> <p>You can see an example of the process in <a href="https://github.com/tidyverse/purrr/issues/805" target="_blank" rel="noopener">the re-licensing issue for purrr</a>. The re-licensing generated some discussion, but we were grateful to receive unanimous agreement to re-license, thus avoiding the need to re-implement any existing code<sup id="fnref:6"><a href="#fn:6" class="footnote-ref" role="doc-noteref">6</a></sup>.</p> <p>The bulk of our packages are now under MIT, which means they&rsquo;re consistent (yay!), and you can continue to use them as you were before (especially since we didn&rsquo;t think there was any problem using them for any reason under their previous licenses).</p> <p>This blog post has been a long time coming, and (by necessity) gives a reductive summary of nuanced topics. 
If you&rsquo;d like to learn more about copyright and intellectual-property law as it pertains to open-source software, I recommend the following four books:</p> <ul> <li> <p><strong>Open Source Licensing: Software Freedom and Intellectual Property Law</strong> by Lawrence Rosen (2004). Available from Rosen free online at <a href="http://www.rosenlaw.com/oslbook.htm">http://www.rosenlaw.com/oslbook.htm</a>.</p> </li> <li> <p><strong>Understanding Open Source and Free Software Licensing</strong> by Andrew M. St. Laurent (2004).</p> </li> <li> <p><strong>Intellectual Property and Open Source: A Practical Guide to Protecting Code</strong> by Van Lindberg (2008).</p> </li> <li> <p><strong>The Open Source Alternative: Understanding Risks and Leveraging Opportunities</strong> by Heather J. Meeker (2008).</p> </li> </ul> <p>You can also check out the <a href="https://colorado.rstudio.com/rsc/relicensing-the-notes/the-notes.html" target="_blank" rel="noopener">research notes</a> that I (Mara) made while working on this project.</p> <section class="footnotes" role="doc-endnotes"> <hr> <ol> <li id="fn:1" role="doc-endnote"> <p>A license is an agreement in which a licensee is given permission to use the property by the property holder. The licensee&rsquo;s use is conditional on the grant, scope, and reservation of rights of the granted permission. <a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>Term used in Heather J. Meeker&rsquo;s <em>The Open Source Alternative: Understanding Risks and Leveraging Opportunities</em>, 2008. <a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>What constitutes a &ldquo;derivative work&rdquo; is complicated, nuanced, and beyond the scope of this discussion. 
<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:4" role="doc-endnote"> <p>This includes those who were not affiliated with RStudio at the time of their contributions. <a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:5" role="doc-endnote"> <p>&ldquo;Prior art&rdquo; includes the re-licensing of the Bootstrap framework, the details of which are nicely documented in this <a href="https://opensource.stackexchange.com/questions/6097/how-does-bootstrap-v4-mit-deal-with-contributions-made-under-v3-apache-2-0/6099#6099" target="_blank" rel="noopener">StackExchange thread</a>. <a href="#fnref:5" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> <li id="fn:6" role="doc-endnote"> <p>This is permitted under what is known as the &ldquo;idea-expression&rdquo; dichotomy in copyright law, codified in 17 U.S.C. § 102 under which &ldquo;protection is given only to the expression of the idea-not the idea itself&rdquo; <em>Mazer v. Stein</em>, 347 U.S. 201 (1954) at 217. <a href="#fnref:6" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p> </li> </ol> </section> dtplyr 1.2.0 https://www.tidyverse.org/blog/2021/12/dtplyr-1-2-0/ Mon, 06 Dec 2021 00:00:00 +0000 https://www.tidyverse.org/blog/2021/12/dtplyr-1-2-0/ <!-- TODO: * [x] Look over / edit the post's title in the yaml * [x] Edit (or delete) the description; note this appears in the Twitter card * [x] Pick category and tags (see existing with [`hugodown::tidy_show_meta()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html)) * [x] Find photo & update yaml metadata * [x] Create `thumbnail-sq.jpg`; height and width should be equal * [x] Create `thumbnail-wd.jpg`; width should be >5x height * [x] [`hugodown::use_tidy_thumbnails()`](https://rdrr.io/pkg/hugodown/man/use_tidy_post.html) * [x] Add intro sentence, e.g. 
the standard tagline for the package * [x] [`usethis::use_tidy_thanks()`](https://usethis.r-lib.org/reference/use_tidy_thanks.html) --> <p>We&rsquo;re thrilled to announce that <a href="https://dtplyr.tidyverse.org" target="_blank" rel="noopener">dtplyr</a> 1.2.0 is now on CRAN. dtplyr gives you the speed of <a href="http://r-datatable.com/" target="_blank" rel="noopener">data.table</a> with the syntax of dplyr; you write dplyr (and tidyr) code and dtplyr translates it to the data.table equivalent.</p> <p>You can install dtplyr from CRAN with:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nf'><a href='https://rdrr.io/r/utils/install.packages.html'>install.packages</a></span><span class='o'>(</span><span class='s'>"dtplyr"</span><span class='o'>)</span></code></pre> </div> <p>I&rsquo;ll discuss three major changes in this blog post:</p> <ul> <li>New authors</li> <li>New tidyr translations</li> <li>Improvements to join translations</li> </ul> <p>There are also over 20 minor improvements to the quality of translations; you can see a full list in the <a href="https://github.com/tidyverse/dtplyr/blob/main/NEWS.md" target="_blank" rel="noopener">release notes</a>.</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dtplyr.tidyverse.org'>dtplyr</a></span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://dplyr.tidyverse.org'>dplyr</a></span>, warn.conflicts <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span> <span class='kr'><a href='https://rdrr.io/r/base/library.html'>library</a></span><span class='o'>(</span><span class='nv'><a href='https://tidyr.tidyverse.org'>tidyr</a></span><span 
class='o'>)</span></code></pre> </div> <h2 id="new-authors">New authors <a href="#new-authors"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>The biggest news in this release is the addition of three new <a href="https://github.com/tidyverse/tidyups/blob/main/004-governance.md#authors" target="_blank" rel="noopener">authors</a>: <a href="https://github.com/markfairbanks" target="_blank" rel="noopener">Mark Fairbanks</a>, <a href="https://github.com/mgirlich" target="_blank" rel="noopener">Maximilian Girlich</a>, and <a href="https://github.com/eutwt" target="_blank" rel="noopener">Ryan Dickerson</a> are now dtplyr authors in recognition of their significant and sustained contributions. 
In fact, they implemented the bulk of the improvements in this release!</p> <h2 id="tidyr-translations">tidyr translations <a href="#tidyr-translations"> <svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> <path d="M0 0h24v24H0z" fill="currentColor"></path> <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path> </svg> </a> </h2><p>dtplyr gains translations for many more tidyr verbs including <a href="https://tidyr.tidyverse.org/reference/complete.html" target="_blank" rel="noopener"><code>complete()</code></a>, <a href="https://tidyr.tidyverse.org/reference/drop_na.html" target="_blank" rel="noopener"><code>drop_na()</code></a>, <a href="https://tidyr.tidyverse.org/reference/expand.html" target="_blank" rel="noopener"><code>expand()</code></a>, <a href="https://tidyr.tidyverse.org/reference/fill.html" target="_blank" rel="noopener"><code>fill()</code></a>, <a href="https://tidyr.tidyverse.org/reference/nest.html" target="_blank" rel="noopener"><code>nest()</code></a>, <a href="https://tidyr.tidyverse.org/reference/pivot_longer.html" target="_blank" rel="noopener"><code>pivot_longer()</code></a>, <a href="https://tidyr.tidyverse.org/reference/replace_na.html" target="_blank" rel="noopener"><code>replace_na()</code></a>, and <a href="https://tidyr.tidyverse.org/reference/separate.html" target="_blank" rel="noopener"><code>separate()</code></a>. 
A few examples are shown below:</p> <div class="highlight"> <pre class='chroma'><code class='language-r' data-lang='r'><span class='nv'>dt</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dtplyr.tidyverse.org/reference/lazy_dt.html'>lazy_dt</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='kc'>NA</span>, <span class='s'>"x.y"</span>, <span class='s'>"x.z"</span>, <span class='s'>"y.z"</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>dt</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/separate.html'>separate</a></span><span class='o'>(</span><span class='nv'>x</span>, <span class='nf'><a href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='s'>"A"</span>, <span class='s'>"B"</span><span class='o'>)</span>, sep <span class='o'>=</span> <span class='s'>"\\."</span>, remove <span class='o'>=</span> <span class='kc'>FALSE</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span><span class='o'>)</span> <span class='c'>#&gt; copy(`_DT1`)[, `:=`(c("A", "B"), tstrsplit(x, split = "\\."))]</span> <span class='nv'>dt</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dtplyr.tidyverse.org/reference/lazy_dt.html'>lazy_dt</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/data.frame.html'>data.frame</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='nf'><a 
href='https://rdrr.io/r/base/c.html'>c</a></span><span class='o'>(</span><span class='m'>1</span>, <span class='kc'>NA</span>, <span class='kc'>NA</span>, <span class='m'>2</span>, <span class='kc'>NA</span><span class='o'>)</span><span class='o'>)</span><span class='o'>)</span> <span class='nv'>dt</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/fill.html'>fill</a></span><span class='o'>(</span><span class='nv'>x</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span><span class='o'>)</span> <span class='c'>#&gt; copy(`_DT2`)[, `:=`(x = nafill(x, "locf"))]</span> <span class='nv'>dt</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://tidyr.tidyverse.org/reference/replace_na.html'>replace_na</a></span><span class='o'>(</span><span class='nf'><a href='https://rdrr.io/r/base/list.html'>list</a></span><span class='o'>(</span>x <span class='o'>=</span> <span class='m'>99</span><span class='o'>)</span><span class='o'>)</span> <span class='o'><a href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf'><a href='https://dplyr.tidyverse.org/reference/explain.html'>show_query</a></span><span class='o'>(</span><span class='o'>)</span> <span class='c'>#&gt; copy(`_DT2`)[, `:=`(x = fcoalesce(x, 99))]</span> <span class='nv'>dt</span> <span class='o'>&lt;-</span> <span class='nf'><a href='https://dtplyr.tidyverse.org/reference/lazy_dt.html'>lazy_dt</a></span><span class='o'>(</span><span class='nv'>relig_income</span><span class='o'>)</span> <span class='nv'>dt</span> <span class='o'><a 
href='https://magrittr.tidyverse.org/reference/pipe.html'>%&gt;%</a></span> <span class='nf