Category: R statistical package

Y is for scale_y

Yesterday, I talked about scale_x. Today, I’ll continue on that topic, focusing on the y-axis. The key to using any of the scale_ functions is to know what sort of data...
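
To illustrate the point about data type, here is a minimal sketch using R’s built-in mtcars dataset rather than the post’s own data (an assumption, since the excerpt doesn’t show it). Because mpg is numeric, the y-axis takes scale_y_continuous(). A factor on the y-axis would call for scale_y_discrete() instead, and a Date column for scale_y_date().

```r
library(ggplot2)

# mpg is numeric, so the y-axis uses a continuous scale
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  scale_y_continuous(
    name   = "Miles per gallon",
    limits = c(10, 35),           # mpg in mtcars runs from 10.4 to 33.9, so no points are dropped
    breaks = seq(10, 35, by = 5)  # a labelled tick every 5 mpg
  )
```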

X is for scale_x

These next two posts will deal with formatting scales in ggplot2 – x-axis, y-axis – so I’ll try to limit the amount of overlap and repetition. Let’s say I wanted to...
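
A companion sketch for the x-axis, again assuming mtcars as stand-in data: wrapping cyl in factor() makes it categorical, so scale_x_discrete() is the matching scale, and its labels argument can rename each level. The axis title and level labels below are illustrative choices, not taken from the post.

```r
library(ggplot2)

# factor(cyl) is categorical, so the x-axis uses a discrete scale
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  scale_x_discrete(
    name   = "Number of cylinders",
    labels = c("4" = "Four", "6" = "Six", "8" = "Eight")  # relabel each level by name
  )
```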

V is for Verbs

In this series, I’ve covered five terms for data manipulation: arrange, filter, mutate, select, and summarise. These are the verbs that make up the grammar of data manipulation. They all work with group_by to perform these functions...
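
A minimal sketch chaining all five verbs, with group_by() placed before summarise() so the summary is computed per group. It assumes the built-in mtcars dataset and the %>% pipe that dplyr re-exports; the 100 hp cut-off and the column choices are arbitrary.

```r
library(dplyr)

cyl_summary <- mtcars %>%
  filter(hp > 100) %>%               # keep cars with more than 100 hp
  select(cyl, mpg, wt) %>%           # keep only the columns we need
  mutate(wt_kg = wt * 453.592) %>%   # wt is in 1000 lb; convert to kg
  group_by(cyl) %>%                  # summarise() below runs once per cyl group
  summarise(
    n_cars     = n(),
    mean_mpg   = mean(mpg),
    mean_wt_kg = mean(wt_kg),
    .groups    = "drop"
  ) %>%
  arrange(desc(mean_mpg))            # highest average mpg first
```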

U is for Useful Trick

This will be a very short post about a line of code I’ve found unbelievably useful as I analyze data for work. I’m working with datasets containing millions of rows...

T is for Themes

One of the easiest ways to make a beautiful ggplot is by using a theme. ggplot2 comes with a variety of pre-existing themes. I’ll use the genre statistics summary table...
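
As a rough example of applying a built-in theme, the sketch below builds one scatterplot of mtcars (standing in for the genre summary table, which isn’t shown here) and adds two of ggplot2’s complete themes, plus one small adjustment layered on top with theme(). A theme changes only the non-data styling; the geoms and scales are untouched.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

p_minimal <- p + theme_minimal()  # minimal styling: no background fill or panel border
p_classic <- p + theme_classic()  # axis lines, no gridlines

# Tweak a single element after applying a complete theme
p_tweaked <- p + theme_minimal() + theme(axis.title = element_text(face = "bold"))
```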

S is for summarise

Today, we’ll finally talk about summarise! It’s very similar to mutate, but instead of adding or altering a variable in a dataset, it aggregates your data, creating a new tibble...
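
The contrast with mutate can be sketched on mtcars (an assumption; the post uses its own data): mutate() returns the same number of rows with an extra column, summarise() collapses them into a new tibble, and adding group_by() first gives one row per group. as_tibble() is used so the results are tibbles rather than plain data frames.

```r
library(dplyr)

cars_tbl <- as_tibble(mtcars)

# mutate() keeps one row per car and adds a column
with_ratio <- cars_tbl %>%
  mutate(hp_per_wt = hp / wt)

# summarise() collapses all 32 rows into a new one-row tibble
overall <- cars_tbl %>%
  summarise(n_cars = n(), mean_mpg = mean(mpg))

# With group_by(), summarise() returns one row per group instead
by_gear <- cars_tbl %>%
  group_by(gear) %>%
  summarise(n_cars = n(), mean_mpg = mean(mpg), .groups = "drop")
```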

R is for read_

The tidyverse is full of functions for reading data, beginning with “read_”. The read_csv function I’ve used to access my reads2019 data is one example, falling under the read_delim family of functions. read_tsv...
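
A small sketch of three of the read_ functions, assuming readr 2.x (for the show_col_types argument). The two files and their contents are hypothetical, written to a temporary location so the example runs on its own; read_delim() with delim = "," reads the same file that read_csv() does.

```r
library(readr)

# Hypothetical book data written to temporary files so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("title,pages", "Book A,320", "Book B,210"), csv_path)
writeLines(c("title\tpages", "Book A\t320", "Book B\t210"), tsv_path)

from_csv   <- read_csv(csv_path, show_col_types = FALSE)                 # comma-separated
from_tsv   <- read_tsv(tsv_path, show_col_types = FALSE)                 # tab-separated
from_delim <- read_delim(csv_path, delim = ",", show_col_types = FALSE)  # any single delimiter
```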

P is for percent

We’ve used ggplots throughout this blog series, but today, I want to introduce another package that helps you customize scales on your ggplots – the scales package. I use this...
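
A minimal sketch of the scales package in action, assuming scales 1.1 or later: percent() formats proportions as text, and label_percent() supplies the same formatting to a ggplot scale. The genre categories and proportions are made-up values for illustration.

```r
library(ggplot2)
library(scales)

# percent() formats proportions directly as character labels
pct_labels <- percent(c(0.1, 0.25), accuracy = 1)  # "10%" "25%"

# label_percent() does the same job inside a scale_ function
share <- data.frame(
  genre = c("Fiction", "Nonfiction"),  # hypothetical categories
  prop  = c(0.6, 0.4)                  # hypothetical proportions
)
p <- ggplot(share, aes(x = genre, y = prop)) +
  geom_col() +
  scale_y_continuous(labels = label_percent())
```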

O is for order_by

This will be a quick post on another tidyverse function, order_by. I’ll admit, I don’t use this one as often as arrange. It can be useful, though, if you don’t...
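
Based on dplyr’s documented behaviour, order_by() evaluates a call such as cumsum() as if the data were sorted by its first argument, then returns each result to its original position, so nothing is rearranged the way arrange() would rearrange rows. A minimal sketch with plain vectors:

```r
library(dplyr)

# cumsum() runs in the order given by the first argument (10 down to 1),
# but each running total is returned to its original position
running <- order_by(10:1, cumsum(1:10))
running
#> [1] 55 54 52 49 45 40 34 27 19 10
```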

N is for n_distinct

Today, we’ll start digging into some of the functions used to summarise data. The full summarise function will be covered for the letter S. For now, let’s look at one...
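
A short sketch of n_distinct(), assuming dplyr and the built-in mtcars dataset. Missing values count as a distinct value unless na.rm = TRUE, and inside summarise() the count is computed per group.

```r
library(dplyr)

# n_distinct() counts unique values; NA counts as one unless na.rm = TRUE
with_na    <- n_distinct(c("a", "b", "a", NA))                # 3
without_na <- n_distinct(c("a", "b", "a", NA), na.rm = TRUE)  # 2

# Inside summarise(): how many different gear counts appear in each cylinder group
gears_per_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(n_gears = n_distinct(gear), .groups = "drop")
```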

M is for mutate

Today, we finally talk about the mutate function! I’ve used it a lot throughout the series so far, so it’s nice to get to discuss what it is and how...
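
A minimal sketch of the two jobs mutate() does, adding a new variable and altering an existing one, using mtcars as stand-in data. The factor of 0.425144 km/L per US mpg is an approximate conversion.

```r
library(dplyr)

cars_metric <- as_tibble(mtcars) %>%
  mutate(
    kpl = mpg * 0.425144,  # add a new column: US mpg to kilometres per litre
    cyl = factor(cyl)      # alter an existing column in place
  )
```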

L is for Log Transformation

When visualizing data, outliers and skewed data can have a huge impact, potentially making your visualization difficult to understand. We can use many of the tricks covered so far to...
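
Two common ways to tame skewed data, sketched on simulated values (hypothetical, drawn from a log-normal distribution so a few values are far larger than the rest): log-transform the variable with mutate(), or keep the raw values and log-transform the axis with scale_x_log10(). The second option keeps the axis labels in the original units.

```r
library(dplyr)
library(ggplot2)

# Hypothetical right-skewed data: most values are small, a few are very large
set.seed(42)
skewed <- tibble(value = rlnorm(500, meanlog = 3, sdlog = 1.5))

# Option 1: transform the variable itself
skewed <- skewed %>%
  mutate(log_value = log10(value))

# Option 2: keep the raw values and put the x-axis on a log10 scale
p <- ggplot(skewed, aes(x = value)) +
  geom_histogram(bins = 30) +
  scale_x_log10()
```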

J is for Join

Today, we’ll start digging into the wonderful world of joins! The tidyverse offers several different types of joins between two datasets, X and Y: left_join – keeps all rows from X...
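
A minimal sketch of left_join() alongside two of the other joins, on two small hypothetical tibbles (x and y, standing for X and Y) that share an id column. In a left join, rows of X with no match in Y get NA in Y’s columns.

```r
library(dplyr)

# Hypothetical tables sharing an id column
x <- tibble(id = c(1, 2, 3), title = c("A", "B", "C"))
y <- tibble(id = c(2, 3, 4), rating = c(4, 5, 3))

left  <- left_join(x, y, by = "id")   # all rows of x; rating is NA for id 1
inner <- inner_join(x, y, by = "id")  # only ids found in both: 2 and 3
full  <- full_join(x, y, by = "id")   # every id from either table: 1 to 4
```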