S is for summarise
This article is originally published at http://www.deeplytrivial.com/
Today, we'll finally talk about summarise! It's very similar to mutate, but instead of adding or altering a variable in a dataset, it aggregates your data, creating a new tibble whose columns contain the summary statistics you requested. The number of rows will equal the number of groups from group_by (if you don't specify any groups, your tibble will have one row that summarizes your entire dataset).
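To make the row-count rule concrete, here is a minimal sketch on a made-up three-row tibble. The `books` data and its column names are hypothetical and not part of reads2019.

```r
library(dplyr)

# Hypothetical toy data, not part of reads2019
books <- tibble(
  genre = c("Mystery", "Mystery", "Fantasy"),
  pages = c(300, 250, 500)
)

# No groups: a single row summarising the whole dataset
overall <- books %>%
  summarise(total_pages = sum(pages),
            avg_pages = mean(pages))

# Grouped: one row per distinct value of genre
by_genre <- books %>%
  group_by(genre) %>%
  summarise(total_pages = sum(pages),
            avg_pages = mean(pages))

nrow(overall)   # one row for the ungrouped summary
nrow(by_genre)  # one row per genre
```

The only difference between the two pipelines is the group_by step, which is what changes the number of rows in the result.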
These days, when I want descriptive statistics from a dataset, I generally use summarise, because I can specify the exact statistics I want in the exact order I want (for easy pasting of tables into a report or presentation).
Also, if you're not a fan of the UK spelling, summarize works exactly the same. The same goes for other spelling pairs in R and the tidyverse, like color versus colour.
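If you want to check the two spellings for yourself, a quick sketch like the following should do it. The `scores` tibble is hypothetical, made up only for this comparison.

```r
library(dplyr)

# Hypothetical toy data for comparing the two spellings
scores <- tibble(
  group = c("a", "a", "b"),
  value = c(1, 2, 3)
)

uk <- scores %>%
  group_by(group) %>%
  summarise(avg = mean(value))

us <- scores %>%
  group_by(group) %>%
  summarize(avg = mean(value))

# Compare the two results
all.equal(uk, us)
```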
Let's load the reads2019 dataset and start summarizing!
library(tidyverse)
reads2019 <- read_csv("~/Downloads/Blogging A to Z/SaraReads2019_allrated.csv",
                      col_names = TRUE)
reads2019 %>%
  summarise(AllPages = sum(Pages),
            AvgLength = mean(Pages),
            AvgRating = mean(MyRating),
            AvgReadTime = mean(read_time),
            ShortRT = min(read_time),
            LongRT = max(read_time),
            TotalAuthors = n_distinct(Author))
## # A tibble: 1 x 7
##   AllPages AvgLength AvgRating AvgReadTime ShortRT LongRT TotalAuthors
##      <dbl>     <dbl>     <dbl>       <dbl>   <dbl>  <dbl>        <int>
## 1    29696      341.      4.14        3.92       0     25           42
reads2019 %>%
  filter(is.na(OriginalPublicationYear)) %>%
  select(Title)
## # A tibble: 5 x 1
##   Title
##   <chr>
## 1 Empath: A Complete Guide for Developing Your Gift and Finding Your Sense of S…
## 2 Perilous Pottery (Cozy Corgi Mysteries, #11)
## 3 Precarious Pasta (Cozy Corgi Mysteries, #14)
## 4 Summerdale
## 5 Swarm Theory
reads2019 <- reads2019 %>%
  mutate(OriginalPublicationYear = replace(OriginalPublicationYear,
                                           Title == "Empath: A Complete Guide for Developing Your Gift and Finding Your Sense of Self", 2017),
         OriginalPublicationYear = replace(OriginalPublicationYear,
                                           Title == "Summerdale", 2018),
         OriginalPublicationYear = replace(OriginalPublicationYear,
                                           Title == "Swarm Theory", 2016),
         OriginalPublicationYear = replace_na(OriginalPublicationYear, 2019))
genrestats <- reads2019 %>%
  filter(Fiction == 1) %>%
  # Sort by year so that first() and last() below pick the oldest and newest books
  arrange(OriginalPublicationYear) %>%
  group_by(Childrens, Fantasy, SciFi, Mystery) %>%
  summarise(Books = n(),
            # Gender is coded 0 = male, 1 = female, so the sum counts women authors
            WomenAuthors = sum(Gender),
            AvgLength = mean(Pages),
            AvgRating = mean(MyRating),
            NewestBook = last(OriginalPublicationYear),
            OldestBook = first(OriginalPublicationYear))
genrestats <- genrestats %>%
  bind_cols(Genre = c("General Fiction",
                      "Mystery",
                      "Science Fiction",
                      "Fantasy",
                      "Fantasy SciFi",
                      "Children's Fiction",
                      "Children's Fantasy")) %>%
  ungroup() %>%
  select(Genre, everything(), -Childrens, -Fantasy, -SciFi, -Mystery)
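The bind_cols step works because the labels are typed in the same order as the grouped rows. If that ordering ever changed, the labels would silently land on the wrong genres. One possible alternative, sketched here on hypothetical toy flags rather than the real data, is to derive the label from the 0/1 genre columns with case_when. The mapping from flag combinations to labels is my reading of the label order above, so treat it as an assumption.

```r
library(dplyr)

# Hypothetical rows using the same 0/1 genre flags as reads2019
flags <- tibble(
  Childrens = c(0, 0, 1, 0),
  Fantasy   = c(0, 1, 1, 1),
  SciFi     = c(0, 0, 0, 1),
  Mystery   = c(1, 0, 0, 0)
)

# case_when uses the first condition that matches,
# so the combined genres are listed before the single ones
labelled <- flags %>%
  mutate(Genre = case_when(
    Childrens == 1 & Fantasy == 1 ~ "Children's Fantasy",
    Childrens == 1                ~ "Children's Fiction",
    Fantasy == 1 & SciFi == 1     ~ "Fantasy SciFi",
    Fantasy == 1                  ~ "Fantasy",
    SciFi == 1                    ~ "Science Fiction",
    Mystery == 1                  ~ "Mystery",
    TRUE                          ~ "General Fiction"
  ))

labelled$Genre
```

Because each label is computed from the row's own flags, it no longer matters what order the groups come back in.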
library(expss)
as.etable(genrestats, rownames_as_row_labels = NULL)
Genre | Books | WomenAuthors | AvgLength | AvgRating | NewestBook | OldestBook |
---|---|---|---|---|---|---|
General Fiction | 15 | 10 | 320.1 | 4.1 | 2019 | 1941 |
Mystery | 9 | 8 | 316.3 | 3.8 | 2019 | 1950 |
Science Fiction | 19 | 4 | 361.4 | 4.4 | 2019 | 1959 |
Fantasy | 19 | 3 | 426.3 | 4.2 | 2019 | 1981 |
Fantasy SciFi | 2 | 0 | 687.0 | 4.5 | 2009 | 2006 |
Children's Fiction | 1 | 0 | 181.0 | 4.0 | 2016 | 2016 |
Children's Fantasy | 16 | 1 | 250.6 | 4.2 | 2008 | 1900 |
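The NewestBook and OldestBook columns in this table depend on arrange running before group_by, so that first and last see the years in ascending order within each group. As a small sketch on hypothetical toy data, min and max should give the same answers without relying on row order:

```r
library(dplyr)

# Hypothetical toy data, deliberately not sorted by year
toy <- tibble(
  genre = c("Fantasy", "Mystery", "Fantasy", "Mystery"),
  year  = c(2008, 1950, 1900, 2019)
)

# Approach used in the post: sort first, then take first() and last()
sorted <- toy %>%
  arrange(year) %>%
  group_by(genre) %>%
  summarise(OldestBook = first(year),
            NewestBook = last(year))

# Order-independent alternative: min() and max()
unsorted <- toy %>%
  group_by(genre) %>%
  summarise(OldestBook = min(year),
            NewestBook = max(year))

# Compare the two results
all.equal(sorted, unsorted)
```

Either approach is fine here. The min and max version just removes the hidden dependency on the earlier arrange step.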
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.6.3
reads2019 %>%
  mutate(Gender = factor(Gender, levels = c(0,1),
                         labels = c("Male",
                                    "Female")),
         Fiction = factor(Fiction, levels = c(0,1),
                          labels = c("Non-Fiction",
                                     "Fiction"),
                          ordered = TRUE)) %>%
  group_by(Gender, Fiction) %>%
  summarise(Books = n()) %>%
  ggplot(aes(Fiction, Books)) +
  geom_col(aes(fill = reorder(Gender, desc(Gender)))) +
  scale_fill_economist() +
  xlab("Genre") +
  labs(fill = "Author Gender")