ggplot2 / R / R News / Tidyverse

Ordering Categories within ggplot2 Facets

by tylerrinker · December 23, 2016

This article is originally published at https://trinkerrstuff.wordpress.com

I saw Simon Jackson’s recent blog post regarding ordering categories within facets. He proposed a way of dealing with the problem of ordering variables shared across facets within facets. This problem becomes apparent in text analysis where words are shared across facets but differ in frequency/magnitude ordering within each facet. Julia Silge and David Robinson note that this is a particularly vexing problem in a TODO comment in their tidy text book:

## TODO: make the ordering vary depending on each facet ## (not easy to fix)

Simon has provided a working approach but it feels awkward in that you are converting factors to numbers visually, adjusting spacing, and then putting the labels back on. I believe that there is a fairly straight forward tidy approach to deal with this problem.

Terminology

Definitions of the terms I use to describe the solution:

category variable – the bar categories variable (terms in this case)
count variable – the bar heights variable
facet variable – the meta grouping used for faceting

Logic

I have dealt with the ordering within facets problem using this logic:

Order the data rows by grouping on the facet variable and the categories variable and arrange-ing on the count variable in a descending* fashion
Ungroup
Remake the categories variable by appending the facet label(s) as a deliminated suffix to the categories variable (make sure this is a factor with the levels reversed) [this maintains the ordering by making shared categories unique]
Plot as usual**
Remove the suffix you added previously using scale_x_discrete

*This allows you to take a slice of the top n terms if desired
**I prefer the ggstance geom_barh to the ggplot2 geom_bar + coord_flip as the former lets me set y as the terms variable and the later doesn’t always play nicely with scales being set free

This approach adds an additional 5 lines of code (in the code below I number them at as comment integers) and is, IMO, pretty easy to reason about. Here’s the additional lines of code:

group_by(word1, word2) %>%                  
arrange(desc(contribution)) %>%                
ungroup() %>%
mutate(word2 = factor(paste(word2, word1, sep = "__"), levels = rev(paste(word2, word1, sep = "__")))) %>% 
    # --ggplot here--
    scale_x_discrete(labels = function(x) gsub("__.+$", "", x))

Example

Let’s go ahead and use Julia and David’s example to demonstrate the technique:

# Required libraries
p_load(tidyverse, tidytext, janeaustenr)

# From section 5.1: Tokenizing by n-gram
austen_bigrams <- austen_books() %>%
    unnest_tokens(bigram, text, token = "ngrams", n = 2)

# From section 5.1.1: Counting and filtering n-grams
bigrams_separated <- austen_bigrams %>%
    separate(bigram, c("word1", "word2"), sep = " ")

	
# From section 5.1.3: Using bigrams to provide context in sentiment analysis
AFINN <- get_sentiments("afinn")
negation_words <- c("not", "no", "never", "without")

negated_words <- bigrams_separated %>%
    filter(word1 %in% negation_words) %>%
    inner_join(AFINN, by = c(word2 = "word")) %>%
    count(word1, word2, score, sort = TRUE) %>%
    ungroup()


# Create plot
negated_words %>%
    mutate(contribution = n * score) %>%
    mutate(word2 = reorder(word2, contribution)) %>%
    group_by(word1) %>%
    top_n(10, abs(contribution)) %>%
    group_by(word1, word2) %>%                                    #1
    arrange(desc(contribution)) %>%                               #2
    ungroup() %>%                                                 #3
    mutate(word2 = factor(paste(word2, word1, sep = "__"), levels = rev(paste(word2, word1, sep = "__")))) %>% #4
    ggplot(aes(word2, contribution, fill = n * score > 0)) +
        geom_bar(stat = "identity", show.legend = FALSE) +
        facet_wrap(~ word1, scales = "free") +
        xlab("Words preceded by negation") +
        ylab("Sentiment score * # of occurrences") +
        theme_bw() +
        coord_flip() +
        scale_x_discrete(labels = function(x) gsub("__.+$", "", x)) #5

Thanks for visiting r-craft.org
This article is originally published at https://trinkerrstuff.wordpress.com
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Ordering Categories within ggplot2 Facets

Terminology

Logic

Example

You may also like...

Sept 2020: “Top 40” New CRAN Packages

2020 at RStudio: A Year in Review

Ensemble Methods in Machine Learning: Bagging & Subagging

Categories