# Sort data frames by columns

This article is originally published at https://www.quantargo.com/blog

To find the rows of interest in a data frame, it often helps to order them by specific columns. The **dplyr** `arrange()` function supports ordering data frames by multiple columns in ascending and descending order.

- Use the `arrange()` function to sort data frames.
- Sort data frames by multiple columns using `arrange()`.

    arrange(<data>, <column>)
    arrange(<data>, <column_1>, <column_2>, ...)

## The arrange() function with a single column

The `arrange()` function orders the rows of a data frame. It takes a data frame or a tibble as the first parameter and the names of the columns by which the rows should be ordered as additional parameters. Let’s assume we want to answer the question: *Which states had the highest percentage of Republican voters in the 2016 US presidential election?* To answer it, the following example uses the `pres_results_2016` data frame, which contains information only for the 2016 US presidential election. We `arrange()` the data frame based on the `rep` column (the Republican share of the votes):

    arrange(pres_results_2016, rep)

    # A tibble: 51 x 6
       year state total_votes   dem    rep  other
    1  2016 DC         312575 0.905 0.0407 0.0335
    2  2016 HI         437664 0.610 0.294  0.0958
    3  2016 VT         320467 0.557 0.298  0.0737
    # … with 48 more rows

As you can see in the output, the data frame is sorted in ascending order based on the `rep` column. However, we would prefer to have the results in descending order, so that we can instantly see the `state` with the highest `rep` value. To sort a column in descending order, all we need to do is apply the `desc()` function to the given column inside the `arrange()` function:

    arrange(pres_results_2016, desc(rep))

    # A tibble: 51 x 6
       year state total_votes   dem   rep  other
    1  2016 WV         713051 0.265 0.686 0.0489
    2  2016 WY         258788 0.216 0.674 0.0830
    3  2016 OK        1452992 0.289 0.653 0.0575
    # … with 48 more rows

Arranging works not only on numeric values but on character values as well. In that case, **dplyr** sorts the rows in alphabetical order. We can arrange character columns just like numeric ones:

    arrange(pres_results_2016, state)

    # A tibble: 51 x 6
       year state total_votes   dem   rep  other
    1  2016 AK         318608 0.366 0.513 0.0928
    2  2016 AL        2123372 0.344 0.621 0.0254
    3  2016 AR        1130635 0.337 0.606 0.0577
    # … with 48 more rows
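The `pres_results_2016` data frame may not be available outside the course environment. As a minimal, self-contained sketch of the three calls above, the following builds a small tibble from six of the rows shown in the outputs. The name `pres_sample` and the result names are made up for this illustration, and only the `state` and `rep` columns are kept.

```r
library(dplyr)

# Six rows copied from the outputs above; the real data frame has 51 rows.
# `pres_sample` is a hypothetical name used only in this sketch.
pres_sample <- tibble(
  state = c("DC", "HI", "VT", "WV", "WY", "OK"),
  rep   = c(0.0407, 0.294, 0.298, 0.686, 0.674, 0.653)
)

# Ascending order is the default.
asc_rep <- arrange(pres_sample, rep)

# Wrap the column in desc() for descending order.
desc_rep <- arrange(pres_sample, desc(rep))

# Character columns are sorted alphabetically.
by_state <- arrange(pres_sample, state)

asc_rep$state   # "DC" "HI" "VT" "OK" "WY" "WV"
desc_rep$state  # "WV" "WY" "OK" "VT" "HI" "DC"
by_state$state  # "DC" "HI" "OK" "VT" "WV" "WY"
```

Note that `arrange()` returns a new, sorted tibble and leaves `pres_sample` itself unchanged, which is why each result is assigned to its own name.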

## Exercise: Use arrange() based on a single column

The `gapminder_2007` dataset contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect which country had the lowest life expectancy (`lifeExp`) in 2007! The **dplyr** package is already loaded.

- Apply the `arrange()` function on the `gapminder_2007` tibble.
- Order the tibble based on the `lifeExp` column.

## Exercise: Use arrange() in combination with desc()

The `gapminder_2007` dataset contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect which countries had the largest population in 2007! The **dplyr** package is already loaded.

- Apply the `arrange()` function on the `gapminder_2007` tibble.
- Sort the tibble in descending order based on the `pop` column.

## The arrange() function with multiple columns

We can use the `arrange()` function on multiple columns as well. In this case, the order of the columns in the function parameters sets a hierarchy of ordering. The function starts by ordering the rows based on the first column defined in the parameters. If several rows have the same value, the function decides their order based on the second column. If there are still rows with the same values, the function decides based on the third column (if defined), and so on.

In the following example we use the `pres_results_subset` data frame, which contains election results only for the states `"TX"` (Texas), `"UT"` (Utah) and `"FL"` (Florida). First we sort the data frame in ascending order based on the `year` column. Then we add a second level and order the data frame based on the `dem` column:

    arrange(pres_results_subset, year, dem)

    # A tibble: 33 x 6
       year state total_votes   dem   rep   other
    1  1976 UT         541218 0.336 0.624 0.0392
    2  1976 TX        4071884 0.511 0.480 0.00817
    3  1976 FL        3150631 0.519 0.466 0.0143
    # … with 30 more rows

As you can see in the output, the data frame is overall ordered based on the `year` column. However, when the value of `year` is the same, the order of the rows is decided by the `dem` column.
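To make the tie-breaking rule easier to see in isolation, here is a small sketch on made-up data. The tibble `toy_results`, its columns `region` and `share`, and all values in it are hypothetical and are not taken from `pres_results_subset`. The second call also shows that `desc()` can be applied to one column while another stays ascending.

```r
library(dplyr)

# Hypothetical toy values, invented for this sketch only.
toy_results <- tibble(
  year   = c(2012, 2016, 2012, 2016, 2012, 2016),
  region = c("A", "A", "B", "B", "C", "C"),
  share  = c(0.41, 0.43, 0.25, 0.27, 0.50, 0.20)
)

# year is the primary sort column; share only breaks ties within a year.
asc_both <- arrange(toy_results, year, share)

# Directions can be mixed: latest year first, share ascending within each year.
newest_first <- arrange(toy_results, desc(year), share)

asc_both$region      # "B" "A" "C" "C" "B" "A"
newest_first$year    # 2016 2016 2016 2012 2012 2012
newest_first$region  # "C" "B" "A" "B" "A" "C"
```

Swapping the column order, as in `arrange(toy_results, share, year)`, would make `share` the primary sort column and use `year` only to break ties.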

## Exercise: Use arrange() based on multiple columns

The `gapminder_2007` tibble contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect, for each continent, which countries had the highest life expectancy in 2007! The **dplyr** package is already loaded.

- Apply the `arrange()` function on the `gapminder_2007` tibble.
- Order the tibble based on the `continent` column!
- In case there are rows with the same `continent`, sort the tibble in descending order based on the `lifeExp` column!
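Outside the interactive course, `gapminder_2007` is not defined. One possible way to try the three exercises locally is sketched below. It assumes that the course tibble corresponds to the 2007 rows of the `gapminder` data frame from the **gapminder** package, and that this package is installed. The course version may differ in its exact columns.

```r
library(dplyr)
library(gapminder)  # assumption: the gapminder package is installed

# Assumed reconstruction of the course's gapminder_2007 tibble.
gapminder_2007 <- filter(gapminder, year == 2007)

# Lowest life expectancy first (ascending is the default).
by_life_exp <- arrange(gapminder_2007, lifeExp)

# Largest population first.
by_pop <- arrange(gapminder_2007, desc(pop))

# continent is a factor in this package, so rows follow its level order;
# within each continent the highest life expectancy comes first.
by_continent <- arrange(gapminder_2007, continent, desc(lifeExp))

head(by_life_exp, 3)
head(by_pop, 3)
head(by_continent, 3)
```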

## Quiz: arrange() Function

Which of the following statements are true about the `arrange()` function?

- The `arrange()` function orders the rows of a data frame.
- To `arrange()` the values of a column in ascending order, we need to use the `asc()` function.
- To `arrange()` the values of a column in descending order, we need to use the `desc()` function.
- You can only `arrange()` a data frame based on one column.

Sort data frames by columns is an excerpt from the course Introduction to R, which is available for free at quantargo.com
