How to Use Pandas Append to Combine Rows of Data in Python

by Joshua Ebner · September 14, 2021

This article is originally published at https://www.sharpsightlabs.com

In this tutorial, I’ll explain how to use the Pandas append technique to append new rows to a Pandas dataframe or Series.

I’ll explain exactly what the append technique does, how the syntax works, and I’ll show you step-by-step examples.

Let’s start with a quick explanation of what the append method does.

A quick introduction to Pandas append

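The Pandas append technique appends new rows to a Pandas object. This is a very common technique that we use for data cleaning and data wrangling in Python.

One caveat: the append() method was deprecated in Pandas 1.4 and removed in Pandas 2.0, so the examples in this tutorial run as written only on older versions. In newer versions, pd.concat() does the same job.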

This technique is somewhat flexible, in the sense that we can use it on a couple of different Pandas objects. We can use this technique on:

dataframes
Series

When we use append on dataframes, the dataframes often have the same columns. But if the input dataframes have different columns, then the output dataframe will have the columns of both inputs, and Pandas fills the cells that have no source value with NaN.
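Here’s a small sketch of the different-columns case. The two tiny dataframes are made up for illustration, and I’m using pd.concat() as a stand-in for append() so that the code also runs on Pandas 2.0 and later, where append() no longer exists. That substitution is my assumption, not part of the original append syntax.

```python
import pandas as pd

# Two illustrative dataframes that share only the "name" column
first = pd.DataFrame({"name": ["William", "Emma"], "sales": [50000, 52000]})
second = pd.DataFrame({"name": ["Thomas"], "region": ["West"]})

# pd.concat stacks the rows, as first.append(second) did in older Pandas
combined = pd.concat([first, second])

# The output has the columns of both inputs; missing cells are NaN
print(combined)
```

The rows from the first dataframe have no region, and the row from the second has no sales, so those cells show up as NaN.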

Having said all of that, what this technique does depends on how we use the syntax.

That being the case, let’s look at the syntax and the optional parameters.

The syntax of Pandas append

Here, I’ll explain the syntax for the Pandas append method.

I’ll explain the syntax for both Pandas dataframes, and Pandas Series objects.

A quick note

Before we look at the syntax, keep in mind a few things:

First, these syntax explanations assume that you’ve already imported the Pandas package. You can do that with the following code:

import pandas as pd

Second, these syntax explanations also assume that you already have two Pandas dataframes or other objects that you want to combine together.

For a refresher on dataframes, you can read our blog post on Pandas dataframes.

Dataframe append syntax

Using the append method on a dataframe is very simple.

You type the name of the first dataframe, and then .append() to call the method.

Then inside the parentheses, you type the name of the second dataframe, which you want to append to the end of the first.

There are also some optional parameters that you can use, which I’ll discuss in the parameters section.
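To make the dataframe syntax concrete, here’s a minimal sketch. The names first_df and second_df are hypothetical placeholders. Because append() is only available in Pandas versions before 2.0, the sketch checks for it and otherwise falls back to pd.concat(), which I’m assuming is an acceptable equivalent here.

```python
import pandas as pd

# Hypothetical dataframes with the same columns
first_df = pd.DataFrame({"name": ["William", "Emma"], "sales": [50000, 52000]})
second_df = pd.DataFrame({"name": ["Thomas", "Ethan"], "sales": [72000, 49000]})

if hasattr(pd.DataFrame, "append"):
    # Pandas < 2.0: first dataframe, then .append(), then the second dataframe
    combined = first_df.append(second_df)
else:
    # Pandas >= 2.0: append() was removed, so stack the rows with pd.concat
    combined = pd.concat([first_df, second_df])

print(combined)
```

Either branch puts the rows of second_df underneath the rows of first_df.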

Series append syntax

The syntax for using append on a Series is very similar to the dataframe syntax.

You type the name of the first Series, and then .append() to call the method.

Then inside the parentheses, you type the name of the second Series, which you want to append to the end of the first.

And once again, there are also some optional parameters that you can use which will slightly change how the method works.
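Here’s a quick sketch of the same idea for Series. The Series names and values are invented for illustration. As with dataframes, I’m using pd.concat() in place of the Series append() method so that this runs on current Pandas versions. Treat that as an assumption rather than the original method call.

```python
import pandas as pd

# Two illustrative Series
first_series = pd.Series([50000, 52000], name="sales")
second_series = pd.Series([72000, 49000], name="sales")

# Stack the second Series underneath the first (index labels are kept)
combined = pd.concat([first_series, second_series])

# The same operation, but with a fresh 0..n-1 index
renumbered = pd.concat([first_series, second_series], ignore_index=True)

print(combined)
print(renumbered)
```

The output of both calls is a Series, not a dataframe.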

Let’s take a look at those parameters.

The parameters of append

The Pandas append method has three optional parameters that you can use:

ignore_index
verify_integrity
sort

Let’s look at each of them.

`ignore_index` (optional)

The ignore_index parameter enables you to control the index of the new output Pandas object.

By default, this is set to ignore_index = False. In this case, Pandas keeps the original index values from the two different input dataframes. Keep in mind that this can cause duplicate index values which can cause problems.

If you set this parameter to ignore_index = True, Pandas will ignore the index values in the inputs, and will generate a new index for the output. The index values will be labeled 0, 1, … n - 1.

`verify_integrity` (optional)

The verify_integrity parameter checks the “integrity” of the new index. If the index has duplicates, and you set verify_integrity = True, Pandas will raise an error.

By default, this parameter is set to verify_integrity = False. In this case, Pandas will allow duplicates.

`sort` (optional)

The sort parameter controls the sort order of the columns, if the two input dataframes have different columns.

By default, this parameter is set to sort = False. In this case, the columns are not re-sorted when the dataframes are appended together.

If you set sort = True, Pandas will re-sort the columns in the output.
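None of the examples later in this tutorial use sort, so here’s a small sketch of what it does. The dataframes are made up, and I’m using pd.concat(), which accepts the same sort parameter, so that the code runs on Pandas versions that no longer have append(). That swap is my assumption.

```python
import pandas as pd

# Illustrative dataframes whose columns differ and are not in alphabetical order
first = pd.DataFrame({"sales": [50000], "name": ["William"]})
second = pd.DataFrame({"region": ["West"], "name": ["Thomas"]})

# sort=False keeps the columns in the order they are encountered
unsorted_cols = pd.concat([first, second], sort=False)

# sort=True re-sorts the column labels of the output
sorted_cols = pd.concat([first, second], sort=True)

print(list(unsorted_cols.columns))
print(list(sorted_cols.columns))
```

Note that sort only affects the order of the columns. It does not sort the rows.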

The output of Pandas append

The output of append depends on the input.

Generally, the output will be a new Pandas object, with the rows of the second object appended to the bottom of the first object.

More specifically, if the inputs are dataframes, the output will be a dataframe. And if the inputs are Series, then the output will be a Series.

Also note: the append() method produces a new object and leaves the two original input objects unchanged. This can be very confusing for beginners so remember that the method produces a new object.
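A quick way to convince yourself of this is to check the row counts before and after. This is only a sketch with invented data, and it uses pd.concat() instead of append() (my substitution, so it runs on Pandas 2.0 and later). Both produce a new object rather than modifying the inputs.

```python
import pandas as pd

# Illustrative input dataframes
first = pd.DataFrame({"name": ["William", "Emma"]})
second = pd.DataFrame({"name": ["Thomas"]})

# The combined rows only exist in the new object that we store here
combined = pd.concat([first, second], ignore_index=True)

# The inputs still have their original number of rows
print(len(first), len(second), len(combined))
```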

Examples: how to append new rows to a Pandas object

Ok. Now that you’ve seen the syntax, let’s look at a few examples of how to use append to add new rows to a Pandas object.

Run this code first

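Before you run any of the examples, you need to do two things:

import Pandas and NumPy
create the dataframes we’ll work with

Let’s do those one at a time.

Import Pandas and NumPy

First, let’s import Pandas. We’ll also import NumPy, because the dataframes below use np.nan to represent missing values.

You can do that with the following code:

import pandas as pd
import numpy as np

This will enable us to call Pandas functions with the prefix pd and NumPy functions with the prefix np, which are the common conventions.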

Create dataframes

Next, let’s create two dataframes.

Here, we’ll create dataframes that contain mock sales data.

You can create them with the following code:

sales_data_1 = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward"],
    "region": ["East", np.nan, "East", "South", "West"],
    "sales": [50000, 52000, 90000, np.nan, 42000],
    "expenses": [42000, 43000, np.nan, 44000, 38000],
})

sales_data_2 = pd.DataFrame({
    "name": ["Thomas", "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["West", "South", "West", "West", "East", "South"],
    "sales": [72000, 49000, np.nan, 67000, 65000, 67000],
    "expenses": [39000, 42000, np.nan, 39000, 44000, 45000],
})

And let’s print them out, so you can see roughly what’s in them:

print(sales_data_1)
print(sales_data_2)

OUT:

      name region    sales  expenses
0  William   East  50000.0   42000.0
1     Emma    NaN  52000.0   43000.0
2    Sofia   East  90000.0       NaN
3   Markus  South      NaN   44000.0
4   Edward   West  42000.0   38000.0

     name region    sales  expenses
0  Thomas   West  72000.0   39000.0
1   Ethan  South  49000.0   42000.0
2  Olivia   West      NaN       NaN
3    Arun   West  67000.0   39000.0
4   Anika   East  65000.0   44000.0
5   Paulo  South  67000.0   45000.0

As you can see, these dataframes contain sales information, including name, region, total sales, and expenses.

Notice as well that although the dataframes have the same columns, they have different rows. We’ll use the append() method to append the rows in sales_data_2 on to sales_data_1.

EXAMPLE 1: Append new rows onto a dataframe

First, let’s start simple.

Here, we’ll simply append the rows in sales_data_2 to the end (i.e., the bottom) of sales_data_1.

Let’s run the code, and then I’ll explain:

sales_data_1.append(sales_data_2)

OUT:

      name region    sales  expenses
0  William   East  50000.0   42000.0
1     Emma    NaN  52000.0   43000.0
2    Sofia   East  90000.0       NaN
3   Markus  South      NaN   44000.0
4   Edward   West  42000.0   38000.0
0   Thomas   West  72000.0   39000.0
1    Ethan  South  49000.0   42000.0
2   Olivia   West      NaN       NaN
3     Arun   West  67000.0   39000.0
4    Anika   East  65000.0   44000.0
5    Paulo  South  67000.0   45000.0

Explanation

This is fairly simple.

To call the method, we type the name of the first dataframe, sales_data_1, and then we type .append() to call the method.

Inside the parentheses, we have the name of the second dataframe, sales_data_2.

The output dataframe contains the rows of both, stacked on top of each other.

Notice one thing though: in the numeric index on the left, there are duplicate values. That’s because the indexes of the two input dataframes overlap (i.e., the index for both started at 0 and incremented by 1 for each row), and append keeps the original index labels by default.

These duplicates in the index could be problematic.

We’ll fix it in the next example.

EXAMPLE 2: Ignore and reset the index, when you append new rows

Here, we’ll combine the rows of the two dataframes, but we’ll reset the index for the output dataframe. This will create a new numeric index starting at 0.

To do this, we need to set ignore_index = True. Effectively, this will cause Pandas to “ignore” the index in the input dataframes, and it will create a new index for the output:

sales_data_1.append(sales_data_2, ignore_index = True)

OUT:

       name region    sales  expenses
0   William   East  50000.0   42000.0
1      Emma    NaN  52000.0   43000.0
2     Sofia   East  90000.0       NaN
3    Markus  South      NaN   44000.0
4    Edward   West  42000.0   38000.0
5    Thomas   West  72000.0   39000.0
6     Ethan  South  49000.0   42000.0
7    Olivia   West      NaN       NaN
8      Arun   West  67000.0   39000.0
9     Anika   East  65000.0   44000.0
10    Paulo  South  67000.0   45000.0

Explanation

Notice in the output that the index starts at 0, increments by 1 for each row, and stops at 10.

This is a new index for the output, and it effectively removes any duplicate index labels that were in the input dataframes.

EXAMPLE 3: Verify the integrity of the index, when you append new rows

Now, instead of resetting the index, let’s verify the index.

To do this, we’ll set verify_integrity = True.

This will check the index labels of the inputs for duplicates. If there are duplicate index labels, Pandas will produce an error.

Let’s take a look:

sales_data_1.append(sales_data_2, verify_integrity = True)

OUT:

ValueError: Indexes have overlapping values: Int64Index([0, 1, 2, 3, 4], dtype='int64')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

....

Explanation

Here, we set verify_integrity = True. This checked the input dataframes for duplicate index labels.

As you can see, running this code produced a ValueError.

The reason is that there were duplicate index labels in the two input dataframes. They both had rows with the labels 0, 1, 2, 3, and 4.

When you encounter an error like this, you may need to do some data cleaning on your input data to remove duplicate rows. Or, you may simply want to ignore the index, as we did in example 2. How you handle this really depends on context.
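If you decide that ignoring the index is the right call, one way to handle it is to catch the error and fall back to a fresh index. This is just a sketch with made-up dataframes, and whether a silent fallback is appropriate depends on your data. I’m using pd.concat() here, which takes the same verify_integrity and ignore_index parameters, so that the code runs on Pandas versions without append(). That’s my substitution.

```python
import pandas as pd

# Illustrative dataframes that both use the index labels 0 and 1
first = pd.DataFrame({"name": ["William", "Emma"]})
second = pd.DataFrame({"name": ["Thomas", "Ethan"]})

try:
    # Raises ValueError because the index labels overlap
    combined = pd.concat([first, second], verify_integrity=True)
except ValueError as err:
    print(f"Index check failed: {err}")
    # Fall back to a new 0..n-1 index, as in example 2
    combined = pd.concat([first, second], ignore_index=True)

print(combined)
```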

Frequently asked questions about Pandas append

Now that we’ve looked at some examples, let’s look at some common questions about the append() technique.

Question 1: I used append, but my dataframe is unchanged. Why?

If you use the append method, you might notice that your original dataframe remains unchanged.

For example, in example 1, we ran the following code:

sales_data_1.append(sales_data_2)

If you print out sales_data_1 after you run that code, you’ll realize that sales_data_1 is unchanged.

That’s because the append() method produces a new dataframe, and leaves both original dataframes unchanged.

If you run the code interactively, the new dataframe is only displayed in the console. To save it, you need to assign it to a name.

For example, you could store the output like this:

sales_data_combined = sales_data_1.append(sales_data_2)

You can name the output whatever you want. You could even name it sales_data_1. But be careful: if you do that, it will overwrite your original dataset. Make sure that your code works properly before you overwrite an input dataframe.

Leave your other questions in the comments below

Do you have any other questions about the Pandas append method?

Is there something else that you need to know that I haven’t covered here?

If so, leave your question in the comments section below.
