Pandas isna, Explained
This article is originally published at https://www.sharpsightlabs.com
In this blog post, I’ll explain how to use the Pandas isna technique.
I’ll describe what the technique does, explain the syntax, and I’ll show you clear examples of how to use it.
Let’s get started with a quick introduction to the isna() technique.
A quick introduction to Pandas isna
The Pandas isna method detects missing values in a Pandas dataframe or a Pandas Series.
As suggested above, we can use Pandas isna on several different data structures, including:
- Pandas Series
- Pandas dataframes
- individual columns in a dataframe
So in that sense, the method is flexible in terms of how we use it.
Pandas isna is important for Python data wrangling
The isna method is important for data wrangling in Python.
Dealing with missing values is a very common problem when we wrangle data, but also when we analyze data or create machine learning models.
In fact, finding and dealing with missing values is one of the first things you will do when you wrangle or analyze a dataset.
That being the case, you need a way to identify missing values when you’re working with your Python data.
Enter, Pandas isna.
The syntax of isna
Let’s look at the syntax of the isna() technique.
Here, we’ll look at the syntax separately for the following Python data structures:
- dataframes
- Series
- dataframe columns
The reason is that the syntax for Pandas isna will be slightly different for each object type.
A quick note
Before looking at the syntax, I want to remind you of a couple things.
First, the syntax explanations below assume that you’ve already installed Pandas and imported it into your environment.
Assuming that you have it installed already on your computer, you can import Pandas with this code:
import pandas as pd
Second, the syntax explanations below assume that you have either a Pandas dataframe or a Pandas Series object available.
To learn more about Pandas dataframes, read our Pandas dataframe tutorial.
With all that said, let’s look at the syntax.
Series syntax
First, we’ll look at the syntax for how to use isna() on a lone Pandas Series.
When you use isna on a Series, you first type the name of the Series object (i.e., the name that you’ve assigned to it).
Then, you type .isna() to call the method, just like you would call any other method in Python.
That’s all there is to it.
When you do this, the method will produce a new Series of boolean True/False values that shows which values were missing in the original Series.
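To make the Series syntax concrete, here is a minimal sketch. The Series name your_series and its values are made up for illustration; they are not part of the example data used later in this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two missing entries (np.nan and None)
your_series = pd.Series([1.5, np.nan, 3.0, None])

# .isna() returns a new boolean Series of the same length
missing_mask = your_series.isna()
print(missing_mask)
```

The printed result should have one True/False entry per value in your_series, with True at the positions that held np.nan or None.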
dataframe syntax
Next, let’s look at how to use isna on a dataframe.
The syntax for dataframes is very similar to the syntax above for Pandas Series.
First, you type the name of the dataframe you want to operate on.
Then you type .isna() to call the method.
So if your dataframe is named your_dataframe, you’ll type the code your_dataframe.isna().
The output of this operation will be an object that’s the same size as your input dataframe. This output will contain True/False values that indicate which values were missing in the original.
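As a quick sketch of the dataframe syntax, the code below builds a small throwaway dataframe and calls .isna() on it. The column names city and temp and their values are assumptions made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with one missing value in each column
your_dataframe = pd.DataFrame({
    "city": ["Austin", None, "Boston"],
    "temp": [31.0, 28.5, np.nan],
})

# .isna() returns a dataframe of True/False values
missing_mask = your_dataframe.isna()

# The output has the same shape as the input
print(missing_mask.shape == your_dataframe.shape)
print(missing_mask)
```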
column syntax
Finally, we’ll look at the syntax for how to use Pandas isna on a single column of a dataframe.
It’s important to remember here that individual columns inside of a dataframe are actually Pandas Series objects. So if we retrieve a column using “dot syntax,” then we can use the syntax above for Pandas Series.
Let’s take a look at how this works.
First, you can type the name of the dataframe.
Then, you use “dot syntax” to specify the individual column inside the dataframe that you want to operate on.
So applying Pandas .isna() to a dataframe column involves two steps:
- get the column from the dataframe with “dot syntax”
- use the .isna() method
So for example, if you have a dataframe called your_dataframe that contains a column called column, then you’ll use the syntax your_dataframe.column.isna() to find missing values in that particular column.
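Here is a small sketch of the column syntax. It also shows bracket syntax, which is not covered above but retrieves the same Series and is the option to reach for when a column name is not a valid Python identifier. The dataframe and the column name "unit price" are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe; "unit price" contains a space on purpose
your_dataframe = pd.DataFrame({
    "column": [10.0, np.nan, 30.0],
    "unit price": [1.0, 2.0, np.nan],
})

# Dot syntax and bracket syntax retrieve the same column
dot_result = your_dataframe.column.isna()
bracket_result = your_dataframe["column"].isna()
print(dot_result.equals(bracket_result))

# Dot syntax cannot express a name with a space, so use brackets here
price_missing = your_dataframe["unit price"].isna()
print(price_missing)
```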
Output (additional notes)
Very quickly, let’s talk about the structure and contents of the output.
As I mentioned above, the output of .isna() is a new Pandas object that’s the same size as the input object.
This new object will contain True/False values that show which values are missing (True means missing).
The value types that .isna() will consider as “missing” are:
- None
- numpy.nan
Pandas’ own missing-value markers, pandas.NaT and pandas.NA, also count as missing.
So empty strings (i.e., '') or numpy.inf will not count as missing values; .isna() will return False for these values.
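If you want to check these rules yourself, a sketch like the following should work. The Series values is made up for this illustration, and it uses the object dtype so that every element keeps its original Python type.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed Series: two missing values, then four values
# that look "empty" or unusual but are not treated as missing
values = pd.Series([None, np.nan, "", np.inf, 0, "text"], dtype="object")

result = values.isna()
print(result)
```

Only the first two entries (None and np.nan) should come back as True. The empty string, infinity, and zero are all ordinary values as far as .isna() is concerned.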
Examples: how to detect missing values in Python
Now that we’ve finished looking at the syntax, let’s look at some examples of Pandas isna().
Examples:
- Identify missing values in a dataframe column
- Identify missing values in an entire dataframe
- Count the missing values in each column of the dataframe
Run this code first
Before we run these examples, there’s a little preliminary setup that you’ll need to run.
Specifically, you’ll need to:
- import Pandas and Numpy
- create a dataframe
Let’s do each of those.
Load Pandas
First, we need to import Pandas and Numpy:
import pandas as pd
import numpy as np
We’ll use Pandas to create a dataframe, and we’ll use Numpy to create missing values inside that dataframe using np.nan.
Create a dataframe
Next, we need to create a dataframe that we can work with.
Here, we’re going to create a dataframe that contains mock sales data:
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas", "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["East", np.nan, "East", "South", "West", "West", "South", "West", "West", "East", "South"],
    "sales": [50000, 52000, 90000, np.nan, 42000, 72000, 49000, np.nan, 67000, 65000, 67000],
    "expenses": [42000, 43000, np.nan, 44000, 38000, 39000, 42000, np.nan, 39000, 44000, 45000],
})
Let’s print the dataframe to see its contents:
print(sales_data)
OUT:
       name region    sales  expenses
0   William   East  50000.0   42000.0
1      Emma    NaN  52000.0   43000.0
2     Sofia   East  90000.0       NaN
3    Markus  South      NaN   44000.0
4    Edward   West  42000.0   38000.0
5    Thomas   West  72000.0   39000.0
6     Ethan  South  49000.0   42000.0
7    Olivia   West      NaN       NaN
8      Arun   West  67000.0   39000.0
9     Anika   East  65000.0   44000.0
10    Paulo  South  67000.0   45000.0
This dataframe, sales_data, has four variables. Two of the variables contain character data, and two of the variables contain numeric data.
Critically, you’ll notice that some of the values are missing (i.e., NaN).
We’ll use .isna() to detect those missing values.
EXAMPLE 1: Identify missing values in a dataframe column
First, we’ll identify the missing values in one specific column.
We’re going to identify the missing values in the sales column of the dataframe.
sales_data.sales.isna()
OUT:
0     False
1     False
2     False
3      True
4     False
5     False
6     False
7      True
8     False
9     False
10    False
Explanation
Here, we’ve identified the missing values in the sales column of the sales_data dataframe.
This involved 2 steps:
- we retrieved the sales column using “dot syntax”
- then, we called .isna() to identify the missing values in that column
So sales_data.sales retrieved the sales column from the dataframe.
And the syntax .isna() identified the missing values.
Notice that the output of this code is a new object that has the same shape as the sales column. Also notice that where the value was missing in the sales column, the output shows True. Otherwise, the output shows False.
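One common follow-up, which goes a step beyond the example above, is to use that True/False output as a filter to pull out the rows where sales is missing. The sketch below rebuilds the same sales_data dataframe so that it runs on its own; the variable names missing_sales and complete_sales are my own.

```python
import numpy as np
import pandas as pd

# Same mock sales data as in the setup section
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas",
             "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["East", np.nan, "East", "South", "West", "West",
               "South", "West", "West", "East", "South"],
    "sales": [50000, 52000, 90000, np.nan, 42000, 72000,
              49000, np.nan, 67000, 65000, 67000],
    "expenses": [42000, 43000, np.nan, 44000, 38000, 39000,
                 42000, np.nan, 39000, 44000, 45000],
})

# Keep only the rows where the sales value is missing
missing_sales = sales_data[sales_data.sales.isna()]
print(missing_sales)

# The ~ operator inverts the mask, keeping rows where sales is present
complete_sales = sales_data[~sales_data.sales.isna()]
print(len(complete_sales))
```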
EXAMPLE 2: Identify missing values in an entire dataframe
Next, we’re going to find the missing values in an entire dataframe.
In order to do this, we’ll type the name of the dataframe, and then call .isna().
sales_data.isna()
OUT:
     name  region  sales  expenses
0   False   False  False     False
1   False    True  False     False
2   False   False  False      True
3   False   False   True     False
4   False   False  False     False
5   False   False  False     False
6   False   False  False     False
7   False   False   True      True
8   False   False  False     False
9   False   False  False     False
10  False   False  False     False
Explanation
This should be easy to understand.
Here, we’ve called the .isna() method on the entire sales_data dataframe.
To do this, we simply typed the name of the dataframe, and then typed .isna() to call the method.
In the output, you’ll notice boolean True/False values for every value of the input. The output shows True where the value was missing in the sales_data dataframe, and the output shows False otherwise.
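A full grid of True/False values can be hard to scan on a larger dataset. As one possible extension (not part of the example above), you can combine .isna() with the Pandas any method to keep only the rows that have at least one missing value. The sketch rebuilds sales_data so that it is self-contained; rows_with_missing is a name I made up.

```python
import numpy as np
import pandas as pd

# Same mock sales data as in the setup section
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas",
             "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["East", np.nan, "East", "South", "West", "West",
               "South", "West", "West", "East", "South"],
    "sales": [50000, 52000, 90000, np.nan, 42000, 72000,
              49000, np.nan, 67000, 65000, 67000],
    "expenses": [42000, 43000, np.nan, 44000, 38000, 39000,
                 42000, np.nan, 39000, 44000, 45000],
})

# any(axis=1) is True for a row if any column in that row is missing
row_has_missing = sales_data.isna().any(axis=1)

# Use the row-level mask to select the incomplete rows
rows_with_missing = sales_data[row_has_missing]
print(rows_with_missing)
```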
EXAMPLE 3: Count the missing values in each column of the dataframe
Finally, let’s count the missing values in each column of our dataframe.
To accomplish this, we’re going to use two Pandas methods:
- Pandas isna
- Pandas sum
We’ll use isna to identify the missing values, and we’ll use Pandas sum to count them.
(sales_data
    .isna()
    .sum()
)
OUT:
name        0
region      1
sales       2
expenses    2
dtype: int64
Explanation
Look carefully at the output. The output shows the count of the missing values for each column of the input dataframe.
To accomplish this, we needed to call two Pandas methods, one after the other.
First, we called the .isna() method, which identified the missing values.
Then, we called .sum() to count them.
Additionally, notice that we used a special syntax trick. We enclosed the whole chain of methods inside of parentheses. And, we put the different methods on different lines.
I sometimes refer to this as Pandas method chaining, although keep in mind that you can use this for almost any type of Python method.
This is a somewhat unconventional technique, but is extremely powerful when you’re doing data wrangling or data analysis. If you know how to use this technique properly, you can chain together multiple methods (many more than 2) to perform complex data manipulations. It also makes it easier to read and debug your code.
This is one of the secrets to mastering Pandas, and you really should learn it.
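To give a sense of how a longer chain might look, here is a sketch that computes the share of missing values in each column instead of the raw count, and then sorts the result. This goes beyond the example above: using .mean() and .sort_values() here is my own choice, and missing_share is a made-up name. The code rebuilds sales_data so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Same mock sales data as in the setup section
sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas",
             "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["East", np.nan, "East", "South", "West", "West",
               "South", "West", "West", "East", "South"],
    "sales": [50000, 52000, 90000, np.nan, 42000, 72000,
              49000, np.nan, 67000, 65000, 67000],
    "expenses": [42000, 43000, np.nan, 44000, 38000, 39000,
                 42000, np.nan, 39000, 44000, 45000],
})

# True counts as 1 and False as 0, so the mean of each column of
# the isna() output is the fraction of values that are missing
missing_share = (sales_data
    .isna()
    .mean()
    .sort_values(ascending=False)
)
print(missing_share)
```

Each method sits on its own line, so you can comment out one step at a time when you need to debug the chain.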
Frequently asked questions about Pandas isna
Now that you’ve learned about Pandas isna and seen some examples, let’s review some frequently asked questions about the method.
Frequently asked questions:
Question 1: What’s the difference between Pandas isna and isnull?
Essentially, there is no difference.
Pandas isna and Pandas isnull do the same thing, and operate the same way.
Pandas .isnull() is really just an alias of Pandas .isna().
I suggest that you just pick one of the two versions, and use it consistently in your code.
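If you’d like to confirm this for yourself, a quick check like the one below should do it. The Series s is a throwaway example, and the sketch also uses the top-level pd.isna() function, which is not covered elsewhere in this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two missing values
s = pd.Series([1.0, np.nan, None])

# Both methods should produce identical boolean output
same_as_method = s.isna().equals(s.isnull())
print(same_as_method)

# The top-level function pd.isna() gives the same result as the method
same_as_function = pd.isna(s).equals(s.isna())
print(same_as_function)
```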