Pandas isnull
This article was originally published at https://www.sharpsightlabs.com
In this tutorial, I’ll explain how to use the Pandas isnull technique to detect missing values.
I’ll explain exactly what the technique does, how the syntax works, and I’ll show you step-by-step examples of how to use isnull.
First, let’s start with an introduction to isnull, and what it does.
A quick introduction to Pandas isnull
The Pandas isnull method detects missing values in Pandas data structures.
We can use the isnull technique on several types of Pandas objects, including:
- Pandas Series
- whole Pandas dataframes
- individual columns in a dataframe
So it’s somewhat flexible in terms of what types of objects we can use it on.
Pandas isnull is very useful for data wrangling
The isnull technique is very useful for data wrangling, data cleaning, and data analysis.
Missing values are often somewhat troublesome when we analyze data and create machine learning models.
That being the case, we often need to identify missing values when we clean up our data, analyze it, or before we build a machine learning model.
So this is a simple technique, but often a necessary technique when you’re doing data science in Python.
The syntax of isnull
Now that you’ve learned a little bit about what the Pandas isnull technique does, let’s take a look at the syntax.
As I mentioned earlier, we can use the isnull() technique on:
- dataframes
- Series
- dataframe columns
The syntax for each of these use cases will be slightly different, so we’ll review the syntax for each of those separately.
A quick note
Before we look at the syntax, I need to mention a couple of things.
First, all of the syntax explanations I’m about to show you assume that you’ve already imported Pandas.
If you haven’t done so yet, you can import Pandas with the following code:
import pandas as pd
Second, these syntax explanations assume that you already have a Pandas Series or a Pandas dataframe available.
If you need a refresher on dataframes, you can read our quick introduction to Pandas dataframes.
Series syntax
Let’s start by looking at how we can use isnull() on an individual Pandas Series.
You simply type the name of your Series, followed by .isnull() to call the method.
That’s really it.
The output will be a Series of True/False boolean values that indicate which values are missing, and which are not missing.
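As a minimal sketch of the Series syntax, the snippet below builds a small Series and calls isnull() on it. The name example_series and its values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value (illustrative data only)
example_series = pd.Series([21.5, np.nan, 19.0])

# isnull() returns a boolean Series with the same length and index
missing_mask = example_series.isnull()
print(missing_mask)
```

Only the second entry should come back as True, since it is the only NaN in the input.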
Dataframe syntax
The syntax for a dataframe is really very similar to the syntax for a Series.
You simply type the name of the dataframe, and then .isnull() to call the method.
So if your dataframe is named your_dataframe, you’ll type the code your_dataframe.isnull().
The output will be an object of the same size as your dataframe that contains boolean True/False values. These boolean values indicate which dataframe values were missing.
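Here is a rough sketch of the dataframe syntax. The column names label and value, and the data in them, are placeholders invented for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with one missing value per column (illustrative only)
your_dataframe = pd.DataFrame({
    "label": ["a", None, "c"],
    "value": [1.0, 2.0, np.nan],
})

# The result has the same shape, index, and column names as the input
mask = your_dataframe.isnull()
print(mask.shape == your_dataframe.shape)
print(mask)
```

Each True in the printed result marks a cell that was missing in your_dataframe.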
Column syntax
Finally, let’s look at the syntax for using isnull on a dataframe column.
Remember that a dataframe column is actually a Pandas Series object. To retrieve a column from a dataframe, we can use “dot syntax”: the name of the dataframe, a dot, and then the name of the column.
So detecting missing values in a column is a two-step process:
- retrieve the column from the dataframe using “dot syntax”
- call the .isnull() method
So if you have a dataframe named your_dataframe, and there’s a column named column, you’ll use the code your_dataframe.column.isnull() to detect missing values in that column of the dataframe.
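The two steps above can be sketched as follows. The dataframe contents are invented for illustration. One caveat worth knowing: dot syntax only works when the column name is a valid Python identifier that does not clash with an existing dataframe attribute, so the sketch also shows bracket syntax as an alternative (the "unit price" column is a hypothetical example of a name that needs brackets).

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe (illustrative data only)
your_dataframe = pd.DataFrame({
    "column": [1.0, np.nan, 3.0],
    "unit price": [9.99, 4.5, np.nan],  # contains a space, so dot syntax cannot reach it
})

# Step 1: retrieve the column; step 2: call isnull() on it
dot_result = your_dataframe.column.isnull()

# Bracket syntax retrieves the same column and works for any column name
bracket_result = your_dataframe["column"].isnull()
spaced_result = your_dataframe["unit price"].isnull()

print(dot_result)
print(spaced_result)
```

Both dot_result and bracket_result should contain the same values, because they select the same underlying Series.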
Output (additional notes)
Let’s quickly discuss the output.
As I mentioned earlier, the output is a new object of the same size as the input object.
The output object will contain boolean True/False values that indicate which values are missing.
Values that count as “missing” are:
- None
- numpy.nan (NaN)
- pandas.NaT and pandas.NA
Values like an empty string (i.e., '') or numpy.inf will not count as missing values when you use the isnull() method.
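A quick way to check these rules is to put each kind of value into one Series and call isnull() on it. This is only a sketch: the Series name candidates is made up, and the object dtype is used here simply so that the mixed values are stored as-is.

```python
import numpy as np
import pandas as pd

# One entry per kind of value discussed above; dtype=object keeps them unconverted
candidates = pd.Series([None, np.nan, "", np.inf], dtype="object")

result = candidates.isnull()
print(result)
```

None and NaN should be flagged as True, while the empty string and infinity should come back as False.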
Examples: how to detect missing values in Python
Now that we’ve looked at the syntax, let’s look at some examples of how to use the Pandas isnull() technique.
Examples:
- Find missing values in a Pandas dataframe column
- Identify the missing values in an entire dataframe
- Count the missing values in every column of a dataframe
Run this code first
Before we actually run the examples, you’ll need to run some preliminary code in order to:
- import Pandas
- create a dataframe
Let’s do those one at a time.
Load Pandas
First, you need to import Pandas.
You can do that with the following code:
import pandas as pd
Create a dataframe
Next, we’ll create a dataframe with some mock sales data:
import numpy as np  # needed for the np.nan values below

sales_data = pd.DataFrame({
    "name": ["William", "Emma", "Sofia", "Markus", "Edward", "Thomas",
             "Ethan", "Olivia", "Arun", "Anika", "Paulo"],
    "region": ["East", np.nan, "East", "South", "West", "West",
               "South", "West", "West", "East", "South"],
    "sales": [50000, 52000, 90000, np.nan, 42000, 72000,
              49000, np.nan, 67000, 65000, 67000],
    "expenses": [42000, 43000, np.nan, 44000, 38000, 39000,
                 42000, np.nan, 39000, 44000, 45000],
})
And let’s print it out to see the contents:
print(sales_data)
OUT:
       name region    sales  expenses
0   William   East  50000.0   42000.0
1      Emma    NaN  52000.0   43000.0
2     Sofia   East  90000.0       NaN
3    Markus  South      NaN   44000.0
4    Edward   West  42000.0   38000.0
5    Thomas   West  72000.0   39000.0
6     Ethan  South  49000.0   42000.0
7    Olivia   West      NaN       NaN
8      Arun   West  67000.0   39000.0
9     Anika   East  65000.0   44000.0
10    Paulo  South  67000.0   45000.0
As you can see, this dataframe has four variables, with a mixture of character data and numeric data.
Importantly, you can see that several rows have missing values (i.e., NaN). We’ll be able to use isnull() to identify those in a programmatic way.
EXAMPLE 1: Find missing values in a Pandas dataframe column
First, let’s identify the missing values in a single column.
Here, we’ll identify the missing values in the sales column of the sales_data dataframe:
sales_data.sales.isnull()
OUT:
0     False
1     False
2     False
3      True
4     False
5     False
6     False
7      True
8     False
9     False
10    False
Explanation
Identifying the missing values in the sales variable is a two-step process:
- first, we need to retrieve the column using “dot syntax”
- then, we need to call .isnull()
So the code sales_data.sales retrieves the sales variable from the dataframe.
Then, the code .isnull() identifies the missing values.
Notice that the output is an object with the same shape as the sales variable. The value of the output is True if the input value is missing, and False otherwise.
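Because the output is a boolean Series, one common follow-up is to use it as a filter to pull out the rows where a value is missing. The sketch below assumes a small stand-in dataframe, sales_subset, built from three rows of the tutorial’s data so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Stand-in for sales_data: three of its rows, two with missing sales
sales_subset = pd.DataFrame({
    "name": ["William", "Markus", "Olivia"],
    "sales": [50000, np.nan, np.nan],
})

# Use the True/False output of isnull() to keep only rows with missing sales
rows_missing_sales = sales_subset[sales_subset.sales.isnull()]
print(rows_missing_sales)
```

The same pattern should work on the full sales_data dataframe, where it would return the rows for Markus and Olivia.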
EXAMPLE 2: Identify the missing values in an entire dataframe
Next, we’ll identify the missing values in a whole dataframe.
To do this, we simply type the name of the dataframe, and then type .isnull() to call the method:
sales_data.isnull()
OUT:
     name  region  sales  expenses
0   False   False  False     False
1   False    True  False     False
2   False   False  False      True
3   False   False   True     False
4   False   False  False     False
5   False   False  False     False
6   False   False  False     False
7   False   False   True      True
8   False   False  False     False
9   False   False  False     False
10  False   False  False     False
Explanation
I think this is easy to understand.
To use the Pandas isnull method on a whole dataframe, just type the name of the dataframe, and then .isnull().
In the output, you can see True/False values for every value of every column. The output value is True when the input value was missing, and False otherwise.
EXAMPLE 3: Count the missing values in every column of a dataframe
Finally, let’s do a slightly more difficult, but more useful example.
Here, we’ll count the number of missing values in every column of a dataframe.
To do this, we actually need to use multiple tools.
We need to use isnull() to identify the missing values, and then we need to use the Pandas sum method to count them up.
Let’s take a look:
(sales_data
 .isnull()
 .sum()
)
OUT:
name        0
region      1
sales       2
expenses    2
dtype: int64
Explanation
In the output, you can see a count of the number of missing values, by column.
When you’re doing data cleaning or data analysis, a technique like this can be extremely useful.
Notice that to do it, we needed to call two Pandas methods in series. We typed the name of the dataframe, then .isnull() to identify the missing values, and .sum() to count the missing values.
Furthermore, notice that we used a special syntax to do this. We enclosed the whole expression inside parentheses, and put the different Pandas methods on different lines. This style of Pandas coding is unorthodox, but extremely powerful once you know how to use it properly. It enables you to combine multiple Pandas methods in series to perform complex data manipulations. Additionally, it makes reading and debugging your code much easier.
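The same chained style extends to other summaries. As a sketch, the snippet below swaps .sum() for .mean() to get the share of missing values per column, and chains a second .sum() to get one overall count. The dataframe sales_subset is a small stand-in built from the first four rows of the tutorial’s data.

```python
import numpy as np
import pandas as pd

# Stand-in for sales_data: its first four rows, without the name column
sales_subset = pd.DataFrame({
    "region": ["East", np.nan, "East", "South"],
    "sales": [50000, 52000, 90000, np.nan],
    "expenses": [42000, 43000, np.nan, 44000],
})

# The mean of True/False values is the fraction of missing values per column
missing_share = (sales_subset
    .isnull()
    .mean()
)

# Summing the per-column counts gives the total for the whole dataframe
total_missing = (sales_subset
    .isnull()
    .sum()
    .sum()
)

print(missing_share)
print(total_missing)
```

Each column in this stand-in has one missing value out of four rows, so every share should be 0.25 and the total should be 3.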