# Pandas Count, Explained

This article is originally published at https://www.sharpsightlabs.com

In this tutorial, I’ll show you how to use the Pandas count technique to count the records in a Pandas dataframe.

I’ll explain exactly what the technique does, how the syntax works, and I’ll show you step-by-step examples so you can see Pandas count in action.

If you need something specific, just click on any of the following links.

**Table of Contents:**

Ok. Let’s start with a quick introduction.

## A quick introduction to Pandas Count

The Pandas count function is pretty simple. The `count()`

technique counts the number of non-missing records in a Pandas object.

This method works on:

- Pandas dataframes
- Pandas Series objects
- individual dataframe columns

Frequently, we use the `count()`

technique for data exploration.

But it’s also very useful for data cleaning and data analysis. For example, there are situations where missing values are bad, so we sometimes need to identify variables that contain non-missing values. The Pandas count technique is one way to identify columns that contain a large number of missing values.

Having said all of that, how exactly the method works depends on the syntax.

With that in mind, let’s look at the syntax of the Pandas count method.

## The syntax of Pandas count

The syntax of the count method is fairly simple, but there are a few ways to use it and a few parameters that can modify its functionality.

So in this section, I’ll cover how to use the count method on dataframes and dataframe columns. I’ll also explain the most useful parameters.

#### A quick note

Everything that I’m about to explain assumes that you’ve imported Pandas and that you already have a dataframe that you’re working with.

You can import pandas with the following code:

import pandas as pd

And if you need a refresher on dataframes, you can read our introduction to Pandas dataframes.

### Dataframe Syntax

Let’s start with how to use the count technique on dataframes.

To call the count method with a dataframe, you simply type the name of the dataframe, and then `.count()`

.

So if your dataframe is named `your_dataframe`

, you can use the code `your_dataframe.count()`

to count the number of non-missing values in each of the columns.

There are also some additional parameters that you can use inside the parenthesis, which we’ll get to in a moment.

### Series Syntax

Next, let’s look at the syntax for how to use Pandas count on a Series.

The syntax for a Series is very similar to the syntax for a dataframe.

Simply type the name of your series, then `.count()`

to call the method.

Again, there are some additional parameters that you can call that will modify the technique.

### Dataframe Column Syntax

Finally, we have the syntax for a dataframe column.

Remember that a dataframe column is actually a Pandas series. Additionally, we can retrieve a column from a dataframe using so-called “dot syntax.”

So the syntax for using `count()`

on a column is a two step process:

- retrieve the column using dot syntax
- call the count method

So if you have a dataframe named `your_dataframe`

, and a column named `column`

, you’ll use the code `your_dataframe.column.count()`

to use the count technique on that one column.

I’ll show you an example of this in example 4.

Again though, there are some optional parameters that control how the technique works. Let’s look at those parameters.

### The parameters of Pandas count

There are two parameters you should know for the count method:

`axis`

`numeric_only`

Let’s talk about each of these.

`axis`

(optional)

The `axis`

parameter controls whether `count()`

operates on the rows or the columns.

By default, this parameter is set to `axis = 0`

, so `count()`

counts the number of non-missing values in the axis-0 direction. This is effectively the column count.

You can change this and set the parameter to `axis = 1`

. This will compute the non-missing values in the axis-1 direction (the row counts).

You can also use an alternative notation, with `axis = "columns"`

or `axis = "rows"`

. I strongly discourage you from using this notation, because it’s highly confusing. I explain why in the FAQ section.

One final comment on the `axis`

parameter: to understand this parameter, you really need to understand axes. For an explanation of how axes work, you should read our tutorial on Numpy axes (Numpy axes are very similar to dataframe axes).

`numeric_only`

(optional)

The `numeric_only`

parameter enables you to force the count method to only return counts for numeric variables.

By default, this is set to `numeric_only = False`

, so the count method returns the counts for *all* of the variables.

But if you set `numeric_only = True`

, the count method will return the counts for the numeric variables only (integers, floats, etc).

I show an example of this in example 3.

## Examples: how to count records in a Pandas dataframe or Pandas series

Now that we’ve looked at the syntax, let’s look at some examples of how to use the Pandas count technique.

We’ll look at examples of how to count the records in a dataframe, how to count the records in a single column, and a few other uses.

**Examples:**

- Count the records in all columns of a dataframe
- Count the number of non-missing values in the rows
- Get counts for numeric variables only
- Count the non-missing values in a specific column
- Count records in a subset of columns

#### Run this code first

Before you run these examples, you’ll need to run some preliminary code.

In particular, you need to:

- Import necessary packages
- Load a dataframe

Let’s do those one at a time.

##### Import Packages

First, let’s just import a couple packages.

We need to import Pandas (because the `count()`

method is part of Pandas).

import pandas as pd import seaborn as sns

We also need to import Seaborn, because we’ll be working with the `titanic`

dataframe, which is included in the Seaborn package.

Let’s load that dataset next.

##### Load dataframe

Here, we’ll load the `titanic`

dataframe.

titanic = sns.load_dataset('titanic')

Let’s also print it out, so we can see the contents:

print(titanic)

OUT:

survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone 0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False 1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False 2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True 3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False 4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 886 0 2 male 27.0 0 0 13.0000 S Second man True NaN Southampton no True 887 1 1 female 19.0 0 0 30.0000 S First woman False B Southampton yes True 888 0 3 female NaN 1 2 23.4500 S Third woman False NaN Southampton no False 889 1 1 male 26.0 0 0 30.0000 C First man True C Cherbourg yes True 890 0 3 male 32.0 0 0 7.7500 Q Third man True NaN Queenstown no True [891 rows x 15 columns]

This dataframe has 15 columns. If you look carefully at the output above, you’ll actually see some `NaN`

values. We’ll be able to count the non-`NaN`

values with `count()`

.

### EXAMPLE 1: Count the records in all columns of a dataframe

First, we’ll count the number of non-null records for every column in our dataframe.

This is the simplest way to use the count method on a dataframe.

Let’s take a look, and then I’ll explain:

titanic.count()

OUT:

survived 891 pclass 891 sex 891 age 714 sibsp 891 parch 891 fare 891 embarked 889 class 891 who 891 adult_male 891 deck 203 embark_town 889 alive 891 alone 891 dtype: int64

##### Explanation

This is really simple.

Here, we’re using `count()`

on the entire `titanic`

dataframe.

To do this, we simply typed the name of the dataframe, `titanic`

, and then `.count()`

.

In the output, we can see the number of non-missing records for every column.

We know that the dataframe has 891 total rows (we saw this when we printed out the data).

And here, we can see that many of the variables – like `survived`

, `pclass`

, and `class`

– have 891 values. These variables are fully populated.

But we also see that some values have less than 891. For example, `deck`

has only 203 non-missing records. `age`

has 714.

This could be useful information during data cleaning. It could also be useful if you’re building a machine learning model, since some model types will not tolerate missing values.

### EXAMPLE 2: Count the number of non-missing values in the rows

Next, let’s count the number of non-missing values in each of the rows.

Typically, I use the `count()`

technique to count the non-missing values for the `columns`

. But there might be times when you need to examine the rows instead.

To do this, we need to use the axis parameter.

Let’s take a look:

titanic.count(axis = 1)

OUT:

0 14 1 15 2 14 3 15 4 14 .. 886 14 887 15 888 13 889 15 890 14 Length: 891, dtype: int64

##### Explanation

Here, we can see the number of non-missing values for the rows.

Now remember: we know from our earlier data examination that the dataframe has 15 columns. So a fully populated row should have 15 non-missing values.

But we can see that several of the rows displayed have 13 or 14 non-missing values. In fact, the first row has only 14 values. That means that some of these rows have missing values. That might be okay, but maybe not, depending on what you’re doing.

In terms of syntax, notice that we needed to set `axis = 1`

to count the number of missing values in the rows. Understanding Pandas “axes” is difficult, but it would definitely help if you reviewed Numpy axes. Pandas axes are essentially the same as axes for a 2D Numpy array.

Note that you can also do the same thing if you set `axis = 'columns'`

. `axis = 'columns'`

is the same as `axis = 1`

. Having said that, I strongly discourage this notation, because it’s extremely confusing. Setting `axis = 'columns'`

actually gives you the number of non-missing values for the rows. There

s a reason why the Pandas developers named it this way, but it only makes sense if you really understand what axes are. Again, I strongly suggest you avoid this alternate notation, and simply use `axis = 1`

. I explain this more in the FAQ section.

### EXAMPLE 3: Get counts for numeric variables only

Next, let’s get the counts for only the numeric variables.

To do this, we can set `numeric_only = True`

.

titanic.count(numeric_only = True)

OUT:

survived 891 pclass 891 age 714 sibsp 891 parch 891 fare 891 adult_male 891 alone 891 dtype: int64

##### Explanation

Here, by setting `numeric_only = True`

, the `count()`

technique is computing the number of non-missing values for the numeric columns only.

### EXAMPLE 4: Count the non-missing values in a specific column

Here, let’s count the non-missing values of a specific column.

In particular, we’ll count the number of non-missing values in the `deck`

variable.

titanic.deck.count()

OUT:

203

##### Explanation

To get the number of non-missing values in a single column, we need to use a two step process:

- use “dot syntax” to retrieve a specific column
- call the
`.count()`

method

So the syntax `titanic.deck`

retrieves the `deck`

variable from the `titanic`

dataframe.

Then, by using `.count()`

, Python will count the non-missing records for only that column.

### EXAMPLE 5: Count records in a subset of columns

Finally, let’ me show you a “special” technique.

Here, we’ll count the non-missing records in a subset of a few columns.

To do this, we need to use some syntax that you’re unlikely to see elsewhere.

Let’s take a look, and then I’ll explain.

(titanic .filter(['survived','age','embark_town']) .count() )

OUT:

survived 891 age 714 embark_town 889 dtype: int64

##### Explanation

Here, we counted the non-missing records in three variables: `survived`

, `age`

, and `embark_town`

.

To do this, we actually needed to call two different Pandas methods.

First, we called the filter method, to retrieve a subset of dataframe columns.

After that, we called the `count()`

method.

And this computes the non-missing records for only the three chosen variables in our subset.

Also, notice that all of the functions are on separate lines. To make this work, we need to enclose the whole expression inside of parenthesis.

This style of Pandas coding is atypical, but it can be *very* useful when you’re doing data cleaning, data exploration, or data analysis.

It’s actually a very powerful Pandas technique that you should learn more about (leave your questions about it in the comments section at the bottom of the page).

## Frequently asked questions about KEYWORD

Now that you’ve seen some examples of the Pandas count technique, let’s look at a common question.

**Frequently asked questions:**

### Question 1: Why does `axis = 'columns'`

count the non-missing values in the rows?

If you set `axis = 'columns'`

, you’ll notice that it actually counts the non-missing values for the *rows*.

WTF?

Ok. To understand this, you need to understand axes, and how most people *think* about axes.

Let’s quickly review.

Pandas dataframes (like Numpy arrays) have axes. Axes are like *directions* along the dataframe.

An analogy here is the 3-dimensional Cartesian coordinate system. There’s an x-axis, y-axis, and z-axis. Those are *directions* in 3D space.

Dataframes also have axes. Axes are like directions along the dataframe.

For a dataframe, axis-0 points downward, and axis-1 points horizontally.

I won’t explain the reasons here about why the axes are numbered like this. It’s just something you need to memorize: axis-0 points downward, and axis-1 points horizontally.

Having said that, people commonly think of axis-1 as the “columns” axis. Why? Because when we visualize it like in the image above, we typically show an arrow pointing horizontally across the top of the columns. So people think of axis-1 as the “columns” axis.

But that’s really foolish, because when you use axis-1 for an operation, such as `.count(axis = 1)`

, it actually computes the *row* counts.

I honestly think this is a misunderstanding of how people think about axes, and using terminology in a counter-intuitive way.

All of that being the case, I strongly suggest that you avoid the notation `count(axis = "columns")`

or `count(axis = "rows")`

.

Instead, use `axis = 1`

or `axis = 0`

.

For more information about axes, read our tutorial on Numpy axes. The details about 2D Numpy arrays apply to Pandas dataframes.

##### Leave your other questions in the comments below

Do you have other questions about the Pandas count technique?

Is there something you’re still confused about, that I haven’t covered here?

If so, leave your question in the comments section below.

## To learn more about Pandas, sign up for our email list

This tutorial should have helped you understand the Pandas count technique, and how it works.

But if you want to master data wrangling and data exploration with Pandas, there’s a lot more to learn.

There’s even more to learn if you want to learn data science in Python.

That said, if you’re ready to learn more about Pandas and data science in Python, then sign up for our email list.

When you sign up, you’ll get free tutorials on:

- NumPy
- Pandas
- Base Python
- Scikit learn
- Machine learning
- Deep learning
- … and more.

We publish free data science tutorials every week. When you sign up for our email list, we’ll deliver these free tutorials directly to your inbox.

Thanks for visiting r-craft.org

This article is originally published at https://www.sharpsightlabs.com

Please visit source website for post related comments.