Today we’ll learn how to count rows meeting specific values in a Pandas Dataframe. We’ll look into several cases as listed below:
- Count row occurrences based on a specific value.
- Count rows meeting multiple conditions.
- Finding the number of occurrences of a specific value in a row.
- Counting unique values in a column
Create example data
Let us get started with creating a DataFrame that you can use in order to follow along with this tutorial:
import pandas as pd
prog_lang = ['Python', 'Java', 'Javascript', 'Java', 'R']
office = ['Toronto', 'Rome', 'Madrid', 'Rome', 'Madrid']
candidates = [163.0, 145.0, 163.0, 193.0, 146.0]
hr = dict(office = office, prog_lang=prog_lang, candidates = candidates)
hr_df = pd.DataFrame(data=hr)
print( hr_df.head())
Here’s our data:
office | prog_lang | candidates | |
---|---|---|---|
0 | Toronto | Python | 163.0 |
1 | Rome | Java | 145.0 |
2 | Madrid | Javascript | 163.0 |
3 | Rome | Java | 193.0 |
4 | Madrid | R | 146.0 |
Count filtered rows based on value
Let’s find which rows meet a certain condition. We’ll start by define a simple filter.
filt = hr_df['prog_lang'].str.contains('Java')
This will return a series of boolean values indicating whether our DataFrame rows meet the criteria we just defined. Now let’s filter our DataFrame rows:
hr_df[filt]
Here’s the result:
office | prog_lang | candidates | |
---|---|---|---|
1 | Rome | Java | 145.0 |
2 | Madrid | Javascript | 163.0 |
3 | Rome | Java | 193.0 |
Now we can easily count the relevant rows using the len function or shape method.
len(hr_df[filt])
or
hr_df[filt].shape[0]
Both will return the result 3.
Note: You might want to persist the filtered rows as a DataFrame that you can use in your Data Analysis:
hr_subset = hr_df[filt]
Count rows meeting multiple conditions
In the same fashion we can filter our DataFrame according to multiple complex conditions, concatenated by AND (&) / OR (|) operands.
# multiple conditions
filt = (hr_df['prog_lang'].str.contains('Java')) & (hr_df['candidates'] > 190)
'filter the DataFrame
hr_df[filt]
'Count the rows
hr_df[filt].shape[0]
The result will be 1.
Counting unique values in a Pandas Series
If you want to count unique values in a column, you just use value_counts()
hr_df['candidates'].value_counts()
Count occurrences of a value in a column
What if we want to count occurrences of a specific value in a DataFrame column / Series. In the example below we would like to select all rows in the candidates column which are equal to 145.
hr_df['candidates'].value_counts()[145]