In today’s Data Wrangling tutorial we’ll show how to use Python to sum all or specific rows of a DataFrame in Pandas. Across the different examples of the tutorial we will use the DataFrame.sum() method along some useful parameters.
Data Preparation
Let’s start by importing the Pandas library into our Python development environment and define some example data
import pandas as pd
# load a csv file into your DataFrame.
inter_df = pd.read_csv('interviews.csv')
print (inter_df .head())
Here is our DataFrame header:
language | first_interview | second_interview | |
---|---|---|---|
1 | Kotlin | 89.0 | 83.0 |
2 | Java | 84.0 | 83.0 |
3 | C# | 82.0 | 79.0 |
4 | VisualBasic | 82.0 | 79. |
Sum all rows in a Python DataFrame
We’ll start by the simple case in which we just need to summarize all rows in a DF column.
# rows
inter_df.sum(axis=0)
The parameters axis=0 is aimed at aggregating the rows. Alternatively axis=1 is used to sum the table columns. We’ll receive the following series:
language KotlinJavaC#VisualBasic first_interview 337.0 second_interview 324.0 dtype: object
If we would like to avoid the concatenation of the non numeric values in the row, we’ll just need to use the numeric_only=True parameter.
inter_df.sum(axis=0, numeric_only=True)
Resulting in:
first_interview 337.0
second_interview 324.0
dtype: float64
Aggregate multiple specific rows based on conditions
In this case we would like to sum rows in which the data values meet a specific condition.
# pandas sum rows based on conditions
filt = inter_df['first_interview'] > 85
inter_df.loc[filt].sum(axis=0, numeric_only=True)
The result will be:
language Kotlin first_interview 89.0 second_interview 83.0 dtype: object
Sum Pandas rows with same value (by column values) with Pandas
In this case we’ll use brackets notation:
filt = inter_df['first_interview'] == 89
inter_df[filt].sum(axis=0, numeric_only=True)
Sum rows based on index value
In this example we’ll use the iloc indexer to filter out the first rows and then summarize them.
# first two rows
inter_df.iloc[0:2].sum(axis=0, numeric_only=True)
And the resulting series:
first_interview 173.0 second_interview 166.0 dtype: float64
Aggregate pandas DataFrame columns by row
You might want to sum across all numeric columns of your DataFrame. If so, you can use the following code:
inter_df['total'] = inter_df.sum(axis=1, numeric_only= True)