How to sum specific multiple rows in a Pandas DataFrame?

In today’s Data Wrangling tutorial we’ll show how to use Python to sum all or specific rows of a DataFrame in Pandas. Across the different examples of the tutorial we will use the DataFrame.sum() method along some useful parameters.

Data Preparation

Let’s start by importing the Pandas library into our Python development environment and define some example data

import pandas as pd
# load a csv file into your DataFrame.
inter_df = pd.read_csv('interviews.csv')
print (inter_df .head())

Here is our DataFrame header:


Sum all rows in a Python DataFrame

We’ll start by the simple case in which we just need to summarize all rows in a DF column.

# rows

The parameters axis=0 is aimed at aggregating the rows. Alternatively axis=1 is used to sum the table columns. We’ll receive the following series:

language            KotlinJavaC#VisualBasic
first_interview                       337.0
second_interview                      324.0
dtype: object

If we would like to avoid the concatenation of the non numeric values in the row, we’ll just need to use the numeric_only=True parameter.

inter_df.sum(axis=0,  numeric_only=True)

Resulting in:

first_interview     337.0
second_interview    324.0
dtype: float64

Aggregate multiple specific rows based on conditions

In this case we would like to sum rows in which the data values meet a specific condition.

# pandas sum rows based on conditions
filt = inter_df['first_interview'] > 85
inter_df.loc[filt].sum(axis=0, numeric_only=True)

The result will be:

language            Kotlin
first_interview       89.0
second_interview      83.0
dtype: object

Sum Pandas rows with same value (by column values) with Pandas

In this case we’ll use brackets notation:

filt = inter_df['first_interview'] == 89
inter_df[filt].sum(axis=0, numeric_only=True)

Sum rows based on index value

In this example we’ll use the iloc indexer to filter out the first rows and then summarize them.

# first two rows
inter_df.iloc[0:2].sum(axis=0, numeric_only=True)

And the resulting series:

first_interview     173.0
second_interview    166.0
dtype: float64