How to sum multiple columns with nan values in Pandas?

In today’s quick tutorial we’ll learn how to sum columns in Pandas DataFrames which contains missing or non available data, which might be marked as NAN by Pandas.

Create a DataFrame

We’ll start by creating a simple DataFrame for you to follow along with this example.

# import pandas and numpy libraries
import pandas as pd, numpy as np

rev_dict = {'year':['2012', '2014', '2013', '2022'], 'h1':[np.nan, 250, 175, 190], 'h2': [235, 245, 240, np.nan]}

revenue = pd.DataFrame (data = rev_dict)

print (revenue)
yearh1h2
02012NaN235.0
12014250.0245.0
22013175.0240.0
32022190.0NaN

Sum columns with nan cell values in Pandas

We can now go ahead and use the fillna() DataFrame method in order to handle cells with missing values and then sum the columns.

We will replace the NAN missing values with zeros and sum the columns. This way we can actually skip / ignore the missing values.

revenue['total'] = revenue['h1'].fillna(0) + revenue['h2'].fillna(0)

print (revenue)

Here’s our DataFrame:

yearh1h2total
02012NaN235.0235.0
12014250.0245.0495.0
22013175.0240.0415.0
32022190.0NaN190.0

We can alternatively fill all the missing values in the entire DataFrame and then simply sum the columns:

revenue.fillna(0, axis=1, inplace=True)

revenue['total'] = revenue['h1'] + revenue['h2']

Sum a list of Pandas columns

A follow up question i got is how to use a Python list to define columns in a DataFrame that should be aggregated:

revenue['total'] = revenue[['h1', 'h2']].sum(axis=1)

Suggested learning