While analyzing data you typically group data using the very handy Pandas groupby method. You can then more easily subset your data , iterate over the groups while applying some transformation logic or just probably visualize your data using pandas, matplotlib or seaborn. However, once your data is grouped as a Series object, you might want to convert it to a DataFrame instead. Once you have a DataFrame, you can take advantage of significant amount of methods that will simplify your data wrangling process.
In this tutorial i would like to show you three simple ways that helps turn your aggregated data into a pandas DF.
Creating example Data
We’ll start by creating data and grouping it using the groupby DataFrame method
import pandas as pd # define the dataset language = ['VisualBasic', 'VisualBasic', 'Java', 'Java', 'VisualBasic', 'Java', 'C#', 'VisualBasic'] office = ['Atlanta', 'New York', 'New York', 'Boston', 'Atlanta', 'New York', 'Atlanta', 'New York'] month = ['March', 'May', 'February', 'January', 'March', 'May', 'May', 'May'] salary = [109.0, 110.0, 106.0, 107.0, 110.0, 113.0, 128.0, 131.0] salaries = dict(month=month, language=language, office=office, salary = salary) hiring = pd.DataFrame(data=salaries) # grouping the data hiring_gp = hiring.groupby(['language','month'])['salary'].mean() print(hiring_gp)
We now have a Series that contains the grouping results.
# 1 – Convert the Groupby to a DataFrame with to_frame()
The first option to convert the grouped data to a DataFrame is using the Series method to_frame(). Here’s the snippet:
and the result:
# 2 – Turn Series to DataFrame with unstack
We can use the to_frame() method that we just reviewed in order to convert Series objects to DataFrames.
Another useful method to achieve that is to use the unstack method:
And here is the result:
# 3 – Group data directly into a DataFrame with groupby.agg
We can obviously aggregate data directly into a DataFrame, using the groupby.agg method:
hiring.groupby(['language','month']).agg (avg_salary= ('salary', 'mean'))
Groupby DataFrame by all columns (or multiple ones)
Another question we typically get is how to groupby DataFrame data by multiple columns (or even all columns). If you are in search for a solution for that look into this post on multiple column grouping in Pandas.
Pandas groupby DataFrame to csv
We can easily write our data to a comma separated value (CSV) file.
Converting the GroupBy object to Excel
We can easily use the Series to_excel method to convert the grouped data to Excel:
Alternatively we can first create a DataFrame and use a similar to_excel method for DataFrames
import openpyxl hiring_gp.unstack(level='month').to_excel('hiring.xlsx')
You might want to look into a more comprehensive tutorial on how to export Pandas DataFrame to Excel workbooks.
Is it possible to filter groups before converting to a DataFrame?
Yes you are able to use the filter() GroupBy method to apply a function to determine which groups will be converted to a DataFrame. In this example, we will convert only groups with salary qreater than 100.
def filter_groups(month): return salary.mean() > 100 hiring_filtered = hiring_gp.filter(filter_groups) hiring = hiring_filtered .reset_index(drop = True)
Can i assign custom column names when converting a GroupBy?
Yes, use the rename DataFrame method and pass a dictionary as needed.