Today we’ll learn how to quickly plot charts that visualize aggregated data.
We’ll first create a DataFrame from the raw data, which we’ll aggregate in the next step. If you want to follow along with this example, you can download the source CSV file from this location. The tutorial assumes that you have placed the hr_data.csv file in the same directory as your Jupyter Notebook or Python script.
Import example data
import pandas as pd
hr = pd.read_csv('hr_data.csv')
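If you don't have hr_data.csv at hand, a small stand-in DataFrame lets you run the remaining steps. This is only a sketch under assumptions: it reproduces just the three columns the later code relies on (month, num_candidates and salary), and every value in it is made up for illustration, not taken from the original file.

```python
import pandas as pd

# Hypothetical stand-in for hr_data.csv: made-up values, and only the
# three columns that the groupby step relies on.
hr = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'num_candidates': [12, 8, 15, 5, 9, 11],
    'salary': [52000, 58000, 61000, 55000, 49000, 57000],
})

print(hr.shape)  # (6, 3)
```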
Aggregate data using Groupby
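We’ll now aggregate the data using the DataFrame.groupby() method and store the result in a new DataFrame, hr_agg. The agg() call uses named aggregation: each keyword argument defines an output column as an (input column, aggregation function) pair.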
hr_agg = (hr.groupby('month')
            .agg(candidates=('num_candidates', 'sum'),
                 avg_salary=('salary', 'mean'))
            .round())
hr_agg.head()
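To see what the named aggregation produces, the sketch below runs it on a tiny made-up DataFrame (hypothetical values, same column names as hr_data.csv) and compares it with the same result computed one column at a time.

```python
import pandas as pd

# Hypothetical toy data (made-up values) with the same columns as hr_data.csv
hr = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'num_candidates': [12, 8, 15, 5],
    'salary': [52000, 58000, 61000, 55000],
})

# Named aggregation: output_column=(input_column, aggregation_function)
hr_agg = (hr.groupby('month')
            .agg(candidates=('num_candidates', 'sum'),
                 avg_salary=('salary', 'mean'))
            .round())

# The same result computed column by column, for comparison
manual = pd.DataFrame({
    'candidates': hr.groupby('month')['num_candidates'].sum(),
    'avg_salary': hr.groupby('month')['salary'].mean(),
}).round()

print(hr_agg)
```

Note that groupby() sorts the group keys by default, so month names come out in alphabetical order (Feb before Jan). Passing sort=False to groupby() keeps the order in which the months first appear in the data.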
Here’s the data:
Note: If you are new to Pandas, you might want to look into our tutorial on basic groupby usage.
Drawing a plot with Pandas
We’ll render a simple line graph using the plotting capabilities built into Pandas: DataFrame.plot() draws one line per column, using Matplotlib under the hood.
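ax = hr_agg.plot(kind='line', title='Candidates and Avg salary by month')
# place the legend's upper-left corner just outside the right edge of the axes
ax.legend(bbox_to_anchor=(1.02, 1), loc='upper left');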
Here’s our multiple line plot:
Note: By default, the chart legend is rendered within the plot area. We used the bbox_to_anchor parameter to position the legend just outside the axes, so that it doesn’t overlap the line plots.
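A legend placed outside the axes can get cropped when the chart is saved to a file. The sketch below shows one way to avoid that, assuming made-up values in place of the real hr_agg and a hypothetical output file name (hr_agg_line.png): savefig() with bbox_inches='tight' enlarges the saved area to include the legend.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical aggregated data standing in for hr_agg (made-up values)
hr_agg = pd.DataFrame({'candidates': [20.0, 20.0, 20.0],
                       'avg_salary': [58000.0, 55000.0, 53000.0]},
                      index=pd.Index(['Feb', 'Jan', 'Mar'], name='month'))

ax = hr_agg.plot(kind='line', title='Candidates and Avg salary by month')
# Anchor the legend's upper-left corner at x=1.02, y=1 in axes coordinates,
# i.e. just outside the right edge of the plot area
legend = ax.legend(bbox_to_anchor=(1.02, 1), loc='upper left')

# bbox_inches='tight' grows the saved area so the outside legend isn't cropped
ax.figure.savefig('hr_agg_line.png', bbox_inches='tight')
```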
Groupby plotting with multiple subplots on the same figure
If we want to render several plots, for example to compare different aggregated metrics side by side, we can use Matplotlib subplots.
To use Matplotlib's plotting capabilities, we’ll first import its pyplot module into our namespace.
import matplotlib.pyplot as plt
Now we can use plt.subplots() to render more than one chart in the same figure:
# define the figure and two vertically stacked subplots
fig, ax = plt.subplots(2)
# draw one bar chart per subplot
ax[0].bar(hr_agg.index, hr_agg['avg_salary'], color='orange')
ax[1].bar(hr_agg.index, hr_agg['candidates'], color='green')
# set chart titles
ax[0].set_title('Average Salary')
ax[1].set_title('Number of Candidates')
# auto-adjust the plots layout
fig.tight_layout()
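With more than two metrics, repeating the bar()/set_title() pair for every subplot gets verbose. One alternative, sketched here as an assumption rather than as the tutorial's own method, drives the subplots from a list of (column, title, color) tuples; the hr_agg values are made up. sharex=True is a standard plt.subplots() parameter that makes all panels share the same month axis.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical aggregated data standing in for hr_agg (made-up values)
hr_agg = pd.DataFrame({'avg_salary': [58000.0, 55000.0, 53000.0],
                       'candidates': [20, 14, 25]},
                      index=pd.Index(['Feb', 'Jan', 'Mar'], name='month'))

# (column, subplot title, bar color) for each panel
panels = [('avg_salary', 'Average Salary', 'orange'),
          ('candidates', 'Number of Candidates', 'green')]

# One row per metric; sharex=True keeps the month axis aligned across panels
fig, axes = plt.subplots(len(panels), 1, sharex=True)
for ax, (column, title, color) in zip(axes, panels):
    ax.bar(hr_agg.index, hr_agg[column], color=color)
    ax.set_title(title)

fig.tight_layout()
```

Adding a third metric then only requires a new tuple in panels.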
Here’s the resulting chart:
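Note: we used the color parameter to assign a color to the chart bars. Alternatively, we could pick the bar colors from one of the predefined Matplotlib colormaps. Axes.bar() doesn’t accept a cmap parameter, so the colors sampled from the colormap are passed through the color parameter as well.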
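A minimal sketch of coloring bars from a colormap, assuming made-up values in place of the real hr_agg and the 'viridis' colormap as an arbitrary choice: the colormap is called with evenly spaced values between 0 and 1, which returns one RGBA color per bar.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical aggregated data standing in for hr_agg (made-up values)
hr_agg = pd.DataFrame({'candidates': [20, 14, 25]},
                      index=pd.Index(['Feb', 'Jan', 'Mar'], name='month'))

# Sample one RGBA color per bar, evenly spaced along the 'viridis' colormap
cmap = plt.get_cmap('viridis')
colors = cmap(np.linspace(0, 1, len(hr_agg)))

fig, ax = plt.subplots()
# Axes.bar() has no cmap argument, so the sampled colors go through color=
bars = ax.bar(hr_agg.index, hr_agg['candidates'], color=colors)
ax.set_title('Number of Candidates')
```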