In today’s tutorial we’ll learn how to troubleshoot a common plotting error that you might encounter in Pandas when you try to plot your data while preprocessing it for analysis.
Reproducing the problem
In order to reproduce the error, we will first construct a very simple DataFrame.
import pandas as pd
# Data provided as lists of strings
month = ['08', '04', '07', '02', '03']
day_hire = ['81','70', '86', '93', '75']
salary = ['120','130', '132', '148', '134']
# Construct the DataFrame
hr = dict(month=month, hiring_interval=day_hire, salary=salary)
hrdf = pd.DataFrame(data=hr)
# Plot your data
hrdf.plot(kind='bar')
Your Python development environment (Jupyter, VSCode, Spyder, PyCharm, etc.) will show the following error message:
TypeError: empty DataFrame: no numeric data to plot
Solving the empty DataFrame: no numeric data to plot error
The root cause of the error is that we built the DataFrame from lists of strings, so every column holds text rather than numbers. When Pandas looks for numeric columns to draw the graph, it finds none.
Let’s look at the data type of each column in our DataFrame using the dtypes attribute:
hrdf.dtypes
month              object
hiring_interval    object
salary             object
dtype: object
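Another way to confirm the diagnosis is to ask Pandas which columns it regards as numeric. The snippet below is a minimal sketch that rebuilds the DataFrame from this tutorial so it can run on its own; the variable name numeric_cols is introduced here for illustration only.

```python
import pandas as pd

# Same data as in the tutorial: every value is a string
hrdf = pd.DataFrame({
    'month': ['08', '04', '07', '02', '03'],
    'hiring_interval': ['81', '70', '86', '93', '75'],
    'salary': ['120', '130', '132', '148', '134'],
})

# select_dtypes keeps only the columns whose dtype matches the filter
numeric_cols = hrdf.select_dtypes(include='number').columns.tolist()
print(numeric_cols)  # [] -> there is nothing numeric for plot() to draw
```

An empty list here means the plot call has no numeric column to work with, which is the situation the error message describes.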
That’s the problem: all three columns have the object data type, which is how Pandas stores strings here. We can use the astype method to cast the columns to the appropriate numeric data type so we can plot our DataFrame. You can find more about converting DataFrame columns to the integer data type in this tutorial.
hrdf['hiring_interval'] = hrdf['hiring_interval'].astype('int')
hrdf['salary'] = hrdf['salary'].astype('float64')
Alternatively, you can pass a dictionary that maps each column to its target data type and assign the result back to hrdf:
hrdf = hrdf.astype({'hiring_interval': 'int', 'salary': 'float64'})
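Keep in mind that astype returns a new DataFrame and leaves the original untouched, so the result has to be stored somewhere. The following sketch illustrates this under the assumption that we start from the tutorial's string data; the name converted is hypothetical and used only for this example.

```python
import pandas as pd

hrdf = pd.DataFrame({
    'month': ['08', '04', '07', '02', '03'],
    'hiring_interval': ['81', '70', '86', '93', '75'],
    'salary': ['120', '130', '132', '148', '134'],
})

# astype returns a converted copy; hrdf itself is not modified
converted = hrdf.astype({'hiring_interval': 'int', 'salary': 'float64'})

# The original still has no numeric columns, the copy has two
print(hrdf.select_dtypes(include='number').columns.tolist())       # []
print(converted.select_dtypes(include='number').columns.tolist())  # ['hiring_interval', 'salary']
```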
You can also use the pd.to_numeric() function:
hrdf['hiring_interval'] = pd.to_numeric(hrdf['hiring_interval'])
hrdf['salary'] = pd.to_numeric(hrdf['salary'])
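Real-world columns sometimes contain entries that cannot be parsed as numbers, in which case pd.to_numeric raises an error by default. Its errors parameter can be set to 'coerce' to turn such entries into NaN instead. Here is a minimal sketch; the raw_salary series and its 'n/a' entry are made up for illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical column with one entry that is not a number
raw_salary = pd.Series(['120', '130', 'n/a', '148'])

# errors='coerce' replaces values that cannot be parsed with NaN instead of raising
clean_salary = pd.to_numeric(raw_salary, errors='coerce')

print(clean_salary.isna().sum())  # 1 -> only the 'n/a' entry became NaN
```

Whether to drop or fill the resulting NaN values before plotting depends on your data, so treat this as a starting point rather than a fixed recipe.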
Now that the DataFrame includes numeric data, you can draw a nice plot:
hrdf.sort_values(by='month').plot(kind='bar', x='month');
And here’s our nice bar chart: