Readers of this blog know that we are pretty big on finding ways to automate mundane and boring tasks. Python is great for automation, specially when it pertains to the Data Analysis domain.
Lately, we are increasingly spending more an more time with the Pandas data Analysis library, and using it for data munging and visualization. We rather work in Python instead of dealing with the complexity of VBA in the area of data manipulation.
That said, there are situations in which we choose to feed data from Pandas back into an Excel workbook. This allows us to combine Python and Pandas flexibility and speed with great visualization capabilities and the obvious ubiquity of Excel.
Today we’ll show you how to export data from a Pandas DataFrame to an Excel file (xlsx). We’ll deal with two scenarios:
- Save a Pandas df to an Excel file.
- Exporting Pandas DataFrames to multiple worksheets in a workbook.
Note: This tutorial requires some basic knowledge of Python programming and specifically the Pandas library.
Export and Write Pandas DataFrame to Excel
Here’s the process in a nutshell:
- Ensure that you have loaded the pandas and openpyxl libraries into your environment.
- Initialize/Load your DataFrame
- Use the DataFrame.to_excel method to export your data
Load add-on libraries
Before we start, we’ll need to import a few libraries into Python as shown below. Go ahead and type this Python 3 code into you favorite Python editor.
import pandas as pd import openpyxl import xlsxwriter
Defining our Dataset
Now let’s create the data that we’ll be using in this tutorial
Note: If you want to learn how to aggregate Data in Python you can look into our tutorial on grouping Python data according to one or multiple columns.
Pandas export and output to xls and xlsx file
Now, we would like to export the DataFrame that we just created to an Excel workbook. Pandas has a very handy to_excel method that allows to do exactly that. Let’s use it:
The code will create the languages.xlsx file and export the dataset into Sheet1
Note: In case that you don’t have the openpyxl package installed in your Python environment, you’ll get the following error: ModuleNotFoundError: No module named ‘openpyxl’ . The solution is relatively simple: from your terminal or if you are using Anaconda, from the Anaconda Navigator, type: pip install openpyxl. Then re-run your code again.
If you want to get more fancy you can use the parameters of the to_excel method to customize your data transfer. In our case we set the sheet name, chose not to import the df index picked the columns to export and even define frozen panes in the worksheet.
df.to_excel("languages1.xlsx", sheet_name="Languages", index=False, , freeze_panes=(1,1), columns=["avg_salary", "language"])
Python to Export Pandas DataFrames to multiple sheets
So far so good, but what if we would like to import data into several worksheets? That’s possible as well, although a bit more elaborated.
We’ll first split our data to three different DataFrames:
#we define three lists S1= data["language"] S2= data["avg_salary"] S3= data["applications"] # We then create three dataframes df1=pd.DataFrame(S1, columns=["language"]) df2=pd.DataFrame(S2 , columns=["avg_salary"]) df3=pd.DataFrame(S3, columns=["applications"]) # We then groups the dataframes into a list for more efficient processing dflist= [df1,df2,df3]
Now we’ll import multiple dataframes into Excel (in this case our three dfs to three different worksheets).
# We'll define an Excel writer object and the target file Excelwriter = pd.ExcelWriter("languages_multiple.xlsx",engine="xlsxwriter") #We now loop process the list of dataframes for i, df in enumerate (dflist): df.to_excel(Excelwriter, sheet_name="Sheet" + str(i+1),index=False) #And finally save the file Excelwriter.save()
Voi’la, now go ahead to your directory and look into the languages_multiple.xlsx file. You are done :-).