In today’s tutorial we’ll learn how to persist the content of Pandas dataframe to different file formats, so that we can quickly back them up or access their content as needed in our Data Wrangling workflow. So without further ado let’s look into the best ways to save Python dataframes.
Preparation
We’ll first create our Pandas DataFrame by importing a csv file:
import pandas as pd
survey = pd.read_csv('survey_data.csv)
Here’s our DataFrame: (Note that you can copy its content to your clipboard and create your DataFrame using the df.read_clipboard() method.
language | month | salary | |
---|---|---|---|
1 | Swift | December | 130.0 |
2 | Haskell | November | 124.0 |
3 | Haskell | June | 107.0 |
4 | SQL | December | 111.0 |
5 | Python | August | 128.0 |
Save Pandas DF to csv file format
Probably the file format which you’ll be using the most when wrangling. You can very easily export your Python DataFrame to the comma separated value format.
Here’s the snippet:
survey.to_csv('hr_survey.csv')
Note: If needed, you can store the file using the tab delimited format:
survey.to_csv('hr_survey.tsv',sep='\t' )
Export your DataFrame to Excel
In a similar fashion you can save your data to an Excel spreadsheet. We already covered how you can export multiple dataframes to multiple sheets in Excel.
# before going on you'll need the xlrd or the openpyxl modules
import xlrd
survey.to_excel('hr_survey.xlsx', sheet_name= 'survey')
Note: If you don’t have openpyxl or xlrd, you’ll probably get a module not found error. Take a look here to see how to troubleshoot using the Pyhton PIP utility.
Save Pandas DataFrame to Pickle
The last option we’ll cover today is to serialize data to a file (sometime referred as to pickle). In our example, we’ll use the zip format to stored the object we serialized. We can later on retrieve the file as needed.
# serialize your DataFrame data
survey.to_pickle('hr_survey.zip')
# when needed, retrieve your data into a new DataFrame.
hr = pd.read_pickle('hr_survey.zip')