There could be cases in which you’ll be provided data stored in less widespread formats that you’ll need to analyze. In today’s this tutorial we’ll learn how to easily import a tab separated value file (*tsv) with Python and the Pandas library.
How to create a tab delimited file?
You can easily create a tsv file programmatically or using Excel. In Excel, you just need to save your spreadsheet as a Text(Tab delimited) – (*.txt) file. Make sure to change the file suffix from .txt to .tsv and then save it into your working directory.
Read tsv files with Python
As we would like to use the tsv file contents in our data analysis, we’ll leverage the pd.read_csv() method that is readily made available as part of the Pandas library.
Here’s the code that we can use (in our case we’ll use our fictionary interview data)
import pandas as pd interviews_df = pd.read_csv('tsv_file.tsv') print(interviews_df)
Here is our output – as you can see the /t delimiter wasn’t correctly interpreted by Pandas:
The solution is a magic parameter in the pd.read_csv() method named sep=’\t’. This switch will allow Pandas that tabs are used as the delimited in this specific file:
cols = ['month', 'first', 'sec'] interviews_df = pd.read_csv('tsv_file.tsv', sep='\t', usecols = cols) print(interviews_df)
We used the usecols parameter to pick only specific columns in the data import. Now it looks better: