How to read a .tsv file with Python?

There could be cases in which you’ll be provided data stored in less widespread formats that you’ll need to analyze. In today’s this tutorial we’ll learn how to easily import a tab separated value file (*tsv) with Python and the Pandas library.

How to create a tab delimited file?

You can easily create a tsv file programmatically or using Excel. In Excel, you just need to save your spreadsheet as a Text(Tab delimited) (*.txt) file. Make sure to change the file suffix from .txt to .tsv and then save it into your working directory.

Read tsv files with Python

As we would like to use the tsv file contents in our data analysis, we’ll leverage the pd.read_csv() method that is readily made available as part of the Pandas library.

Here’s the code that we can use (in our case we’ll use our fictionary interview data)

import pandas as pd
interviews_df = pd.read_csv('tsv_file.tsv')

print(interviews_df)

Here is our output – as you can see the /t delimiter wasn’t correctly interpreted by Pandas:

\tmonth\tfirst\tsec
01\tJanuary\t450\t85
12\tApril\t322\t87
23\tDecember\t98\t420
34\tJuly\t84\t108
45\tAugust\t186\t183

The solution is a magic parameter in the pd.read_csv() method named sep=’\t’. This switch will allow Pandas that tabs are used as the delimited in this specific file:

cols = ['month', 'first', 'sec']
interviews_df = pd.read_csv('tsv_file.tsv', sep='\t', usecols = cols)

print(interviews_df)

We used the usecols parameter to pick only specific columns in the data import. Now it looks better:

monthfirstsec
0January45085
1April32287
2December98420
3July84108
4August186183