How to load a csv file into a pandas DataFrame?

Importing and loading comma separated value (csv) files is one of the most common tasks you will perform as a data analyst.

The pandas data analysis library offers the very handy read_csv function which helps us to customize the data load process. In this quick tutorial we’ll learn how to use read_csv and its most useful parameters to tackle some basic data import tasks.

Example data

In this example I will be using the following data. We’ll assume the file path is ‘hiring_data.csv’.

Import the csv data into a DataFrame

We’ll first load the data as is:

import pandas as pd

interviews = pd.read_csv('hiring_data.csv')

This results in a pandas DataFrame containing all rows of our csv file.
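The snippet below is a minimal, self-contained sketch of this basic load. Since the original file is not reproduced here, it uses a small made-up dataset held in memory; the column names and values are assumptions chosen only for illustration, not the actual contents of hiring_data.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'hiring_data.csv'; every name and value
# below is an assumption made for illustration.
csv_text = """id,recruiter,office,language,salary,date
1,Ann,Paris,Python,120,2021-01-15
2,Bob,Rome,R,110,2021-02-20
3,Cid,Oslo,SQL,95,2021-03-05
"""

# read_csv accepts a file path or a file-like object, so an in-memory
# buffer behaves like a file on disk here.
interviews = pd.read_csv(io.StringIO(csv_text))

print(interviews.shape)
print(list(interviews.columns))
```

With a real file you would pass the path string instead of the io.StringIO buffer.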

Loading data without the index

The pandas read_csv function lets us choose which csv columns to load through the usecols parameter, which accepts zero-based column positions or column names. Let’s assume that we would like to skip the two leftmost columns:

interviews = pd.read_csv('hiring_data.csv', usecols=[2, 3, 4, 5])

print(interviews)

This will result in the following DataFrame:

Note: Your csv file might contain several unnamed columns, which pandas loads under names starting with ‘Unnamed’. You can remove these columns from your DataFrame after loading.
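As a rough sketch of both ideas, the example below selects columns by position and by name, and then drops a column that has no header. The sample data is hypothetical (its first column deliberately has an empty header, as happens when a DataFrame index is written out to csv), and filtering on the ‘Unnamed’ prefix is just one possible approach.

```python
import io

import pandas as pd

# Hypothetical data; the first column has an empty header.
csv_text = """,recruiter,office,language,salary,date
0,Ann,Paris,Python,120,2021-01-15
1,Bob,Rome,R,110,2021-02-20
"""

# Select columns by zero-based position, skipping the two leftmost ones.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[2, 3, 4, 5])

# usecols also accepts column names.
by_name = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["office", "language", "salary", "date"],
)

# Columns without a header are loaded as 'Unnamed: <position>'.
# One way to drop them after loading is to filter on that prefix.
full = pd.read_csv(io.StringIO(csv_text))
named_only = full.loc[:, ~full.columns.str.startswith("Unnamed")]

print(list(by_position.columns))
print(list(named_only.columns))
```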

Import csv without the header row

When loading the data you can skip the header row by using the header=None and skiprows=1 parameters. Note that if you skip the header, you should use the names parameter to specify the column names for your DataFrame; otherwise pandas labels the columns with integers.

interviews = pd.read_csv('hiring_data.csv', usecols=[2, 3, 4, 5], skiprows=1, header=None, names=['Office', 'Language', 'Salary', 'Date'])

print(interviews)

This will result in the following:
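To try this without the original file, here is a small sketch that runs the same call against made-up in-memory data. The six-column layout and all values are assumptions for illustration; only the last four columns are kept and renamed.

```python
import io

import pandas as pd

# Hypothetical data with a header row that we want to discard.
csv_text = """id,recruiter,office,language,salary,date
1,Ann,Paris,Python,120,2021-01-15
2,Bob,Rome,R,110,2021-02-20
"""

interviews = pd.read_csv(
    io.StringIO(csv_text),
    usecols=[2, 3, 4, 5],   # keep the four rightmost columns
    skiprows=1,             # skip the original header line
    header=None,            # treat the remaining lines as data only
    names=["Office", "Language", "Salary", "Date"],  # our own labels
)

print(list(interviews.columns))
print(len(interviews))
```

Because the number of entries in names matches the number of selected columns, the names are applied to the columns picked by usecols.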

Handling datetime columns during csv import

The rightmost column in our DataFrame contains dates that are actually strings (object data type in pandas). If we want to convert the column to datetime during import, we use the parse_dates parameter.

interviews = pd.read_csv('hiring_data.csv', usecols=[2, 3, 4, 5], skiprows=1, header=None, names=['Office', 'Language', 'Salary', 'Date'], parse_dates=['Date'])

Note: we can also convert a column to datetime after loading, using the pd.to_datetime function or the astype method with the 'datetime64[ns]' type.
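The following sketch compares the two routes on a tiny hypothetical dataset (the column names and values are assumptions): parsing dates during import with parse_dates, and converting an already loaded column with pd.to_datetime.

```python
import io

import pandas as pd

# Hypothetical data; dates are stored as ISO-formatted strings.
csv_text = """office,language,salary,date
Paris,Python,120,2021-01-15
Rome,R,110,2021-02-20
"""

# Without parse_dates, the date column is loaded as plain strings.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, the column is converted to datetime during import.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Alternatively, convert the column after loading.
converted = raw.copy()
converted["date"] = pd.to_datetime(converted["date"])

print(raw["date"].dtype)
print(parsed["date"].dtype)
print(converted["date"].dtype)
```

Once the column is a datetime, accessors such as parsed["date"].dt.year become available.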

Load csv data from a url

Another common case, added here for completeness, is when your csv file is available at a web url. To handle this case, we simply pass the url as the first argument:

df = pd.read_csv('https://mydata.com/hiring_data.csv')
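The address above is only a placeholder, so here is a sketch that exercises the same code path without network access. It writes a small hypothetical csv to a temporary directory and loads it through a file:// url; the file contents are assumptions for illustration, and an http or https url would be passed in exactly the same way.

```python
import pathlib
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file contents, written to a throwaway location.
    csv_path = pathlib.Path(tmp_dir) / "hiring_data.csv"
    csv_path.write_text("office,salary\nParis,120\nRome,110\n")

    # Build a file:// url pointing at the temporary file.
    file_url = csv_path.as_uri()

    # read_csv takes the url string in place of a local path.
    df = pd.read_csv(file_url)

print(df.shape)
```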