How to select two columns or more in a Pandas DataFrame?

In today’s post we’ll learn to slice and subset multiple columns from a Pandas dataframe. We’ll look into three methods for accomplishing our task:

  • Slicing columns using the “brackets notation”
  • Using the loc indexer
  • Using the iloc indexer

Data Preparation

Go ahead and import the pandas library into your Python development environment and create your DataFrame.

import pandas as pd

# Read the data from a CSV file
data = pd.read_csv('hr.csv')

# Preview the DataFrame
print(data)

Here’s our DataFrame:

     language  avg_salary  candidates
1          C#        85.0        82.0
2        Java        72.0        78.0
3  JavaScript        81.0        78.0
4        Ruby        82.0        87.0
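If you don't have the hr.csv file at hand, the sketch below builds an equivalent DataFrame in memory so you can follow along. The values are copied from the preview above; the 1-based row index is an assumption made to match the printed row labels.

```python
import pandas as pd

# Stand-in for pd.read_csv('hr.csv'), using the values shown in the preview.
# The explicit 1-based index is an assumption that mirrors the printed labels.
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

# Quick sanity checks: number of rows/columns and the column dtypes
print(data.shape)
print(data.dtypes)
```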

Select multiple columns in a Pandas DataFrame

Using the brackets notation:

When using this technique we’ll subset the DataFrame using a list containing two or more column labels.

cols = ['language', 'candidates']
data[cols]

Here’s the result:

     language  candidates
1          C#        82.0
2        Java        78.0
3  JavaScript        78.0
4        Ruby        87.0
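One detail worth knowing about the brackets notation: a list of labels always returns a DataFrame, while a single label (without the list) returns a Series. The following minimal sketch illustrates the difference; it rebuilds the sample data in memory instead of reading hr.csv, which is an assumption made to keep the example self-contained.

```python
import pandas as pd

# In-memory stand-in for the hr.csv data (assumed, to keep the example runnable)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

one_as_series = data["language"]              # single label -> Series
one_as_frame = data[["language"]]             # one-item list -> DataFrame
two_cols = data[["language", "candidates"]]   # list of labels -> DataFrame

print(type(one_as_series).__name__)
print(type(one_as_frame).__name__)
print(list(two_cols.columns))
```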

Using the loc indexer:

We can achieve the same result by passing the list of column labels to the loc indexer. The colon in the first position selects all rows. Here’s the code:


cols = ['language', 'candidates']
data.loc[:,cols]
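The loc indexer also accepts a slice of labels, which is handy when the columns you need sit next to each other. Unlike regular Python slices, a label slice includes both endpoints. Here's a short sketch, again using an in-memory copy of the sample data (an assumption, so that the code runs on its own):

```python
import pandas as pd

# In-memory stand-in for the hr.csv data (assumed, to keep the example runnable)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

# A list of labels: same result as data[['language', 'candidates']]
by_list = data.loc[:, ["language", "candidates"]]

# A label slice: both 'language' and 'avg_salary' are included
by_slice = data.loc[:, "language":"avg_salary"]

print(list(by_list.columns))
print(list(by_slice.columns))
```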

Using the iloc indexer

We use the iloc indexer to select one or more columns, or ranges of columns, by their integer position.

First, let’s take a look at the DataFrame’s column index.

data.columns

Here’s the column index of our DataFrame:

Index(['language', 'avg_salary', 'candidates'], dtype='object')
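If you know the column labels but not their positions, you can look the positions up from the column index and pass them to iloc. The snippet below is a minimal sketch of that idea; it uses Index.get_loc, which returns an integer when the label is unique, and an in-memory copy of the sample data (an assumption made for the sake of a runnable example).

```python
import pandas as pd

# In-memory stand-in for the hr.csv data (assumed, to keep the example runnable)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

# Translate column labels into integer positions
labels = ["language", "candidates"]
positions = [data.columns.get_loc(label) for label in labels]

# Use the positions with iloc
subset = data.iloc[:, positions]

print(positions)
print(list(subset.columns))
```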

We are interested in the first two columns, so we’ll slice positions 0 and 1 (the end of the slice, 2, is excluded):

data.iloc[:,0:2]

# alternatively, we can use the following code:

data.iloc[:,range(2)]
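Besides a contiguous slice, iloc also accepts a list of positions, so you can pick columns that are not adjacent. Negative positions count from the end. Here's a brief sketch using an in-memory copy of the sample data (an assumption, so the code is self-contained):

```python
import pandas as pd

# In-memory stand-in for the hr.csv data (assumed, to keep the example runnable)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

first_two = data.iloc[:, 0:2]          # positions 0 and 1; the end (2) is excluded
first_and_last = data.iloc[:, [0, 2]]  # non-adjacent columns via a list of positions
last_two = data.iloc[:, -2:]           # negative positions count from the end

print(list(first_two.columns))
print(list(first_and_last.columns))
print(list(last_two.columns))
```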

Select columns by condition

You might want to subset your data according to specific logic related to the column labels or index.

Here’s a quick example that returns the same subset we saw in the iloc example:

filt = data.columns.str.contains('language') | data.columns.str.contains('avg')

# Pass the boolean filter to the loc indexer
data.loc[:, filt]
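As a possible shortcut, the two conditions can be folded into a single regular expression, because str.contains treats its pattern as a regex by default. The DataFrame.filter method accepts a regex parameter as well and applies it to the column labels. The sketch below compares both approaches on an in-memory copy of the sample data (an assumption, so the example runs without hr.csv):

```python
import pandas as pd

# In-memory stand-in for the hr.csv data (assumed, to keep the example runnable)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

# One regex instead of two conditions joined with |
mask = data.columns.str.contains("language|avg")
by_mask = data.loc[:, mask]

# DataFrame.filter matches the regex against the column labels
by_filter = data.filter(regex="language|avg")

print(list(by_mask.columns))
print(by_mask.equals(by_filter))
```

Both variants keep the columns in their original order; which one reads better is largely a matter of taste.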

Drop multiple columns

A related topic is how to delete specific columns from your DataFrame. We covered it in a previous tutorial.
