How to slice and select DataFrame columns in Python?

In today’s quick tutorial we’ll learn how we can subset specific columns from a Pandas DataFrame with Python.

We’ll cover three cases:

  1. Slicing DataFrames with the loc indexer.
  2. Selecting columns using the brackets [] notation.
  3. Subsetting by position with the iloc indexer.

Preparations

We’ll first import the Pandas library and load our data from a CSV file:

import pandas as pd

data = pd.read_csv('languages.csv')

Here’s our data:

  language     month  salary  num_candidates  days_to_hire
1      PHP      July   136.0            78.0          54.0
2      PHP       May   120.0            79.0          56.0
3      SQL  November   108.0            87.0          59.0
4    Scala     March   119.0            83.0          37.0
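
If you don’t have languages.csv at hand, the four rows shown above can be rebuilt inline. This is only a minimal sketch: the values are copied from the table, and the 1-based row index is an assumption made to match the row labels displayed in this tutorial.

```python
import pandas as pd

# Rebuild the sample rows shown above; values are copied from the table.
data = pd.DataFrame(
    {
        "language": ["PHP", "PHP", "SQL", "Scala"],
        "month": ["July", "May", "November", "March"],
        "salary": [136.0, 120.0, 108.0, 119.0],
        "num_candidates": [78.0, 79.0, 87.0, 83.0],
        "days_to_hire": [54.0, 56.0, 59.0, 37.0],
    },
    index=[1, 2, 3, 4],  # assumption: row labels start at 1, as displayed
)

print(data.shape)  # (4, 5)
```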

1. Slice column by name with the loc[] indexer

Let’s assume that we would like to pick only the month and num_candidates columns. We’ll use the loc indexer and pass the relevant row and column labels.

# select rows and columns by label
cols = ['month', 'num_candidates']
rows = [1, 2, 3, 4]
data.loc[rows, cols]

The output will be:

      month  num_candidates
1      July            78.0
2       May            79.0
3  November            87.0
4     March            83.0

Note that you can adjust the list of row labels to subset only specific rows. Here we keep only the rows labeled 1 and 4:

# select columns by name/label
cols = ['month', 'num_candidates']
rows = [1, 4]
data.loc[rows, cols]
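
Besides lists of labels, loc also accepts label slices. The sketch below rebuilds a small stand-in for the sample data (the 1-based row labels are an assumption matching the tables in this post) and shows that a label slice includes both of its endpoints.

```python
import pandas as pd

# Small stand-in for the tutorial data (assumed 1-based row labels).
data = pd.DataFrame(
    {
        "language": ["PHP", "PHP", "SQL", "Scala"],
        "month": ["July", "May", "November", "March"],
        "salary": [136.0, 120.0, 108.0, 119.0],
        "num_candidates": [78.0, 79.0, 87.0, 83.0],
    },
    index=[1, 2, 3, 4],
)

# Label slices are inclusive on both ends:
# rows 2 through 3, columns month through num_candidates.
subset = data.loc[2:3, "month":"num_candidates"]
print(subset)

# A bare colon keeps every row.
all_rows = data.loc[:, ["month", "num_candidates"]]
print(all_rows)
```

Keep in mind that a column slice returns every column between the two labels, so salary is included in the first result.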

2. Slicing DataFrames with the brackets notation

This is probably the simplest way to select one or more columns from a DataFrame. You can also filter specific rows by building more complex filter criteria (we’ll cover that in a different post).

# define list of columns to select
cols= ['month', 'num_candidates']

# apply the brackets notation
data[cols]

Here’s the output:

      month  num_candidates
1      July            78.0
2       May            79.0
3  November            87.0
4     March            83.0
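
One detail worth knowing about the brackets notation: passing a single label returns a Series, while passing a list of labels returns a DataFrame, even when the list holds only one name. Here is a short sketch on a two-column stand-in for the sample data (the 1-based row labels are assumed for illustration).

```python
import pandas as pd

# Two-column stand-in for the tutorial data (assumed 1-based row labels).
data = pd.DataFrame(
    {
        "month": ["July", "May", "November", "March"],
        "num_candidates": [78.0, 79.0, 87.0, 83.0],
    },
    index=[1, 2, 3, 4],
)

one_col = data["month"]      # single label -> Series
as_frame = data[["month"]]   # list of labels -> DataFrame

print(type(one_col).__name__)   # Series
print(type(as_frame).__name__)  # DataFrame
```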

3. Selecting columns with the iloc position indexer

Note that iloc takes two indexers, both based on position rather than label:

  • a single integer position or a list of integer positions for the rows (starting from 0)
  • a single integer position or a list of integer positions for the columns (starting from 0)

# select columns by position/number
rows= [0,3]
cols= [1, 3]
data.iloc[rows,cols]

Here’s the result:

   month  num_candidates
1   July            78.0
4  March            83.0
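
iloc also accepts positional slices. Unlike label slices with loc, positional slices exclude the stop position, as in regular Python slicing. The sketch below illustrates this on a stand-in for the sample data (the 1-based row labels are an assumption matching the tables in this post).

```python
import pandas as pd

# Stand-in for the tutorial data (assumed 1-based row labels).
data = pd.DataFrame(
    {
        "language": ["PHP", "PHP", "SQL", "Scala"],
        "month": ["July", "May", "November", "March"],
        "salary": [136.0, 120.0, 108.0, 119.0],
        "num_candidates": [78.0, 79.0, 87.0, 83.0],
        "days_to_hire": [54.0, 56.0, 59.0, 37.0],
    },
    index=[1, 2, 3, 4],
)

# Positional slices exclude the stop position:
# row positions 0 and 1, column positions 1 and 2.
first_two = data.iloc[0:2, 1:3]
print(first_two)

# All rows, last column (negative positions count from the end).
last_col = data.iloc[:, -1]
print(last_col)
```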