In today’s quick tutorial we’ll learn how to subset specific columns from a pandas DataFrame in Python.
We’ll cover three cases:
- Slicing DataFrames with the loc indexer.
- Selecting columns using the brackets [] notation.
- Subsetting by position with the iloc indexer.
Preparations
We’ll first import the pandas library and load our data from a CSV file:
import pandas as pd
data = pd.read_csv('languages.csv')
Here’s our data:
| | language | month | salary | num_candidates | days_to_hire |
|---|---|---|---|---|---|
| 1 | PHP | July | 136.0 | 78.0 | 54.0 |
| 2 | PHP | May | 120.0 | 79.0 | 56.0 |
| 3 | SQL | November | 108.0 | 87.0 | 59.0 |
| 4 | Scala | March | 119.0 | 83.0 | 37.0 |
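If you don’t have languages.csv at hand, the rows shown above can be rebuilt inline so that the examples in this post are reproducible. This is only a sketch: the inline DataFrame is a stand-in for the CSV file, and the row labels 1 to 4 are an assumption taken from the table as displayed.

```python
import pandas as pd

# Hypothetical stand-in for languages.csv: only the four rows shown above.
data = pd.DataFrame(
    {
        "language": ["PHP", "PHP", "SQL", "Scala"],
        "month": ["July", "May", "November", "March"],
        "salary": [136.0, 120.0, 108.0, 119.0],
        "num_candidates": [78.0, 79.0, 87.0, 83.0],
        "days_to_hire": [54.0, 56.0, 59.0, 37.0],
    },
    index=[1, 2, 3, 4],  # assumption: row labels 1-4, as in the table above
)

print(data)
```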
1. Slice columns by name with the loc[] indexer
Let’s assume that we would like to pick only the month and num_candidates columns. We’ll use the loc indexer and pass the relevant row and column labels as lists.
cols = ['month', 'num_candidates']
rows = [1, 2, 3, 4]
data.loc[rows, cols]
The output will be:
| | month | num_candidates |
|---|---|---|
| 1 | July | 78.0 |
| 2 | May | 79.0 |
| 3 | November | 87.0 |
| 4 | March | 83.0 |
Note that you can adjust the row labels to subset only specific rows. For example, the following picks the rows labeled 1 and 4:
# select rows and columns by name/label
cols = ['month', 'num_candidates']
rows = [1, 4]
data.loc[rows, cols]
More about using the loc indexer to subset DataFrames with multiple conditions in this tutorial.
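The loc indexer also accepts label slices, and unlike ordinary Python slices they include both endpoints. Here is a minimal sketch, assuming the four sample rows rebuilt inline with row labels 1 to 4 (the inline DataFrame is a stand-in for languages.csv):

```python
import pandas as pd

# Hypothetical stand-in for languages.csv, with assumed row labels 1-4.
data = pd.DataFrame(
    {
        "language": ["PHP", "PHP", "SQL", "Scala"],
        "month": ["July", "May", "November", "March"],
        "salary": [136.0, 120.0, 108.0, 119.0],
        "num_candidates": [78.0, 79.0, 87.0, 83.0],
        "days_to_hire": [54.0, 56.0, 59.0, 37.0],
    },
    index=[1, 2, 3, 4],
)

# Label slices in loc are inclusive on both ends:
# rows labeled 2 through 3, columns from month through num_candidates.
subset = data.loc[2:3, "month":"num_candidates"]

# A bare colon keeps every row while still picking columns by name.
all_rows = data.loc[:, ["month", "num_candidates"]]

print(subset)
print(all_rows)
```

The bare colon is handy when you only care about the columns and don’t want to list the row labels explicitly.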
2. Slicing DataFrames with the brackets notation
This is probably the simplest way to select one or more columns from a DataFrame. You can also filter specific rows by adding a more complex filter criterion (we’ll cover that in a different post).
# define list of columns to select
cols = ['month', 'num_candidates']
# apply the brackets notation
data[cols]
Here’s the output:
| | month | num_candidates |
|---|---|---|
| 1 | July | 78.0 |
| 2 | May | 79.0 |
| 3 | November | 87.0 |
| 4 | March | 83.0 |
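One detail worth knowing about the brackets notation: passing a single column name returns a Series, while passing a list (even a one-item list) returns a DataFrame. A short sketch, again assuming the sample rows rebuilt inline as a stand-in for languages.csv:

```python
import pandas as pd

# Hypothetical stand-in for languages.csv, with assumed row labels 1-4.
data = pd.DataFrame(
    {
        "language": ["PHP", "PHP", "SQL", "Scala"],
        "month": ["July", "May", "November", "March"],
        "salary": [136.0, 120.0, 108.0, 119.0],
        "num_candidates": [78.0, 79.0, 87.0, 83.0],
        "days_to_hire": [54.0, 56.0, 59.0, 37.0],
    },
    index=[1, 2, 3, 4],
)

# A single label returns a one-dimensional Series.
month_series = data["month"]

# A list of labels returns a DataFrame, even with one column.
month_frame = data[["month"]]

print(type(month_series).__name__)
print(type(month_frame).__name__)
```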
3. Selecting columns with the iloc position indexer
Note that iloc takes two indexers:
- a single integer position or a list of integer positions of the rows (starting from 0)
- a single integer position or a list of integer positions of the columns (starting from 0)
# select rows and columns by position/number
rows = [0, 3]
cols = [1, 3]
data.iloc[rows, cols]
Here’s the result:
| | month | num_candidates |
|---|---|---|
| 1 | July | 78.0 |
| 4 | March | 83.0 |
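Besides lists of positions, iloc also accepts integer slices, which follow the usual Python convention of excluding the stop position. The sketch below assumes the sample rows rebuilt inline as a stand-in for languages.csv, with row labels 1 to 4 as in the tables above:

```python
import pandas as pd

# Hypothetical stand-in for languages.csv, with assumed row labels 1-4.
data = pd.DataFrame(
    {
        "language": ["PHP", "PHP", "SQL", "Scala"],
        "month": ["July", "May", "November", "March"],
        "salary": [136.0, 120.0, 108.0, 119.0],
        "num_candidates": [78.0, 79.0, 87.0, 83.0],
        "days_to_hire": [54.0, 56.0, 59.0, 37.0],
    },
    index=[1, 2, 3, 4],
)

# Positional slices exclude the stop value:
# row positions 0 and 1, column positions 1, 2 and 3.
first_two = data.iloc[0:2, 1:4]

# A bare colon keeps every row; the list picks columns by position.
all_rows = data.iloc[:, [1, 3]]

print(first_two)
print(all_rows)
```

Because iloc works on positions rather than labels, position 0 here corresponds to the row labeled 1.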