In today’s post we’ll learn how to slice and subset multiple columns of a pandas DataFrame. We’ll look at three methods for accomplishing this task:
- Slicing columns using the “brackets notation”
- Using the loc indexer
- Using the iloc indexer
Data Preparation
Go ahead and import the pandas library into your Python development environment and create your DataFrame.
import pandas as pd
# read the data from a CSV file
data = pd.read_csv('hr.csv')
# preview the DataFrame
print(data)
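If you don’t have the hr.csv file at hand, the minimal sketch below rebuilds the same sample data in memory. It assumes the file contains exactly the four rows shown in the table that follows, with an index starting at 1. The values are copied from that table, not read from the original file.

```python
import pandas as pd

# Rebuild the sample HR data in memory (values copied from the table in this post)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],  # match the 1-based index displayed in the post
)

print(data)
```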
Here’s our DataFrame:
| | language | avg_salary | candidates |
|---|---|---|---|
| 1 | C# | 85.0 | 82.0 |
| 2 | Java | 72.0 | 78.0 |
| 3 | JavaScript | 81.0 | 78.0 |
| 4 | Ruby | 82.0 | 87.0 |
Select multiple columns in a pandas DataFrame
Using the brackets notation:
With this technique, we subset the DataFrame by passing a list of two or more column labels inside the brackets.
cols = ['language', 'candidates']
data[cols]
Here’s the result:
| | language | candidates |
|---|---|---|
| 1 | C# | 82.0 |
| 2 | Java | 78.0 |
| 3 | JavaScript | 78.0 |
| 4 | Ruby | 87.0 |
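The list is what makes this work: a single label inside the brackets returns a Series, while a list of labels (even a one-element list) returns a DataFrame whose columns follow the order of the list. The sketch below shows the difference; it rebuilds the sample data from the table above in memory as a stand-in for hr.csv.

```python
import pandas as pd

# In-memory stand-in for hr.csv (values copied from the table above)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

single = data["language"]                    # one label -> Series
as_frame = data[["language"]]                # list with one label -> DataFrame
subset = data[["candidates", "language"]]    # columns come back in list order

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(list(subset.columns))     # ['candidates', 'language']
```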
Using the loc indexer:
We can achieve the same result with the loc indexer. The colon before the comma selects all rows, and the list of column labels after it selects the columns. Here’s the code:
cols = ['language', 'candidates']
data.loc[:, cols]
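Besides a list, loc also accepts a label slice, and it can filter rows and columns in the same call. Note that a label slice includes both endpoints, unlike regular Python slicing. A short sketch, again using the sample data rebuilt in memory rather than hr.csv:

```python
import pandas as pd

# In-memory stand-in for hr.csv (values copied from the table above)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

# List of labels: exactly these columns, in this order
by_list = data.loc[:, ["language", "candidates"]]

# Label slice: BOTH endpoints are included
by_slice = data.loc[:, "language":"avg_salary"]

# Rows and columns together: languages with an average salary above 80
well_paid = data.loc[data["avg_salary"] > 80, ["language", "candidates"]]

print(list(by_list.columns))   # ['language', 'candidates']
print(list(by_slice.columns))  # ['language', 'avg_salary']
print(list(well_paid.index))   # [1, 3, 4]
```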
Using the iloc indexer
We use the iloc indexer to slice one or several column ranges out of a DataFrame by integer position rather than by label.
First, let’s take a look at the DataFrame’s column index.
data.columns
Here’s our column index:
Index(['language', 'avg_salary', 'candidates'], dtype='object')
We are interested in the first two columns (positions 0 and 1), so we’ll slice accordingly. As with regular Python slicing, the end position 2 is excluded:
data.iloc[:, 0:2]
# alternatively, we can use the following code:
data.iloc[:, range(2)]
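iloc is not limited to a single contiguous range. The sketch below picks non-adjacent positions with a list, combines several ranges into one array of positions with NumPy’s np.r_ helper, and uses the Index.get_indexer method to translate column labels into positions when you only know the names. It runs against the sample data rebuilt in memory as a stand-in for hr.csv.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for hr.csv (values copied from the table above)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

# Non-adjacent columns by position: 0 ('language') and 2 ('candidates')
picked = data.iloc[:, [0, 2]]

# Several ranges combined into one position array: np.r_[0:1, 2:3] -> [0, 2]
ranges = data.iloc[:, np.r_[0:1, 2:3]]

# Translate labels into positions, then select by position
positions = data.columns.get_indexer(["language", "candidates"])
by_position = data.iloc[:, positions]

print(list(picked.columns))   # ['language', 'candidates']
print(list(ranges.columns))   # ['language', 'candidates']
print(positions.tolist())     # [0, 2]
```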
Select columns by condition
You might want to subset your data according to specific logic related to the column labels or index.
Here’s a quick example that returns the same subset we obtained with iloc above (language and avg_salary), this time by matching the column labels:
filt = data.columns.str.contains('language') | data.columns.str.contains('avg')
# pass the boolean filter to the loc indexer
data.loc[:, filt]
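The same selection can be written more compactly. str.contains treats its pattern as a regular expression by default, so one pattern with | replaces the two OR-ed calls, and DataFrame.filter(regex=...) does the matching and the selection in a single step. A sketch using the sample data rebuilt in memory instead of hr.csv:

```python
import pandas as pd

# In-memory stand-in for hr.csv (values copied from the table above)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

# One regex instead of two OR-ed str.contains calls
filt = data.columns.str.contains("language|avg")
by_mask = data.loc[:, filt]

# filter() keeps the columns whose labels match the regex
by_filter = data.filter(regex="language|avg")

# Other string methods work too, e.g. prefix matching
by_prefix = data.loc[:, data.columns.str.startswith("avg")]

print(list(by_mask.columns))    # ['language', 'avg_salary']
print(list(by_filter.columns))  # ['language', 'avg_salary']
print(list(by_prefix.columns))  # ['avg_salary']
```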
Drop multiple columns
A related topic is how to delete specific columns from your DataFrame. We covered it in a previous tutorial. You can find it here.
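For quick reference, here is a minimal sketch of dropping several columns at once with DataFrame.drop and its columns keyword. It returns a new DataFrame and leaves the original untouched unless you reassign the result. The sample data is rebuilt in memory so the snippet runs without hr.csv.

```python
import pandas as pd

# In-memory stand-in for hr.csv (values copied from the table in this post)
data = pd.DataFrame(
    {
        "language": ["C#", "Java", "JavaScript", "Ruby"],
        "avg_salary": [85.0, 72.0, 81.0, 82.0],
        "candidates": [82.0, 78.0, 78.0, 87.0],
    },
    index=[1, 2, 3, 4],
)

# Drop several columns by label; returns a new DataFrame
trimmed = data.drop(columns=["avg_salary", "candidates"])

print(list(trimmed.columns))  # ['language']
print(list(data.columns))     # unchanged: ['language', 'avg_salary', 'candidates']
```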