In this tutorial we’ll learn how to select one or multiple specific columns of a pandas DataFrame by name or index.
Filter pandas column by name
In a nutshell, use the pandas filter method to select one or multiple specific dataframe columns by their name:
your_df.filter(items = [your_col_name], axis = 1)
Initialize a DataFrame
We will first go ahead and create some test data. Feel free to use it to follow along with the different code examples.
dates = pd.date_range(start='10/1/23', end = '10/09/23', freq='B')
expense = [448, 234, 289, 404, 286, 419]
revenue = [2489, 3492, 3053, 3058, 3374, 4500]
# initialize your dataframe
kpis = pd.DataFrame(dict (dates=dates, revenue = revenue, expense = expense))
kpis['margin'] = kpis['revenue'] - kpis['expense']
print(kpis.head(6))
This will return the following pandas DataFrame object:
dates | revenue | expense | margin | |
---|---|---|---|---|
0 | 2023-10-02 | 2489 | 448 | 2041 |
1 | 2023-10-03 | 3492 | 234 | 3258 |
2 | 2023-10-04 | 3053 | 289 | 2764 |
3 | 2023-10-05 | 3058 | 404 | 2654 |
4 | 2023-10-06 | 3374 | 286 | 3088 |
5 | 2023-10-09 | 4500 | 419 | 4081 |
Select a column by name
As mentioned, in order to to filter out a specific column by name we can use the dataframe filter method:
kpis.filter(items = ['revenue'], axis = 1)
Note that we are using axis=1 to indicate that we are interested to search for a specific item in the column index. This returns the following DataFrame:
revenue | |
---|---|
0 | 2489 |
1 | 3492 |
2 | 3053 |
3 | 3058 |
4 | 3374 |
5 | 4500 |
Note that this return a DataFrame object (unlike when selecting a single column using loc, iloc or using the brackets notation) .
Subset multiple columns
If we are interested to get multiple columns, we’ll pass a list to the filter method:
kpis.filter(items = ['revenue', 'margin'], axis = 1)
Filter by column name containing / matching a regex
We can also select one or multiple columns which name matches a regular expression that we can define. Let’s look at two examples:
Column which name end with ‘in’:
kpis.filter(regex = 'in$', axis = 1)
This will return the margin column:
margin | |
---|---|
0 | 2041 |
1 | 3258 |
2 | 2764 |
3 | 2654 |
4 | 3088 |
5 | 4081 |
Columns which header name starts with the letter ‘e’:
kpis.filter(regex = '^e', axis = 1)
This will return the expense column:
expense | |
---|---|
0 | 448 |
1 | 234 |
2 | 289 |
3 | 404 |
4 | 286 |
5 | 419 |
Common errors
When using the filter method to select a specific DataFrame column, make sure to specify the parameter axis =1. Failing to do so, will have pandas looking for a specific item name in your DataFrame row index instead and will probably result in an empty DataFrame.