How to filter and select columns by name in Pandas?

In this tutorial we’ll learn how to select one or multiple specific columns of a pandas DataFrame by name or index.

Filter pandas column by name

In a nutshell, use the pandas filter method to select one or multiple specific dataframe columns by their name:

your_df.filter(items = [your_col_name], axis = 1)

Initialize a DataFrame

We will first go ahead and create some test data. Feel free to use it to follow along with the different code examples.

dates = pd.date_range(start='10/1/23', end = '10/09/23', freq='B')
expense = [448, 234, 289, 404, 286, 419]
revenue = [2489, 3492, 3053, 3058, 3374, 4500]

# initialize your dataframe
kpis = pd.DataFrame(dict (dates=dates, revenue = revenue, expense = expense))
kpis['margin'] = kpis['revenue'] - kpis['expense']

print(kpis.head(6))

This will return the following pandas DataFrame object:

datesrevenueexpensemargin
02023-10-0224894482041
12023-10-0334922343258
22023-10-0430532892764
32023-10-0530584042654
42023-10-0633742863088
52023-10-0945004194081

Select a column by name

As mentioned, in order to to filter out a specific column by name we can use the dataframe filter method:

kpis.filter(items = ['revenue'], axis = 1)

Note that we are using axis=1 to indicate that we are interested to search for a specific item in the column index. This returns the following DataFrame:

revenue
02489
13492
23053
33058
43374
54500

Note that this return a DataFrame object (unlike when selecting a single column using loc, iloc or using the brackets notation) .

Subset multiple columns

If we are interested to get multiple columns, we’ll pass a list to the filter method:

kpis.filter(items = ['revenue', 'margin'], axis = 1)

Filter by column name containing / matching a regex

We can also select one or multiple columns which name matches a regular expression that we can define. Let’s look at two examples:

Column which name end with ‘in’:

kpis.filter(regex = 'in$', axis = 1)

This will return the margin column:

margin
02041
13258
22764
32654
43088
54081

Columns which header name starts with the letter ‘e’:

kpis.filter(regex = '^e', axis = 1)

This will return the expense column:

expense
0448
1234
2289
3404
4286
5419

Common errors

When using the filter method to select a specific DataFrame column, make sure to specify the parameter axis =1. Failing to do so, will have pandas looking for a specific item name in your DataFrame row index instead and will probably result in an empty DataFrame.

Additional learning: