How to create a Python DataFrame from multiple columns in Pandas?

As part of the data wrangling process, we often need to slice and subset existing datasets to focus on the data most relevant to our analysis.

In today’s tutorial we’ll show how you can use Python to create a new DataFrame from a list of columns of an existing one.

Preparation

We’ll import the Pandas library and load a simple dataset from a CSV file.

import pandas as pd

# construct a DataFrame
hr = pd.read_csv('hr_data.csv')

# display the column index
hr.columns

Here are the column labels / names:

Index(['language', 'month', 'salary', 'num_candidates', 'days_to_hire'], dtype='object')
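Since hr_data.csv is not included with this tutorial, here is a minimal sketch that builds a stand-in dataset with the same five column names so the remaining steps can be tried out. The values are invented for illustration, and io.StringIO substitutes for the CSV file on disk (both are assumptions). It also shows that read_csv’s usecols parameter can load only a subset of columns in the first place.

```python
import io

import pandas as pd

# Hypothetical stand-in for hr_data.csv: same column names, invented values
csv_text = """language,month,salary,num_candidates,days_to_hire
Python,Jan,120,80,30
Java,Feb,110,75,35
Go,Mar,130,40,45
"""

# Load every column, as in the tutorial
hr = pd.read_csv(io.StringIO(csv_text))
print(hr.columns.tolist())

# Load only the selected columns by passing usecols to read_csv
subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['language', 'num_candidates', 'days_to_hire'],
)
print(subset.columns.tolist())
```

Note that usecols ignores the order of the names you pass: the columns come back in the order they appear in the file. Selecting with a list after loading, as shown next, lets you choose the order.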

New DataFrame from a list of column names

# define list of column names
cols = ['language', 'num_candidates', 'days_to_hire']

# create a new DataFrame by selecting these columns from the source DataFrame
subset = hr[cols]
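Square-bracket selection with a list is not the only way to do this. The sketch below, using a small invented DataFrame in place of hr, compares it with .loc and DataFrame.filter. One practical difference: [] and .loc raise a KeyError when a listed name does not exist, while filter(items=...) silently skips it.

```python
import pandas as pd

# Small invented DataFrame standing in for hr
hr = pd.DataFrame({
    'language': ['Python', 'Java'],
    'month': ['Jan', 'Feb'],
    'salary': [120, 110],
    'num_candidates': [80, 75],
    'days_to_hire': [30, 35],
})

cols = ['language', 'num_candidates', 'days_to_hire']

by_brackets = hr[cols]             # label-based selection with []
by_loc = hr.loc[:, cols]           # same selection via .loc: all rows, listed columns
by_filter = hr.filter(items=cols)  # filter() keeps only the listed labels

print(by_brackets.equals(by_loc), by_brackets.equals(by_filter))

# [] raises KeyError for a missing label...
try:
    hr[cols + ['bonus']]
except KeyError as err:
    print('KeyError:', err)

# ...while filter() just drops labels that are not present
print(hr.filter(items=cols + ['bonus']).columns.tolist())
```

Use [] or .loc when a missing column should be treated as an error, and filter when the column list may legitimately contain names that are absent.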

Let’s verify the type of the created object.

type(subset)

As expected, the result is a DataFrame:

pandas.core.frame.DataFrame
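The result is a DataFrame because we passed a list of labels. A sketch of the difference, on a small invented DataFrame: a single label in single brackets returns a Series, while a list containing one label returns a one-column DataFrame.

```python
import pandas as pd

hr = pd.DataFrame({'language': ['Python', 'Java'], 'salary': [120, 110]})

one_col_series = hr['language']    # single label -> Series
one_col_frame = hr[['language']]   # list with one label -> DataFrame

print(type(one_col_series))  # <class 'pandas.core.series.Series'>
print(type(one_col_frame))   # <class 'pandas.core.frame.DataFrame'>
print(isinstance(one_col_frame, pd.DataFrame))  # True
```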

Let’s look at the first rows of the new DataFrame:

subset.head()

DataFrame from multiple column positions

In this example we’ll construct a new DataFrame by selecting two columns from our source DataFrame, looking them up by their positions in the column index.

cols = [hr.columns[0], hr.columns[3]]
subset = hr[cols]
subset.head()
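The same selection can be made purely by position with .iloc, without first converting positions to labels. This sketch uses an invented DataFrame with the tutorial’s column names in place of hr.

```python
import pandas as pd

# Invented DataFrame with the tutorial's column layout
hr = pd.DataFrame({
    'language': ['Python', 'Java'],
    'month': ['Jan', 'Feb'],
    'salary': [120, 110],
    'num_candidates': [80, 75],
    'days_to_hire': [30, 35],
})

# Tutorial approach: look up labels by position, then select by label
cols = [hr.columns[0], hr.columns[3]]
by_label = hr[cols]

# Equivalent positional selection: all rows, columns 0 and 3
by_position = hr.iloc[:, [0, 3]]

print(by_position.columns.tolist())  # ['language', 'num_candidates']
print(by_label.equals(by_position))  # True
```

.iloc also stays unambiguous if the DataFrame ever contains duplicate column labels, where a label lookup could return more columns than intended.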

Here’s the result:

Construct DataFrame from Series

In this case, we will use a Series to initialize a new DataFrame.

s = hr['language']
subset = pd.DataFrame(s)
subset.head()
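Series.to_frame() is an equivalent way to turn a single Series into a one-column DataFrame. To build a DataFrame from several Series at once, pd.concat with axis=1 places them side by side, using each Series name as the column label and aligning rows on the index. The data below is invented for illustration.

```python
import pandas as pd

hr = pd.DataFrame({
    'language': ['Python', 'Java', 'Go'],
    'salary': [120, 110, 130],
    'days_to_hire': [30, 35, 45],
})

s = hr['language']

# Two equivalent ways to wrap one Series in a DataFrame
df_ctor = pd.DataFrame(s)
df_method = s.to_frame()

# Combine several Series column-wise; names become column labels
combined = pd.concat([hr['language'], hr['days_to_hire']], axis=1)

print(df_ctor.equals(df_method))   # True
print(combined.columns.tolist())   # ['language', 'days_to_hire']
```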

Add a column based on a Series

In this example, we will insert a column based on a Pandas Series into an existing DataFrame.

# define a new Series holding the integers 0 to 19
s = pd.Series(range(20))

# insert the new Series as the last column
subset.insert(len(subset.columns), 'new_col', s)

# look at the DataFrame column index
subset.columns

Here’s the result:

Index(['language', 'new_col'], dtype='object')
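Two details of DataFrame.insert are worth knowing. It aligns the Series on the DataFrame’s index, so if the DataFrame has more rows than the Series, the unmatched rows get NaN. It also raises a ValueError if the column name already exists, unless allow_duplicates=True is passed. A sketch on an invented four-row DataFrame:

```python
import pandas as pd

subset = pd.DataFrame({'language': ['Python', 'Java', 'Go', 'Rust']})

# A Series with only two rows: insert() aligns on the index,
# so rows 2 and 3 have no matching value and become NaN
s = pd.Series(range(2))
subset.insert(len(subset.columns), 'new_col', s)
print(subset)

# Inserting a name that already exists raises ValueError by default
try:
    subset.insert(0, 'new_col', 0)
except ValueError as err:
    print('ValueError:', err)

# Plain assignment is a shorter way to add a column at the end
subset['rank'] = [1, 2, 3, 4]
print(subset.columns.tolist())
```

insert() is the tool when the column position matters; plain assignment is simpler when appending at the end is enough.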
