In today’s quick tutorial we’ll learn how to initialize Python Pandas DataFrames from scratch.
We’ll focus on several common use cases that are worth getting familiar with, as they come up often during data preparation:
- New dataframe with column names
- Setting the size of the empty DataFrame
- Create dataframe with index
- Append data to the new DataFrame
- Create empty column
Preparation
Let’s get started by importing the Pandas library:
import pandas as pd
Note: If Pandas is not installed in your environment, the import will raise a ModuleNotFoundError. In that case, you’ll need to install Pandas first.
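If you want a friendlier message than the raw traceback, one option is to wrap the import in a try/except. This is only a minimal sketch; the wording of the hint and the use of pip as the installer are assumptions about your setup.

```python
# Minimal sketch: fail with a readable hint when pandas is missing.
# The message text and the pip command are assumptions about your environment.
try:
    import pandas as pd
except ModuleNotFoundError:
    raise SystemExit("pandas is not installed; try: pip install pandas")

# If the import succeeded, the installed version is available as a string
pandas_version = pd.__version__
print(pandas_version)
```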
Now let’s define the column names that we’ll use throughout the tutorial:
df_cols = ['city', 'month', 'year', 'min_temp', 'max_temp']
1. Empty DataFrame with column names
Let’s first create a DataFrame from scratch with the predefined columns we introduced in the preparatory step:
# with column names
new_df = pd.DataFrame(columns=df_cols)
We can now easily validate that the DF is indeed empty using the relevant attribute:
new_df.empty
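As a quick check, the sketch below rebuilds the same empty DataFrame and inspects it with the empty and shape attributes. The variable name check_df is only illustrative.

```python
import pandas as pd

df_cols = ['city', 'month', 'year', 'min_temp', 'max_temp']

# Build a DataFrame that has column labels but no rows
check_df = pd.DataFrame(columns=df_cols)

# .empty is True when any axis of the DataFrame has length 0
is_empty = check_df.empty

# .shape is (number of rows, number of columns)
n_rows, n_cols = check_df.shape
print(is_empty, n_rows, n_cols)
```

Here the columns exist, but the DataFrame still counts as empty because it holds no rows.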
2. Make a DF with specific size
num_rows = 5
# one row per index value; every cell starts out as NaN
new_df = pd.DataFrame(index=range(num_rows), columns=df_cols)
new_df
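To confirm what a pre-sized DataFrame contains, here is a small sketch that checks its shape and verifies that every cell is a missing value. The name sized_df is just for illustration.

```python
import pandas as pd

df_cols = ['city', 'month', 'year', 'min_temp', 'max_temp']
num_rows = 5

# Passing an index without data creates the rows and fills them with NaN
sized_df = pd.DataFrame(index=range(num_rows), columns=df_cols)

shape = sized_df.shape                      # (rows, columns)
all_missing = sized_df.isna().all().all()   # True if every cell is NaN
print(shape, all_missing, sized_df.empty)
```

Note that a DataFrame like this is no longer "empty" as far as the empty attribute is concerned, since it has rows, even though all of its values are missing.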
3. Create new DataFrame with index
In the snippet below we’ll define an index for the DataFrame and pass it to the pd.DataFrame constructor. Each item in the list becomes a row label, so this creates a single row labeled 'station_id'.
idx = ['station_id']
new_df = pd.DataFrame(index=idx, columns=df_cols)
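If what you’re after is an index that is named station_id and holds several row labels, a variation along these lines might help. The station labels 'st_01' and 'st_02' are made up for the example.

```python
import pandas as pd

df_cols = ['city', 'month', 'year', 'min_temp', 'max_temp']

# Hypothetical station labels; the index itself is named 'station_id'
station_idx = pd.Index(['st_01', 'st_02'], name='station_id')

# One NaN-filled row per label in the index
indexed_df = pd.DataFrame(index=station_idx, columns=df_cols)
print(indexed_df)
```

With a named index, individual rows can later be addressed by label, for example indexed_df.loc['st_01'].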
4. Append Data to your DataFrame
Next we’ll append data. Pandas can load data from CSV, JSON, text files and other sources, but for the sake of simplicity we’ll add a list as a new row of the DataFrame:
new_row = ['NYC', 12, 2022, 19, 65]
new_df = pd.DataFrame(columns=df_cols)
# using the loc indexer
new_df.loc[0] = new_row
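When you have more than one row to add, a common alternative is pd.concat (the older DataFrame.append method was removed in pandas 2.0). The sketch below combines both approaches; the two extra rows are made-up values used purely for illustration.

```python
import pandas as pd

df_cols = ['city', 'month', 'year', 'min_temp', 'max_temp']
rows_df = pd.DataFrame(columns=df_cols)

# Add a single row at the next free integer label
rows_df.loc[len(rows_df)] = ['NYC', 12, 2022, 19, 65]

# Made-up extra rows, collected in their own DataFrame
extra_rows = pd.DataFrame(
    [['BOS', 12, 2022, 15, 60],
     ['LA', 12, 2022, 48, 70]],
    columns=df_cols,
)

# Stack the two DataFrames; ignore_index renumbers the rows 0..n-1
rows_df = pd.concat([rows_df, extra_rows], ignore_index=True)
print(rows_df)
```

Collecting rows first and concatenating once tends to be faster than growing a DataFrame one loc assignment at a time.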
5. New empty columns
We’ll wrap up this tutorial by showing how to add an empty column to your DF:
import numpy as np
new_df['empty_col'] = np.nan
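Here is a self-contained sketch of the same idea that also verifies the result. It starts from a one-row DataFrame so that the new column has something to fill; the name col_df is illustrative.

```python
import numpy as np
import pandas as pd

df_cols = ['city', 'month', 'year', 'min_temp', 'max_temp']
col_df = pd.DataFrame([['NYC', 12, 2022, 19, 65]], columns=df_cols)

# Assigning a scalar broadcasts it to every existing row
col_df['empty_col'] = np.nan

# Every value in the new column should be missing
column_is_empty = col_df['empty_col'].isna().all()
print(col_df.shape, column_is_empty)
```

Keep in mind that the scalar is broadcast over existing rows only, so on a DataFrame with zero rows the new column will have zero values.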