How to create random pandas DataFrame columns?

Follow the step-by-step process outlined below to create pandas columns populated with random integer, floating-point and string data.

Step 1: Import required packages

We will first import several libraries into our development environment (Jupyter, PyCharm, VS Code, Colab, etc.).

Note that the random module is part of the Python standard library, so you don’t need to install anything to use it. pandas and NumPy, by contrast, are third-party packages that need to be installed first; otherwise the import fails with a ModuleNotFoundError.

import random
import pandas as pd
import numpy as np

Step 2: Create random string and integer data

The next step is to create the random data.

We will first define the number of rows in our DataFrame:

n = 5

Next, we will create a list of strings and randomly pick n values from it:

languages_lst = ['Python', 'Java', 'Javascript', 'R', 'Go', 'Swift']
language = random.choices(languages_lst, k=n)

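Then let’s create two NumPy arrays of randomly generated integers:

# normally distributed values (mean 65, standard deviation 15), rounded and cast to int
num_candidates = np.random.normal(65, 15, n).round(0).astype(int)
# Poisson-distributed values (mean 32); these are already integers
days_to_hire = np.random.poisson(32, n)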
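
The arguments in the two calls above are positional, so it may help to see them spelled out by name. The sketch below does that with NumPy’s newer Generator interface (np.random.default_rng). This is an alternative to the np.random functions used in this article, not the method the article itself follows; the variable names are reused purely for illustration.

```python
import numpy as np

n = 5
rng = np.random.default_rng()  # unseeded, so values differ on every run

# Normal distribution: mean (loc) 65, standard deviation (scale) 15,
# rounded and cast so the array holds whole numbers.
num_candidates = rng.normal(loc=65, scale=15, size=n).round().astype(int)

# Poisson distribution with an average (lam) of 32; the result is already integer.
days_to_hire = rng.poisson(lam=32, size=n)

print(num_candidates)
print(days_to_hire)
```

Both arrays have n elements, so either can be passed to the DataFrame constructor exactly as in the original code.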

Note: you can also easily generate a list of random integers with the random.sample() function, which picks unique values from a range:

rnd_int_lst = random.sample(range(15,65), n)
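
This article uses both random.choices() and random.sample(), and the two behave differently. The following minimal sketch contrasts them; the names pool, unique_ints and repeatable_ints are made up for this example.

```python
import random

n = 5
pool = range(15, 65)  # the integers 15 through 64

# random.sample draws WITHOUT replacement: all n values are distinct,
# and n may not exceed the size of the pool.
unique_ints = random.sample(pool, n)

# random.choices draws WITH replacement: the same value can appear
# more than once, which is why a column built this way may repeat entries.
repeatable_ints = random.choices(pool, k=n)

print(unique_ints)
print(repeatable_ints)
```

If n is larger than the pool, random.sample() raises a ValueError, while random.choices() keeps working because it is allowed to repeat values.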

Step 3: Create DataFrame with random strings and integers

The next step is to call the DataFrame constructor:

# store the random data in a dictionary
random_dict = dict(language=language,
                   days_to_hire=days_to_hire,
                   num_candidates=num_candidates)
# initialize the DataFrame
random_data = pd.DataFrame(random_dict)

Let’s look at our DataFrame:

random_data

Here’s our DataFrame. Because the data is random, the values in your DataFrame will differ.

     language  days_to_hire  num_candidates
0      Python            42              56
1  Javascript            30              53
2           R            29              55
3  Javascript            31              61
4       Swift            32              53
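
As noted, the values change on every run. If you need the same data each time (for a test or a shared notebook, say), one option is to seed both random number generators before drawing. The sketch below wraps the steps from this article in a helper called build_frame, which is a hypothetical name introduced only for this example, as is the seed value 42.

```python
import random

import numpy as np
import pandas as pd


def build_frame(seed, n=5):
    """Hypothetical helper: rebuild the example DataFrame from a fixed seed."""
    random.seed(seed)      # seeds the standard-library generator
    np.random.seed(seed)   # seeds NumPy's legacy global generator
    languages_lst = ['Python', 'Java', 'Javascript', 'R', 'Go', 'Swift']
    return pd.DataFrame({
        'language': random.choices(languages_lst, k=n),
        'days_to_hire': np.random.poisson(32, n),
        'num_candidates': np.random.normal(65, 15, n).round(0).astype(int),
    })


first = build_frame(seed=42)
second = build_frame(seed=42)

# With the same seed, both calls draw the same values.
print(first.equals(second))
```

Two seeds are needed because the random module and NumPy keep separate generator states; seeding only one of them leaves the other columns changing between runs.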

Step 4: Add and fill a column with random values from a list

Our last topic for today is filling a new pandas DataFrame column with values from a list of random numbers.

rnd_number_hires = random.sample(range(8,15), n) #1

random_data = random_data.assign(num_hires = rnd_number_hires) #2

Explanation

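  • #1 – we create a list of n unique random integers between 8 and 14.
  • #2 – we add the list as a new column; assign() returns a new DataFrame, so we store the result back in random_data. The list must have as many items as the DataFrame has rows.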

Here is the result – note the new num_hires column:

     language  days_to_hire  num_candidates  num_hires
0  Javascript            42              65          9
1           R            30              88         12
2           R            27              65         14
3      Python            31              51         10
4  Javascript            21              65         11
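
The introduction also mentions floating-point data, which the steps above do not show. A minimal sketch of one way to add such a column follows. It uses a small stand-in DataFrame so that it runs on its own, and the column name acceptance_rate is hypothetical.

```python
import numpy as np
import pandas as pd

# Small stand-in DataFrame so the example runs on its own.
df = pd.DataFrame({'language': ['Python', 'R', 'Go']})

# One uniformly distributed float between 0 and 1 per row,
# rounded to two decimals. The column name is made up for this example.
df['acceptance_rate'] = np.random.uniform(0.0, 1.0, size=len(df)).round(2)

print(df.dtypes)
```

Here the column is added with bracket assignment, which modifies the DataFrame in place; assign(), as used in Step 4, would work just as well and returns a new DataFrame instead.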