Follow the step-by-step process outlined below to create pandas columns populated with random integer, float and string data.
Step 1: Import required packages
We will first import several libraries into our development environment (Jupyter, PyCharm, VS Code, Colab, etc.).
Note that the random module is part of the Python standard library, which means you don’t need to install anything to use it. By contrast, pandas and NumPy are third-party packages that need to be installed before use to avoid ‘module not found’ errors.
import random
import pandas as pd
import numpy as np
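If you are unsure whether the third-party packages are installed, a quick check along the following lines can help before running the imports. This is only a sketch using the standard library's importlib; the `required` and `missing` names are illustrative and not part of the original walkthrough.

```python
import importlib.util

# Packages this tutorial relies on; 'random' ships with Python itself.
required = ["random", "pandas", "numpy"]

# find_spec returns None when a top-level module cannot be located.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Install first, e.g. with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```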
Step 2: Create random string and integer data
The next step is to actually create the random data.
We will first define the DataFrame size (the number of rows):
n = 5
Next, we will create a list of strings and randomly pick n values from it (with replacement, so values can repeat):
languages_lst = ['Python', 'Java', 'Javascript', 'R', 'Go', 'Swift']
language = random.choices(languages_lst, k=n)
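Because the picks are random, every run produces a different list. If you need repeatable results (for example in a test), one option is to seed the generator first. The sketch below assumes an arbitrary seed of 42 and made-up weights, and also shows the optional `weights` argument of `random.choices`.

```python
import random

languages_lst = ['Python', 'Java', 'Javascript', 'R', 'Go', 'Swift']
n = 5

# Seeding makes the "random" picks repeatable across runs (42 is arbitrary).
random.seed(42)
first_pick = random.choices(languages_lst, k=n)

random.seed(42)
second_pick = random.choices(languages_lst, k=n)

print(first_pick == second_pick)  # same seed, same picks

# Optional weights make some values more likely than others
# (the weights below are made up for illustration).
weighted_pick = random.choices(languages_lst, weights=[5, 3, 3, 1, 1, 1], k=n)
print(weighted_pick)
```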
Then let’s generate two NumPy arrays of random integer values:
num_candidates = np.random.normal(65, 15, n).round(0).astype(int)  # 1: normal draws (mean 65, std 15), rounded and cast to int
days_to_hire = np.random.poisson(32, n)  # 2: Poisson draws (mean 32), which are already integers
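As an alternative to the module-level `np.random` functions, NumPy also offers a Generator object created with `np.random.default_rng`. The following is a minimal sketch of the same idea with that interface; the seed value and the `ages` array are assumptions added for illustration.

```python
import numpy as np

n = 5
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily

# Normal draws are floats, so round and cast to get integers.
num_candidates = rng.normal(loc=65, scale=15, size=n).round().astype(int)

# Poisson draws are already integers.
days_to_hire = rng.poisson(lam=32, size=n)

# Uniform integers in [15, 65): 'high' is exclusive.
ages = rng.integers(low=15, high=65, size=n)

print(num_candidates, days_to_hire, ages)
```

Passing a seed makes the arrays reproducible; omit it to get fresh values on every run.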
Note: you can also easily generate a list of unique random integers using the random.sample() function:
rnd_int_lst = random.sample(range(15, 65), n)
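One detail worth keeping in mind is that random.sample() draws without replacement, so it never repeats a value and cannot return more values than the range contains. The short sketch below contrasts it with random.randint(), which can repeat values; the variable names are illustrative only.

```python
import random

n = 5

# random.sample draws without replacement: all values are distinct.
unique_ints = random.sample(range(15, 65), n)

# random.randint returns one integer per call, both ends inclusive;
# repeated values are possible.
repeatable_ints = [random.randint(15, 64) for _ in range(n)]

# Asking sample() for more values than the population holds raises ValueError.
try:
    random.sample(range(3), n)
    sample_failed = False
except ValueError:
    sample_failed = True

print(unique_ints, repeatable_ints, sample_failed)
```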
Step 3: Create DataFrame with random strings and integers
The next step is to call the DataFrame constructor:
# store the random data in a dictionary
random_dict = dict(
    language=language,
    days_to_hire=days_to_hire,
    num_candidates=num_candidates,
)
# initialize the dataframe
random_data = pd.DataFrame(random_dict)
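The constructor expects every column to have the same number of values, which is why all the random data above was generated with the same n. Here is a self-contained sketch that builds a similar DataFrame, inspects its shape and column types, and shows what happens when lengths differ. The shortened language list and the `df` and `mismatch_rejected` names are assumptions for illustration.

```python
import random

import numpy as np
import pandas as pd

n = 5
data = {
    "language": random.choices(["Python", "Java", "R"], k=n),
    "days_to_hire": np.random.poisson(32, n),
    "num_candidates": np.random.normal(65, 15, n).round(0).astype(int),
}
df = pd.DataFrame(data)

print(df.shape)   # (rows, columns)
print(df.dtypes)  # type of each column

# Columns of different lengths are rejected by the constructor.
try:
    pd.DataFrame({"a": [1, 2, 3], "b": [1, 2]})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)
```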
Let’s look at our DataFrame:
random_data
Here’s our DataFrame. Note that it contains random data, so your DataFrame will contain different data.
 | language | days_to_hire | num_candidates |
---|---|---|---|
0 | Python | 42 | 56 |
1 | Javascript | 30 | 53 |
2 | R | 29 | 55 |
3 | Javascript | 31 | 61 |
4 | Swift | 32 | 53 |
Step 4: Add and fill a column with random values from a list
Our last topic for today is filling a new pandas DataFrame column with values from a randomly generated list.
rnd_number_hires = random.sample(range(8, 15), n)  #1
random_data = random_data.assign(num_hires=rnd_number_hires)  #2
Explanation
- #1 – we create a list of n unique random integers between 8 and 14.
- #2 – we add the list as a new column to our existing DataFrame; the list length must match the number of rows.
Here is the result – note the new num_hires column:
 | language | days_to_hire | num_candidates | num_hires |
---|---|---|---|---|
0 | Javascript | 42 | 65 | 9 |
1 | R | 30 | 88 | 12 |
2 | R | 27 | 65 | 14 |
3 | Python | 31 | 51 | 10 |
4 | Javascript | 21 | 65 | 11 |
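The same assign() approach also works for picking values from a fixed list of strings or for adding a column of random floats. The sketch below is one possible way to do it with NumPy's Generator; the `seniority_levels` list, the column names and the seed are hypothetical and not part of the example above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)  # arbitrary seed

df = pd.DataFrame({"language": ["Python", "R", "Go", "Java", "Swift"]})

# Hypothetical list of allowed values for the new column.
seniority_levels = ["junior", "mid", "senior"]

df = df.assign(
    # one random pick (with replacement) per row
    seniority=rng.choice(seniority_levels, size=len(df)),
    # random floats drawn uniformly from [0.0, 1.0), rounded to 2 decimals
    offer_accept_rate=rng.uniform(0.0, 1.0, size=len(df)).round(2),
)

print(df)
```

Using len(df) for the size keeps the new columns aligned with the DataFrame even if the number of rows changes.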