How to build a NumPy array from a pandas DataFrame?

In today’s tutorial we’ll learn how to quickly convert a pandas DataFrame to a NumPy array with Python.

Set up test data

To complete this tutorial we’ll need both the pandas and NumPy libraries.

import pandas as pd
import numpy as np
np.random.seed(100)

my_array = np.arange(12).reshape(4,3)

my_array

Note the use of the NumPy reshape method, which arranges the 12 values produced by np.arange into a matrix with 4 rows and 3 columns.
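As a quick illustration of how reshape behaves, here is a minimal sketch. The variable names (flat, matrix, inferred) are ours and used only for this example. Passing -1 for one dimension asks NumPy to work out that dimension from the array length.

```python
import numpy as np

flat = np.arange(12)             # 1-D array holding 0, 1, ..., 11
matrix = flat.reshape(4, 3)      # 4 rows, 3 columns, filled row by row
inferred = flat.reshape(-1, 3)   # -1 lets NumPy infer the number of rows

print(matrix.shape)
print(inferred.shape)
```

Note that reshape only works when the requested shape matches the number of elements; 12 values fit a 4 by 3 matrix but not, say, a 5 by 3 one.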

Let’s now go ahead and quickly create a DataFrame from the array we just generated:

my_df = pd.DataFrame(my_array, columns=['a','b', 'c'])
my_df

Here’s our test DataFrame – note the column names:

   a   b   c
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11

Convert DF to np.array with to_numpy

We’ll use the DataFrame.to_numpy method, passing dtype=int to set the element type of the resulting array. Here we go:

my_df.to_numpy(dtype=int)
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

Let’s look at the array shape:

my_df.to_numpy(dtype=int).shape

The output, as expected, will be: (4, 3)
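The dtype argument matters most when the columns do not share a type. The sketch below uses a hypothetical frame, mixed_df, that is not part of the tutorial data. It shows that, without dtype, to_numpy picks a common type able to hold every column, and that you can request a specific type instead.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing an integer column and a float column.
mixed_df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

as_default = mixed_df.to_numpy()                  # common dtype for all columns
as_float32 = mixed_df.to_numpy(dtype="float32")   # explicitly requested dtype

print(as_default.dtype)
print(as_float32.dtype)
```

Here the integer column is upcast so that both columns fit in a single floating-point array.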

DataFrame to array with DataFrame.to_records()

my_df.to_records()

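The result is a one-dimensional NumPy record array: each DataFrame row becomes one record, the index is included as the first field, and the column names and dtypes are stored in the array’s dtype. The exact integer widths shown (i4 or i8) can vary with your platform and NumPy version.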

rec.array([(0, 0,  1,  2), (1, 3,  4,  5), (2, 6,  7,  8), (3, 9, 10, 11)],
          dtype=[('index', '<i8'), ('a', '<i4'), ('b', '<i4'), ('c', '<i4')])
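If you don’t need the index in the result, to_records accepts index=False. The following is a small sketch that rebuilds the same test DataFrame so it can run on its own. The variable name records is ours.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])

records = my_df.to_records(index=False)   # leave the index out of the records

print(records.dtype.names)   # field names taken from the column labels
print(records["b"])          # all values of field 'b'
print(records[0])            # the first record
```

Fields of a record array can be read by name, which keeps the column labels available after leaving pandas.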

That’s it for today. If you would like to get more familiar with DataFrames and NumPy arrays, have a look at our tutorials on adding NumPy arrays as rows and columns to pandas DataFrames.