Create a pandas DataFrame from a JSON file

We often pull data from Application Programming Interfaces (APIs), and JSON is one of the most prevalent formats for exchanging data through them. In this tutorial we will learn how to get data from a JSON object / string into a pandas DataFrame.

Create example data

Let’s assume we have a JSON file with the contents below. Using a simple text editor such as Notepad or Notepad++, you can copy the contents into a text file; just make sure to save the file with the .json extension.

[{
	"area": "Python",
	"skill": "Data Analyst",
	"salary": 165
},
{
	"area": "R",
	"skill": "Data Engineer",
	"salary": 155
},

{
	"area": "JavaScript",
	"skill": "Front End",
	"salary": 133
}

]

Create a pandas DataFrame from the json list

We can use the pd.read_json() function to construct a pandas DataFrame from a json array. Before we start, we need to import the pandas library (here is a tutorial to help fix pandas import errors).


import pandas as pd
from pathlib import Path

# path to your JSON file (pd.read_json() also accepts a URL,
# but the is_file() check below only works for local files)
file_path = Path('C:\\Temp\\lang_data.json')

# construct your DataFrame
if file_path.is_file():
    hr_df = pd.read_json(file_path)
    print(hr_df.head())
else:
    print("Could not find json file")

This will render the following output:

         area          skill  salary
0      Python   Data Analyst     165
1           R  Data Engineer     155
2  JavaScript      Front End     133
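
The same function can also handle JSON that arrives as a string, for example the body of an API response. The snippet below is a minimal sketch under a few assumptions: the variable name json_text is made up for illustration, the data is the sample from above, and the string is wrapped in StringIO because recent pandas versions warn when a raw JSON string is passed directly.

```python
from io import StringIO

import pandas as pd

# JSON text as it might arrive in an API response body
# (json_text is an illustrative name; the data is the sample from above)
json_text = """
[
    {"area": "Python", "skill": "Data Analyst", "salary": 165},
    {"area": "R", "skill": "Data Engineer", "salary": 155},
    {"area": "JavaScript", "skill": "Front End", "salary": 133}
]
"""

# Wrap the string in a file-like object before handing it to read_json
hr_df = pd.read_json(StringIO(json_text))
print(hr_df.head())
```

The resulting DataFrame should hold the same three rows as the one read from the file.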

Can’t convert json to DataFrame

If pandas fails to parse the file, pd.read_json() raises a somewhat obscure ValueError, such as:

ValueError: Unexpected character found when decoding array value 2

or

ValueError: Expected object or value

To solve this, proceed as follows:

  1. Carefully review and validate the formatting of your JSON file. A single missing comma, bracket or quotation mark might be the crux of the issue.
  2. Check the encoding of your JSON file. pd.read_json() assumes UTF-8 by default; if the file uses a different encoding, such as UTF-16, pass the correct value through the encoding parameter.
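
Both checks can be combined in a small helper. This is only a sketch: read_json_checked is a hypothetical function name, not part of pandas. It relies on the standard library json module, whose JSONDecodeError reports the line and column where parsing failed, which is usually more helpful than the pandas message.

```python
import json
from io import StringIO
from pathlib import Path

import pandas as pd


def read_json_checked(file_path, encoding="utf-8"):
    """Validate a JSON file with the standard library, then load it with pandas.

    Hypothetical helper for illustration; not part of pandas.
    """
    # A wrong encoding typically fails here with a UnicodeDecodeError
    text = Path(file_path).read_text(encoding=encoding)
    try:
        json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    except json.JSONDecodeError as err:
        # err.lineno and err.colno point at the place where parsing failed
        raise ValueError(
            f"Invalid JSON in {file_path}: {err.msg} "
            f"(line {err.lineno}, column {err.colno})"
        ) from err
    # The text is known to be valid JSON at this point
    return pd.read_json(StringIO(text))
```

For a file saved as UTF-16, you would call it as read_json_checked(file_path, encoding="utf-16"). The file is parsed twice, which is acceptable for small files but may be wasteful for large ones.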