Find distinct values in pandas DataFrame columns

As part of your data wrangling process you might need to identify unique values in your dataset columns. In this tutorial we’ll learn how to use several useful pandas Series methods that help to search for distinct values in an entire column in our DataFrame.

Example Data

We will start by importing the pandas library and creating a very simple DataFrame that you can use in order to follow along.

import pandas as pd

language = ['Javascript', 'Python', 'Java', 'Java', 'Javascript']
office = ['Bangkok', 'Toronto', 'Los Angeles', 'Bangkok', None]
salary = [134.0, 127.0, 194.0, 123.0, 145.0]
data = dict(language = language, office = office, salary = salary)
hrdf = pd.DataFrame(data=data)

Different values in pandas DataFrame column

There are several ways to identify the distinct values in a column / pandas Series:

Using drop_duplicates()

We can invoke the drop_duplicates Series method and identify the unique values:

unique_lang_s = hrdf.language.drop_duplicates()

This returns a pandas series:

0    Javascript
1        Python
2          Java
Name: language, dtype: object

Using unique()

We can also use the Series unique() method.

unique_lang_arry = hrdf.language.unique()
print (unique_lang_arry)

This returns a numpy array:

['Javascript' 'Python' 'Java']

Print distinct column values as a list

We can easily export column values to a Python:

unique_languages_s = hrdf.language.drop_duplicates()
lang_lst = unique_languages_s.to_list()

Get unique column values in sorted order


The Series is sorted in ascending alphabetical order:

2          Java
0    Javascript
1        Python
Name: language, dtype: object

Find unique column elements with nan / None values

Columns in our DataFrame might contain empty values (NA, nan, None and as such). We can use the drop,na() Series method to get rid of empty values.

In our example, the office contains one empty value.


The result is:

['Bangkok' 'Toronto' 'Los Angeles' None]

We can easily filter out the empty / nan values:


Doing so will return the following array:

['Bangkok' 'Toronto' 'Los Angeles']

Count number of uniques in a Series


The result will be: 3

Count frequency of distinct values in a column

In the last section of this tutorial, we would like to count the number of occurrences of each separate value. We will use the value_counts() Series method.


We’ll get the following result:

Javascript    2
Java          2
Python        1
Name: language, dtype: int64

Related learning

How to find one or multiple specific values in a pandas column?