There are instances in which you would like to remove duplicate items from your Python lists. Here are a few use cases that come to mind:
- Counting the number of distinct elements in the list.
- Comparing different lists.
- Preparing a list for data analysis in pandas, visualization in matplotlib, or further post-processing.
- Analyzing the probability distribution of the unique list elements.
In this post, we’ll introduce four different techniques to check whether a list contains repeated values and to obtain a list of unique values that you can then process as needed.
Option 1: Using a Python Set
The most straightforward way to drop list duplicates is to use a Python set object. Sets are data structures that, by definition, contain only unique elements.
my_lst = [1,2,3,4,4,5,3,2,1]
print(list(set(my_lst)))
This will return the following list of unique values (keep in mind that a set does not preserve the original order of the elements):
[1, 2, 3, 4, 5]
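As a side note, the same set trick gives a quick way to check whether a list contains any duplicates at all: if converting the list to a set makes it shorter, some values must be repeated. A minimal sketch (the has_duplicates name is just for illustration):
my_lst = [1,2,3,4,4,5,3,2,1]
# The set drops repeated values, so a length mismatch means duplicates exist.
has_duplicates = len(set(my_lst)) != len(my_lst)
print(has_duplicates)  # True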
Option 2: Using Pandas
If you are already using pandas, the Python library for data analysis, you can use the very handy drop_duplicates Series method to ensure the uniqueness of your list elements. If pandas is not installed in your Python environment, you can install it with pip install pandas.
import pandas as pd
my_lst = [1,2,3,4,4,5,3,2,1]
my_unique_lst = list(pd.Series(my_lst).drop_duplicates())
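Printing my_unique_lst gives [1, 2, 3, 4, 5]. Unlike the set approach, drop_duplicates keeps the first occurrence of each value, so the original order of the list is preserved.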
Option 3: Using dict.fromkeys
In Python, we can use the built-in dict.fromkeys class method to create a dictionary whose keys are taken from an iterable such as a list (each key maps to None by default). Dictionary keys must be unique, and since Python 3.7 dictionaries preserve insertion order, so passing the dictionary to the list constructor filters out all duplicated elements while keeping the original order.
my_lst = [1,2,3,4,4,5,3,2,1]
my_unique_lst = list(dict.fromkeys(my_lst))
print(my_unique_lst)
This will return our filtered list:
[1, 2, 3, 4, 5]
Option 4: Using a list comprehension and a set
There are several techniques we could use here, but filtering out the duplicates with a set is arguably the most “pythonic”:
my_unique_lst = [e for e in set(my_lst)]
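Note that this is functionally the same as Option 1, so the order of the elements is again not guaranteed. If you want to keep a comprehension but preserve the original order, one common pattern tracks the values already seen in a helper set (a sketch; the seen variable is not part of the original example):
seen = set()
# seen.add(e) returns None, so new elements pass the filter and are recorded in one step.
my_unique_lst = [e for e in my_lst if not (e in seen or seen.add(e))]
print(my_unique_lst)  # [1, 2, 3, 4, 5]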