Here’s a typical data analysis question we get from time to time:
I have a spreadsheet with over 100,000 rows Excel spreadsheet that has a bunch of customer information which i need to clean up for duplicates. I would estimate that it will take a me a few days to scrub the list manually. A colleague told me that most probably i will need an Excel Macro to get that done. Our IT guy, who is familiar with Macro development, is on leave so i need to solve this on my own. Is there a way to solve this problem without knowing how to program?
Yes, there is a way to solve this in Excel, and it won’t require to be profficient with Visual Basic programming. Let’s go ahead and solve this real quick for you.
Delete Excel duplicated rows
Here’s how to get rid of the unneeded entries in your spreadsheet:
- First off, make a copy of your workbook, so that you have a backup in case anything goes wrong.
- Next open, your Excel spreadsheet.
- In the upper Ribbon, hit the Data tab.
- Now go ahead and select the data range you would like to clean up. Make sure that you select the table headers if available as well.
- In the Data Tools section hit the Remove Duplicates button.
- Now, go ahead and check the relevant column that will help you determine whether an entry in your list is duplicated. In our case, using our demo Sales Targets data set, we’ll use the first 3 columns to identify duplicated rows.
- Hit OK.
- A message detailing the number of removed entries and the ones kept in the table will be displayed.
- Hit OK and inspect your modified data set for correctness.
- If satisfied with the result, go ahead and save your file. Otherwise you can undo your change (Edit>>Undo).
- Similar capability is available in Excel PowerQuery, which allows you to transform significant larger data sets.
- If you are conversant with the Python language, you can easily drop duplicated rows from a data set using the Pandas library.
Enjoy your data analysis 😉