Watch Video – How to Find and Remove Duplicates in Excel
With a lot of data…comes a lot of duplicate data.
Duplicates in Excel can cause a lot of troubles. Whether you import data from a database, get it from a colleague, or collate it yourself, duplicates data can always creep in. And if the data you are working with is huge, then it becomes really difficult to find and remove these duplicates in Excel.
In this tutorial, I’ll show you how to find and remove duplicates in Excel.
Find and Highlight Duplicates in Excel
Duplicates in Excel can come in many forms. You can have it in a single column or multiple columns. There may also be a duplication of an entire row.
Finding and Highlight Duplicates in a Single Column in Excel
Conditional Formatting makes it simple to highlight duplicates in Excel.
Here is how to do it:
- Select the data in which you want to highlight the duplicates.
- Go to Home –> Conditional Formatting –> Highlight Cell Rules –> Duplicate Values.
- In the Duplicate Values dialog box, select Duplicate in the drop down on the left, and specify the format in which you want to highlight the duplicate values. You can choose from the ready-made format options (in the drop down on the right), or specify your own format.
- This will highlight all the values that have duplicates.
Quick Tip: Remember to check for leading or trailing spaces. For example, “John” and “John ” are considered different as the latter has an extra space character in it. A good idea would be to use the TRIM function to clean your data.
Finding and Highlight Duplicates in Multiple Columns in Excel
If you have data that spans multiple columns and you need to look for duplicates in it, the process is exactly the same as above.
Here is how to do it:
- Select the data.
- Go to Home –> Conditional Formatting –> Highlight Cell Rules –> Duplicate Values.
- In the Duplicate Values dialog box, select Duplicate in the drop down on the left, and specify the format in which you want to highlight the duplicate values.
- This will highlight all the cells that have duplicates value in the selected data set.
Finding and Highlighting Duplicate Rows in Excel
Finding duplicate data and finding duplicate rows of data are 2 different things. Have a look:
Finding duplicate rows is a bit more complex than finding duplicate cells.
Here are the steps:
- In an adjacent column, use the following formula:
=A2&B2&C2&D2
Drag this down for all the rows. This formula combines all the cell values as a single string. (You can also use the CONCATENATE function to combine text strings)
By doing this, we have created a single string for each row. If there are duplicate rows in this dataset, then these strings would be exactly the same for it.
Now that we have the combined strings for each row, we can use conditional formatting to highlight duplicate strings. A highlighted string implies that the row has a duplicate.
Here are the steps to highlight duplicate strings:
- Select the range that has the combined strings (E2:E16 in this example).
- Go to Home –> Conditional Formatting –> Highlight Cell Rules –> Duplicate Values.
- In the Duplicate Values dialog box, make sure Duplicate is selected and then specify the color in which you want to highlight the duplicate values.
This would highlight the duplicate values in column E.
In the above approach, we have highlighted only the strings that we created.
But what if you want to highlight all the duplicate rows (instead of highlighting cells in one single column)?
Here are the steps to highlight duplicate rows:
- In an adjacent column, use the following formula:
=A2&B2&C2&D2
Drag this down for all the rows. This formula combines all the cell values as a single string.
- Select the data A2:D16.
- With the data selected, go to Home –> Conditional Formatting –> New Rule.
- In the ‘New Formatting Rule’ dialog box, click on ‘Use a formula to determine which cells to format’.
- In the field below, use the following COUNTIF function:
=COUNTIF($E$2:$E$16,$E2)>1
- Select the format and click OK.
This formula would highlight all the rows that have a duplicate.
Also read: Remove Duplicates Within a Cell in Excel
Remove Duplicates in Excel
In the above section, we learned how to find and highlight duplicates in excel. In this section, I will show you how to get rid of these duplicates.
Remove Duplicates from a Single Column in Excel
If you have the data in a single column and you want to remove all the duplicates, here are the steps:
- Select the data.
- Go to Data –> Data Tools –> Remove Duplicates.
- In the Remove Duplicates dialog box:
- If your data has headers, make sure the ‘My data has headers’ option is checked.
- Make sure the column is selected (in this case there is only one column).
- Click OK.
This would remove all the duplicate values from the column, and you would have only the unique values.
CAUTION: This alters your data set by removing duplicates. Make sure you have a back-up of the original data set. If you want to extract the unique values at some other location, copy this dataset to that location and then use the above-mentioned steps. Alternatively, you can also use Advanced Filter to extract unique values to some other location.
Remove Duplicates from Multiple Columns in Excel
Suppose you have the data as shown below:
In the above data, row #2 and #16 have the exact same data for Sales Rep, Region, and Amount, but different dates (same is the case with row #10 and #13). This could be an entry error where the same entry has been recorded twice with different dates.
To delete the duplicate row in this case:
- Select the data.
- Go to Data –> Data Tools –> Remove Duplicates.
- In the Remove Duplicates dialog box:
- If your data has headers, make sure the ‘My data has headers’ option is checked.
- Select all the columns except the Date column.
- Click OK.
This would remove the 2 duplicate entries.
NOTE: This keeps the first occurrence and removes all the remaining duplicate occurrences.
Remove Duplicate Rows in Excel
To delete duplicate rows, here are the steps:
- Select the entire data.
- Go to Data –> Data Tools –> Remove Duplicates.
- In the Remove Duplicates dialog box:
- If your data has headers, make sure the ‘My data has headers’ option is checked.
- Select all the columns.
- Click OK.
Use the above-mentioned techniques to clean your data and get rid of duplicates.
12 thoughts on “Find and Remove Duplicates in Excel”
This blog is very neat and helpful with proper details. I came across this when I was trying to figure out a way to find duplicates in excel rows.
Thank you for writing this. Truly appreciate the efforts.
Thank you!
hi,
i have two column to remove duplicate entry, but i wanna remove only one to one, if the same figure exist three time in column 1 and in 2nd column exist two time i wanna remove one to one figure while i compare two column,
3rd one figure should not be delete and cell should not up and down,
Please guide me.
Useful tip with nice article keep sharing articles
I have a question that I can’t find the answer to. How do i find and delete duplicate row data in multiple column data when columns are not unique. For example I have fantasy football lineup combinations and you would have a QB RB RB WR WR WR TE DST K and I want to eliminate duplicate lineups. However, a WR could show up in any of those WR columns and you could have the same group of players show up in multiple rows depending on which column that player got entered in but its really the same lineup.
Is there a way to get around this to remove duplicate rows?
dear sir how ca i insert a column between 1
1
2
3
4
6
8
12
15
concoranate better than index
Nice and neatly explained. Can we find the same with the help of Vlookup or not?
Thanks for commenting Narendra.. Finding duplicates using VLOOKUP may not be the best way to go about it
Hi Sumit,
I have an query that if i want to know that which no. is repeated how many times then which formula should we use.
Like I want to check that every duplicate value is how many times repeated. Please explain.
I like the caption you put on the photo! 🙂
Nicely explained article on the topic. Just one gentle reminder on the use of Concatenate in your example. In normal case, there won’t be an issue but to play safe, I would suggest the use of a delimiter in between. i.e. A2&”|”&B2&”|”&C2&”|”&D2……
Please see my post here for reference:
http://wmfexcel.com/2015/02/14/pay-attention-when-you-concatenate-a1-b1-vs-a1-b1/
Cheers,
MF
Thanks for sharing.. Using CONCATENATE is definitely a better idea. I avoided a delimiter as it works fine without it too, but it surely does improve the readability of the string
Hi Sumit,
My point is not about readability, it’s about robustness of the formula.
There are rare cases that may give you wrong “duplicated rows” without the use of delimiter.