This is a blog post for future reference and this has proved useful to me over the past year ever (ever since I found it online). And it's time to transfer the method from email to blog.

CTE is to me (from what little I know of advanced SQL concepts) a pretty powerful feature. It stands for Common Table Expression and is basically a temporary result set (table) of data that you can use within the same query execution.

I do this via a two step process, first query verifies that the CTE is generated properly, second one uses the CTE to delete the Duplicates.

Quite frankly, I still have no idea how exactly this works, I just know it has worked for me in the past. If any SQL Guru who comes upon this post wants to explain how this exactly works in the comments section it would be very appreciated.

view plain print about
1/* Build CTE and select records to verify everything looks good */
2WITH CTE AS
3(
4SELECT ID, RANK() OVER (PARTITION BY ID ORDER BY num) rnk
5FROM (SELECT *, ROW_NUMBER() OVER(ORDER BY name) num
6FROM ABC)
7X)
8SELECT * FROM CTE
9WHERE rnk >
1
10
11/* After verification, build CTE again and now Delete the duplicates */
12WITH CTE AS
13(
14SELECT ID, RANK() OVER (PARTITION BY ID ORDER BY num) rnk
15FROM (SELECT *, ROW_NUMBER() OVER(ORDER BY name) num
16FROM ABC)
17X)
18DELETE FROM CTE
19WHERE rnk > 1

The first query builds the CTE and we first verify that the built CTE is good, expected duplicates show up. The second query then deletes the duplicates.

In both queries, you have to replace the "ID" and "name" parts of the query based on your table. And you might now always know what exactly to use here, well then experiment. I usually start with using the duplicate column in the "name" part.

This has been of use particularly in cases where the tables have duplicate data but there is no PK (or a good PK) available to drive the deletion. And maybe there is an easier way too now.