Find Duplicate Records In Table Without Unique Id
I recently came across a couple methods for listing out instances of duplicate records in a table where the table doesn't have an explicit unique identifier. Here is a post that explains how to do this when a unique identifier is present.
If the table doesn't have an explicit primary key or other uniquely identifying value, then we'll have to get some help from PostgreSQL's internal system columns — namely the ctid
.
The ctid
is:
The physical location of the row version within its table.
Let's use the example of the mailing_list
table with potential duplicate email
values.
Here is the first approach:
This uses a subquery to find the first occurrence of every unique email and then deletes the rest. The ctid
is the unique value that we can call the min
aggregate on.
This uses delete using
to join the table against itself as a cartesian product to compare every entry to every other entry.
Last updated