Find duplicate rows in table with SQL
There are constructs where you shouldn't have duplicate column combinations in a table. Best protection against it, is a combined unique constraint. But depending on when you join the project, it might already be to late and you might already have those duplicates in your table.
It's possible to find them with SQL. In our example we have an event sourcing system with domain messages. Every message has a aggregate_root_id
and an aggregate_root_version
. The combination of them should be unique.
We find the duplicates by using a COUNT
and GROUP BY
. You have to make sure that all columns are included in SELECT
and GROUP BY
:
SELECT COUNT(*), aggregate_root_id, aggregate_root_version
FROM domain_messages
GROUP BY aggregate_root_id, aggregate_root_version
HAVING COUNT(*) > 1;