MySQL Handling Duplicates

Tables sometimes contain duplicate records. Often duplicates are acceptable, but in some cases they must be prevented or removed.

In this article, we will discuss some of the ways to handle duplicates in a MySQL table.

1. Prevent Duplicates using Primary Key

We can define a PRIMARY KEY on the relevant columns of a table to prevent duplicates.

Let us create a table that contains first_name and last_name columns.

CREATE TABLE people (
   first_name CHAR(20),
   last_name CHAR(20)
   );

This table has no index such as a PRIMARY KEY, so nothing stops the same name from being inserted more than once.
To prevent duplicate first_name and last_name combinations in the 'people' table, we add a PRIMARY KEY to its definition at creation time.
The key guarantees unique values. The columns must also be declared NOT NULL, because a primary key does not allow NULL values.

CREATE TABLE people (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   PRIMARY KEY (last_name, first_name)
);
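
To see the key at work, here is a minimal sketch, assuming the 'people' table above has just been created and is empty. The name is sample data only. The second statement repeats the same name pair and should be rejected with MySQL's duplicate-entry error (error 1062).

```sql
-- First insert succeeds: the key ('Ten', 'Ben') is new.
INSERT INTO people (last_name, first_name)
VALUES ('Ten', 'Ben');

-- Same (last_name, first_name) pair again:
-- rejected with error 1062 (duplicate entry for the primary key).
INSERT INTO people (last_name, first_name)
VALUES ('Ten', 'Ben');
```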

INSERT IGNORE:

INSERT IGNORE is useful when some of the names we want to insert may already exist in the table.
Since the PRIMARY KEY covers both last_name and first_name, a plain INSERT of a duplicate fails with an error.
With INSERT IGNORE, MySQL downgrades that error to a warning, silently skips the duplicate row, and leaves the existing row unchanged. New rows are inserted as usual.

INSERT IGNORE INTO people (last_name, first_name)
VALUES( 'Ten', 'Ben');

INSERT IGNORE INTO people (last_name, first_name)
VALUES( 'Stacy', 'Gwen');
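
The skipped row can be observed directly. The sketch below assumes the two statements above have already run, so the key ('Ten', 'Ben') is present in 'people'.

```sql
-- Repeat an insert whose key already exists in people.
INSERT IGNORE INTO people (last_name, first_name)
VALUES ('Ten', 'Ben');

-- The duplicate is reported as a warning instead of an error.
SHOW WARNINGS;

-- The primary key still allows only one row for this name pair.
SELECT COUNT(*) AS copies
FROM people
WHERE last_name = 'Ten' AND first_name = 'Ben';
```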

REPLACE:

When REPLACE is used instead of INSERT, a new record is inserted into the table as usual.
If the record duplicates an existing PRIMARY KEY or UNIQUE value, MySQL deletes the old row and inserts the new one in its place.

REPLACE INTO people (last_name, first_name)
VALUES( 'Uzu', 'Naruto');

REPLACE INTO people (last_name, first_name)
VALUES( 'Uchiha', 'Sasuke');
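
Because 'people' contains only its key columns, a replaced row looks identical to the old one. The following sketch uses a hypothetical table, people_city, with one extra non-key column (city) so the effect is visible. The table name, the column, and the city values are made up for illustration.

```sql
-- Hypothetical table: same key as people, plus a non-key column.
CREATE TABLE people_city (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   city CHAR(20),
   PRIMARY KEY (last_name, first_name)
);

-- New key: behaves like a plain INSERT.
REPLACE INTO people_city (last_name, first_name, city)
VALUES ('Uchiha', 'Sasuke', 'Tokyo');

-- Same key, different city: the old row is deleted and this one is inserted.
REPLACE INTO people_city (last_name, first_name, city)
VALUES ('Uchiha', 'Sasuke', 'Osaka');

-- One row remains for the key, holding the most recent city.
SELECT last_name, first_name, city
FROM people_city;
```

Since REPLACE deletes the old row, any column not listed in the statement falls back to its default value rather than keeping the old one.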

UNIQUE:

Another way to prevent duplicates is to define a UNIQUE index on the relevant columns when creating the table.

CREATE TABLE people (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   UNIQUE (last_name, first_name)
);
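
Unlike a PRIMARY KEY, a UNIQUE index accepts nullable columns, and MySQL does not treat two NULL values as duplicates of each other. The sketch below uses a hypothetical table, people_nullable, to show why the NOT NULL declarations matter if every name pair must be unique.

```sql
-- Hypothetical table: UNIQUE index over columns that allow NULL.
CREATE TABLE people_nullable (
   first_name CHAR(20),
   last_name CHAR(20),
   UNIQUE (last_name, first_name)
);

-- Both rows are accepted: a NULL in a UNIQUE column
-- never compares equal to another NULL.
INSERT INTO people_nullable (last_name, first_name) VALUES ('Stacy', NULL);
INSERT INTO people_nullable (last_name, first_name) VALUES ('Stacy', NULL);

-- Fully specified pairs are still protected:
-- the second statement is rejected with error 1062.
INSERT INTO people_nullable (last_name, first_name) VALUES ('Stacy', 'Gwen');
INSERT INTO people_nullable (last_name, first_name) VALUES ('Stacy', 'Gwen');
```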

DISTINCT:

Using DISTINCT, we can retrieve each unique record from the table once, without repeating duplicates.

SELECT DISTINCT last_name, first_name
FROM people
ORDER BY last_name;
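
DISTINCT can also be used inside COUNT to check how many duplicates a table holds. This is a small sketch, most useful on a version of 'people' that has no PRIMARY KEY or UNIQUE index. The column aliases are chosen here for illustration.

```sql
-- Compare the total number of rows with the number of unique name pairs.
-- COUNT(DISTINCT ...) skips rows where either column is NULL.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT last_name, first_name) AS distinct_names
FROM people;
```

If the two numbers differ, the table contains repeated name pairs (or rows with NULL in one of the columns).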

GROUP BY:

An alternative to DISTINCT is the GROUP BY clause, which collapses duplicates and returns each unique combination of values once.

SELECT last_name, first_name
FROM people
GROUP BY last_name, first_name;
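
GROUP BY is also a convenient way to list which records are duplicated and how often. The sketch below assumes a 'people' table without a unique key. The alias copies is chosen for illustration.

```sql
-- List every name pair that appears more than once, most frequent first.
SELECT last_name, first_name, COUNT(*) AS copies
FROM people
GROUP BY last_name, first_name
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```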

2. Duplicate Removal Using Table Replacement

If a table already contains many duplicates and we want to remove them all at once, follow the procedure below: copy the unique rows into a new table, drop the original, and rename the copy.

CREATE TABLE people_tmp SELECT last_name, first_name
FROM people
GROUP BY last_name, first_name;

DROP TABLE people;
ALTER TABLE people_tmp RENAME TO people;
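
A slightly more cautious variant of this procedure is sketched below. It assumes the columns of 'people' are declared NOT NULL. The backup name people_old is hypothetical. CREATE TABLE ... SELECT does not copy indexes, so the key is added explicitly, and RENAME TABLE swaps both names in a single statement so the original is kept until the result has been checked.

```sql
-- Copy one row per unique name pair into a new table.
CREATE TABLE people_tmp
SELECT last_name, first_name
FROM people
GROUP BY last_name, first_name;

-- The copy has no indexes yet; add the key so duplicates cannot return.
ALTER TABLE people_tmp
ADD PRIMARY KEY (last_name, first_name);

-- Swap the tables in one statement and keep the original as a backup.
RENAME TABLE people TO people_old,
             people_tmp TO people;

-- Once the new table has been verified:
-- DROP TABLE people_old;
```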

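Adding a PRIMARY KEY to an Existing Table:

Another way to remove duplicates and prevent new ones in the future is to add a PRIMARY KEY to a table that already exists. With the IGNORE keyword, MySQL keeps one row for each key value and deletes the conflicting rows while building the index. Note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so the statement below works only on older versions.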

ALTER IGNORE TABLE people
ADD PRIMARY KEY (last_name, first_name);
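
On MySQL 5.7.4 and later, where ALTER IGNORE TABLE is no longer accepted, a similar result can be reached by combining the techniques above. This is only a sketch: the names people_new and people_backup are hypothetical, and it assumes 'people' holds no NULL values in either column.

```sql
-- Empty copy of the table with the key already in place.
CREATE TABLE people_new (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   PRIMARY KEY (last_name, first_name)
);

-- Copy the rows across; repeats of an existing key are skipped with a warning.
INSERT IGNORE INTO people_new (last_name, first_name)
SELECT last_name, first_name
FROM people;

-- Swap the tables and keep the original until the result is verified.
RENAME TABLE people TO people_backup,
             people_new TO people;
```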