after i get 100 Million of url’s, i saw a few of duplicates on my mysql table. I used the command line because the MySQL Workbench was not usable with minimum system resources.
For login on Terminal on Mac OS X with MySQL Community Server version: 8.0.29 – GPL
mysql -u root -p
after this command, u need to log in with your password. Don’t forget to change the current database.
use tablename;
Best-Case duplicate removing via temporary table:
create table tmp like urls;
alter table tmp add unique (url);
insert into tmp select * from tablename.urls on duplicate key update domainname=ifnull(urls.domainname, values(domainname));
example table for used command:
id | url | domainname
1 | google.de | google
2 | yahoo.com | yahoo
3 | delalo.de | delalo
I get following response for my query with deleted 36.271 duplicate in 40 minutes.
Query OK, 77017135 rows affected, 1 warning (40 min 23,60 sec)
Records: 96424261 Duplicates: 36271 Warnings: 1
example search queries for search engines:
how to remove duplicate rows in mysql with using temporary table
Query for removing duplicates rows in temp table