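MySQL: Removing Duplicate Rows from a “Relation Table” to Create a Composite Unique Index
Preface
Yesterday I ran into a problem: I needed to refactor and optimize a relation table. However, the existing code never took concurrency into account, so the table had accumulated a lot of dirty data, namely duplicate rows.
The table is thread_recommend, a thread recommendation table that records the (recommendation) relationship between two entities, user_id and thread_id. Its structure is very simple: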
```sql
/* Records of users recommending threads */
CREATE TABLE `thread_recommend` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `thread_id` int(11) DEFAULT NULL COMMENT 'ID of the thread recommended by the user',
  `user_id` int(11) DEFAULT NULL COMMENT 'ID of the user who recommended the thread',
  `status` int(11) DEFAULT '1' COMMENT 'Status: 0 = recommendation cancelled, 1 = recommended',
  `created` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Recommendation time',
  PRIMARY KEY (`id`),
  KEY `userid` (`user_id`) USING BTREE
) ENGINE=InnoDB;
```
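The problem is that, because the code was written carelessly, under high concurrency (or when requests pile up because a heavily loaded database responds slowly) the table ends up with multiple rows for the same thread_id and user_id combination.
You can guess the rest: all kinds of bizarre bugs that contradict the intended behavior come pouring out, for example:
I just cancelled my recommendation, so why does it still say I'm recommending it?!
Why doesn't the displayed total recommendation count match the actual number of recommending users?!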
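To make the failure mode concrete, here is a minimal sketch that replays two concurrent requests with a fixed interleaving. It assumes the application does a check-then-insert (SELECT to see whether the row exists, then INSERT); the original code is not shown, so that pattern is an assumption. It uses Python's built-in sqlite3 module with an in-memory table as a stand-in for MySQL, and the helper names and the IDs 42 and 7 are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for thread_recommend (not MySQL; columns trimmed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thread_recommend ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " thread_id INTEGER, user_id INTEGER, status INTEGER DEFAULT 1)"
)


def already_recommended(thread_id, user_id):
    """Hypothetical 'check' step of a naive check-then-insert."""
    row = conn.execute(
        "SELECT 1 FROM thread_recommend WHERE thread_id = ? AND user_id = ?",
        (thread_id, user_id),
    ).fetchone()
    return row is not None


def insert_recommend(thread_id, user_id):
    """Hypothetical 'act' step: inserts without re-checking."""
    conn.execute(
        "INSERT INTO thread_recommend (thread_id, user_id) VALUES (?, ?)",
        (thread_id, user_id),
    )


# Requests A and B for the same pair both run their check before either inserts.
a_sees_existing = already_recommended(42, 7)  # request A: nothing there yet
b_sees_existing = already_recommended(42, 7)  # request B: still nothing there
if not a_sees_existing:
    insert_recommend(42, 7)
if not b_sees_existing:
    insert_recommend(42, 7)

count = conn.execute(
    "SELECT COUNT(*) FROM thread_recommend WHERE thread_id = 42 AND user_id = 7"
).fetchone()[0]
print(count)  # 2: one logical recommendation, two rows
```

Without a constraint in the database, nothing stops the second INSERT, which is exactly how the duplicate rows above appear.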
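Solution 1: Use an INSERT ... WHERE NOT EXISTS statement
Disclaimer: this is not the best solution, and I don't recommend it.
Code first (this is a real query from a different relation table; the principle is the same):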
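```sql
-- Insert the (user_id, topic_id) pair only if it is not already present.
-- Selecting FROM DUAL (rather than FROM `user_topic` ... LIMIT 1) means the
-- insert still happens when `user_topic` is empty.
INSERT INTO `user_topic` (`user_id`, `topic_id`)
SELECT :userId, :topicid FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM `user_topic`
                  WHERE `user_topic`.`user_id` = :userId
                    AND `user_topic`.`topic_id` = :topicid);
```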
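(The same technique is described at http://stackoverflow.com/a/31...)
This "insert only if it doesn't exist" approach reports 1 affected row when the row is inserted and 0 when it already exists, which lets you:
- Run the follow-up logic only when the affected row count is 1 (e.g. incrementing the cached counter, adding this userId to the thread's cached list of recommenders, and so on)
- Return an error from the API when the affected row count is 0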
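A minimal sketch of gating the follow-up logic on the affected-row count, using Python's sqlite3 module and an in-memory SQLite table in place of MySQL. The `follow_topic` helper and its True/False return convention are hypothetical; in a real service the driver's affected-row count (here `cursor.rowcount`) plays the same role. SQLite accepts a SELECT with a WHERE clause but no FROM, so the sketch selects the parameter values directly.

```python
import sqlite3

# In-memory SQLite stand-in for the user_topic relation table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_topic (user_id INTEGER, topic_id INTEGER)")

# Same named placeholders as the query above; sqlite3 binds repeated names to one value.
INSERT_IF_ABSENT = """
    INSERT INTO user_topic (user_id, topic_id)
    SELECT :userId, :topicid
    WHERE NOT EXISTS (SELECT 1 FROM user_topic
                      WHERE user_topic.user_id = :userId
                        AND user_topic.topic_id = :topicid)
"""


def follow_topic(user_id, topic_id):
    """Return True if a row was inserted, False if the pair already existed."""
    cur = conn.execute(INSERT_IF_ABSENT, {"userId": user_id, "topicid": topic_id})
    conn.commit()
    if cur.rowcount == 1:
        # 1 affected row: safe to run follow-up logic (e.g. bump a cached counter).
        return True
    # 0 affected rows: the pair already existed, so the API should return an error.
    return False


first = follow_topic(7, 3)
second = follow_topic(7, 3)
rows = conn.execute("SELECT COUNT(*) FROM user_topic").fetchone()[0]
print(first, second, rows)  # True False 1
```

This sketch runs the two calls one after the other, so it only shows the happy path. It does not demonstrate that the statement is safe under real concurrency, which is one reason the article prefers Solution 2.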
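Solution 2: Clean up the dirty data and create a composite unique index
This solution is the core of this article, and it is what we currently consider best practice.
Step 1: Find the duplicated (user_id, thread_id) combinations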
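```sql
-- b selects only the grouped columns, so this also runs under ONLY_FULL_GROUP_BY (default since MySQL 5.7.5).
SELECT a.* FROM `thread_recommend` a
INNER JOIN (SELECT `thread_id`, `user_id` FROM `thread_recommend` GROUP BY `thread_id`, `user_id` HAVING COUNT(id) > 1) b
  ON a.`thread_id` = b.`thread_id` AND a.`user_id` = b.`user_id`
ORDER BY a.`user_id` ASC, a.`thread_id` ASC, a.`id` DESC;
```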
Or, a simpler version:
```sql
SELECT * FROM `thread_recommend`
WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM `thread_recommend` GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
```
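Running either query returns every duplicated row.
Wow! All the duplicates are right here, and I can't wait to get rid of them!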
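As a sanity check that the JOIN version and the row-value IN version find the same rows, here is a minimal sketch against an in-memory SQLite table (Python's sqlite3 module) standing in for MySQL. The six sample rows are hypothetical, and an ORDER BY was added to the IN version only so the two results can be compared directly; row-value IN needs SQLite 3.15 or later.

```python
import sqlite3

# In-memory SQLite stand-in for thread_recommend (columns trimmed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thread_recommend ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " thread_id INTEGER, user_id INTEGER, status INTEGER DEFAULT 1)"
)
# Hypothetical rows as (user_id, thread_id); they receive ids 1..6 in this order.
conn.executemany(
    "INSERT INTO thread_recommend (user_id, thread_id) VALUES (?, ?)",
    [(7, 42), (8, 42), (7, 42), (9, 43), (8, 42), (7, 42)],
)

# JOIN version: derived table b holds each duplicated pair exactly once.
JOIN_QUERY = """
    SELECT a.id, a.user_id, a.thread_id FROM thread_recommend a
    INNER JOIN (SELECT thread_id, user_id FROM thread_recommend
                GROUP BY thread_id, user_id HAVING COUNT(id) > 1) b
      ON a.thread_id = b.thread_id AND a.user_id = b.user_id
    ORDER BY a.user_id ASC, a.thread_id ASC, a.id DESC
"""

# Row-value IN version, with the same ORDER BY added for a direct comparison.
IN_QUERY = """
    SELECT id, user_id, thread_id FROM thread_recommend
    WHERE (user_id, thread_id) IN (SELECT user_id, thread_id FROM thread_recommend
                                   GROUP BY user_id, thread_id HAVING COUNT(1) > 1)
    ORDER BY user_id ASC, thread_id ASC, id DESC
"""

joined = conn.execute(JOIN_QUERY).fetchall()
simple = conn.execute(IN_QUERY).fetchall()
print(joined)
# [(6, 7, 42), (3, 7, 42), (1, 7, 42), (5, 8, 42), (2, 8, 42)]
```

Row 4, (9, 43), appears only once, so neither query returns it.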
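Now we need to delete every duplicated entry except the one with the smallest ID in each group, keeping only that one.
Before deleting, first fetch the rows that will be deleted and compare them with the duplicates found above: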
```sql
SELECT * FROM `thread_recommend`
WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM `thread_recommend` GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
AND `id` NOT IN (SELECT MIN(`id`) FROM `thread_recommend` GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
```
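To see what the preview query selects, here is a minimal sketch run against an in-memory SQLite table via Python's sqlite3 module (a stand-in for MySQL). The hypothetical sample data has (user_id 7, thread_id 42) three times (ids 1, 3, 6) and (8, 42) twice (ids 2, 5), so only the non-minimal ids of each group should come back. The ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite stand-in for thread_recommend (columns trimmed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thread_recommend ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " thread_id INTEGER, user_id INTEGER, status INTEGER DEFAULT 1)"
)
# Hypothetical rows as (user_id, thread_id); they receive ids 1..6 in this order.
conn.executemany(
    "INSERT INTO thread_recommend (user_id, thread_id) VALUES (?, ?)",
    [(7, 42), (8, 42), (7, 42), (9, 43), (8, 42), (7, 42)],
)

# Rows in a duplicated group whose id is not that group's MIN(id).
PREVIEW_QUERY = """
    SELECT id, user_id, thread_id FROM thread_recommend
    WHERE (user_id, thread_id) IN (SELECT user_id, thread_id FROM thread_recommend
                                   GROUP BY user_id, thread_id HAVING COUNT(1) > 1)
      AND id NOT IN (SELECT MIN(id) FROM thread_recommend
                     GROUP BY user_id, thread_id HAVING COUNT(1) > 1)
    ORDER BY id
"""

to_delete = conn.execute(PREVIEW_QUERY).fetchall()
print(to_delete)  # [(3, 7, 42), (5, 8, 42), (6, 7, 42)]
```

Ids 1 and 2 (the smallest in each group) and the unique row 4 are left out, which is the set we want to survive.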
Next, change SELECT * FROM in the preview query to DELETE FROM, and delete!
```sql
DELETE FROM `thread_recommend`
WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM `thread_recommend` GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
AND `id` NOT IN (SELECT MIN(`id`) FROM `thread_recommend` GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
```
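Oops! It fails with an error (MySQL error 1093): You can't specify target table 'thread_recommend' for update in FROM clause
This is a MySQL limitation: a DELETE cannot select from its own target table in a subquery. Following the workaround at http://stackoverflow.com/a/14..., we wrap the table in derived tables so the subqueries no longer reference the target table directly, and the SQL works: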
```sql
-- The derived tables a and b wrap the target table, avoiding error 1093.
-- Keeps only the smallest id in each duplicated (user_id, thread_id) group.
DELETE FROM `thread_recommend`
WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM (SELECT * FROM `thread_recommend`) a GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
AND `id` NOT IN (SELECT MIN(`id`) FROM (SELECT * FROM `thread_recommend`) b GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
```
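A minimal sketch that runs the derived-table DELETE and then checks that no duplicates remain. It uses Python's sqlite3 module with an in-memory SQLite table and hypothetical sample rows. Note that SQLite does not raise MySQL's error 1093, so this checks only the delete logic (keep MIN(id) per group), not the MySQL-specific workaround itself.

```python
import sqlite3

# In-memory SQLite stand-in for thread_recommend (columns trimmed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thread_recommend ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " thread_id INTEGER, user_id INTEGER, status INTEGER DEFAULT 1)"
)
# Hypothetical rows as (user_id, thread_id); they receive ids 1..6 in this order.
conn.executemany(
    "INSERT INTO thread_recommend (user_id, thread_id) VALUES (?, ?)",
    [(7, 42), (8, 42), (7, 42), (9, 43), (8, 42), (7, 42)],
)

# Same shape as the MySQL workaround: both subqueries read from derived tables.
DELETE_QUERY = """
    DELETE FROM thread_recommend
    WHERE (user_id, thread_id) IN (SELECT user_id, thread_id
                                   FROM (SELECT * FROM thread_recommend) a
                                   GROUP BY user_id, thread_id HAVING COUNT(1) > 1)
      AND id NOT IN (SELECT MIN(id) FROM (SELECT * FROM thread_recommend) b
                     GROUP BY user_id, thread_id HAVING COUNT(1) > 1)
"""

deleted = conn.execute(DELETE_QUERY).rowcount
conn.commit()

remaining = conn.execute(
    "SELECT id, user_id, thread_id FROM thread_recommend ORDER BY id"
).fetchall()
# Number of (user_id, thread_id) groups that still have more than one row.
leftover_dupes = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1 FROM thread_recommend"
    " GROUP BY user_id, thread_id HAVING COUNT(1) > 1)"
).fetchone()[0]
print(deleted, remaining, leftover_dupes)
# 3 [(1, 7, 42), (2, 8, 42), (4, 9, 43)] 0
```

Running the same duplicate-count check in MySQL right after the cleanup is a cheap way to confirm the next step will succeed.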
Finally, add the composite unique index!
```sql
ALTER TABLE `thread_recommend`
ADD UNIQUE KEY `thread_id_user_id_unique`(`thread_id`,`user_id`) USING BTREE;
```
Of course, if the cleanup above hasn't been completed, this statement fails with a Duplicate entry error!
Done!
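A minimal sketch of what the unique index buys, again using Python's sqlite3 module and an in-memory SQLite table instead of MySQL, with hypothetical data. It shows that creating the index fails while duplicates exist, succeeds after the cleanup, and afterwards lets the database itself reject a second copy. `INSERT OR IGNORE` is SQLite syntax; in MySQL the comparable statements are `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE`. One caveat: both columns in the schema are nullable, and a UNIQUE index (in MySQL as in SQLite) permits multiple rows containing NULL, so those rows can still repeat.

```python
import sqlite3

# In-memory SQLite stand-in for thread_recommend (columns trimmed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thread_recommend ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " thread_id INTEGER, user_id INTEGER, status INTEGER DEFAULT 1)"
)
# Hypothetical data with one duplicated pair (user_id=7, thread_id=42).
conn.executemany(
    "INSERT INTO thread_recommend (user_id, thread_id) VALUES (?, ?)",
    [(7, 42), (7, 42), (9, 43)],
)
conn.commit()


def try_add_unique_index():
    """Return True if the composite unique index could be created."""
    try:
        conn.execute(
            "CREATE UNIQUE INDEX thread_id_user_id_unique"
            " ON thread_recommend (thread_id, user_id)"
        )
        return True
    except sqlite3.IntegrityError:
        # Existing duplicate rows violate the constraint, so creation fails.
        return False


before_cleanup = try_add_unique_index()

# The article's cleanup: keep only the smallest id per (user_id, thread_id).
conn.execute(
    "DELETE FROM thread_recommend"
    " WHERE (user_id, thread_id) IN (SELECT user_id, thread_id"
    "   FROM (SELECT * FROM thread_recommend) a"
    "   GROUP BY user_id, thread_id HAVING COUNT(1) > 1)"
    " AND id NOT IN (SELECT MIN(id) FROM (SELECT * FROM thread_recommend) b"
    "   GROUP BY user_id, thread_id HAVING COUNT(1) > 1)"
)
conn.commit()
after_cleanup = try_add_unique_index()

# With the index in place, a plain INSERT of an existing pair is rejected.
try:
    conn.execute("INSERT INTO thread_recommend (user_id, thread_id) VALUES (7, 42)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# INSERT OR IGNORE skips the conflicting row; rowcount 0 means "already existed".
ignored = conn.execute(
    "INSERT OR IGNORE INTO thread_recommend (user_id, thread_id) VALUES (7, 42)"
).rowcount

# Caveat: NULLs count as distinct in a UNIQUE index, so NULL pairs can repeat.
conn.execute("INSERT INTO thread_recommend (user_id, thread_id) VALUES (NULL, 42)")
conn.execute("INSERT INTO thread_recommend (user_id, thread_id) VALUES (NULL, 42)")
null_pairs = conn.execute(
    "SELECT COUNT(*) FROM thread_recommend WHERE user_id IS NULL AND thread_id = 42"
).fetchone()[0]

print(before_cleanup, after_cleanup, duplicate_rejected, ignored, null_pairs)
# False True True 0 2
```

If NULL user_id or thread_id values are never legitimate, declaring both columns NOT NULL closes that gap.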