需求:MySQL百万数据的数据表table1中对应每个column_1,每10分钟存储一个数据,需改为每1小时存储一个数据,冗余的删掉,即每6条保留1条数据。
初版存储过程如下:
DELIMITER //
DROP PROCEDURE IF EXISTS p_test //
CREATE PROCEDURE p_test()
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE field_1 INT (11);
DECLARE p1 INT(11);
DECLARE cur_list CURSOR FOR SELECT id FROM table1 WHERE column_1 = 'column name';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN cur_list;
SET p1 = 1;
loop1 : LOOP
FETCH cur_list INTO field_1;
IF done THEN LEAVE loop1;
END IF;
IF p1 < 7 THEN
SET @sql_delete = CONCAT("delete from table1 where id = ", field_1 );
PREPARE sqlli FROM @sql_delete;
EXECUTE sqlli;
COMMIT;
ELSE
SET p1 = 1;
END IF;
SET p1 = p1 + 1;
END LOOP loop1;
CLOSE cur_list;
END //
DELIMITER; //
call p_test;
也没什么特别的,column name直接写死,做了个变量p1,根据p1在循环中的递增情况实现删除数据1-5,保留第6个,p1退回到1,再次循环。
因为column name写死很不方便,想给存储过程传参,传参是字符串String类型,游标取值和之前略有不同:
DELIMITER //
DROP PROCEDURE IF EXISTS p_test //
CREATE PROCEDURE p_test(IN column1 VARCHAR(256))
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE field_1 INT (11);
DECLARE p1 INT(11);
#定义游标
DECLARE cur_list CURSOR FOR (SELECT * from view_1);
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
#采用视图
DROP VIEW IF EXISTS view_1;
#动态sql
SET @strSql = "CREATE VIEW view_1 as SELECT id FROM table1";
IF column1 IS NOT NULL THEN
SET @strSql = CONCAT(@strSql, " WHERE column_1 = '", column1,"'");
END IF;
PREPARE stmt FROM @strSql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
OPEN cur_list;
SET p1 = 1;
loop1 : LOOP
FETCH cur_list INTO field_1;
IF done THEN LEAVE loop1;
END IF;
IF p1 < 7 THEN
SET @sql_delete = CONCAT("delete from table1 where id = ", field_1);
PREPARE sqlli FROM @sql_delete;
EXECUTE sqlli;
COMMIT;
ELSE
SET p1 = 1;
END IF;
SET p1 = p1 + 1;
END LOOP loop1;
CLOSE cur_list;
END //
DELIMITER; //
call p_test("column name");
每次创建存储过程之后,采用 show procedure status; 判断存储过程是否已创建。
...the right syntax to use near NULL at line 1, 出现在调用存储过程的时候报错,传参的读取有可能为NULL,当使用PREPARE的时候会报错,检查CONCAT拼接,完善@strSQL
因数据量过大,考虑在外层做了一个关于column_1的筛选,找出所有数据量超过10000的column name,类似于:
SELECT column_1, count(*) as col_count FROM table1 group by column_1 having col_count > 10000
然后将筛选出的具体column name,调用第一个存储过程进行数据清理,新加的存储过程如下:
delimiter //
DROP PROCEDURE IF EXISTS p_test2//
CREATE PROCEDURE p_test2(IN num INT(11))
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE field_1 VARCHAR(256);
DECLARE field_2 INT(11);
#定义游标
DECLARE cur CURSOR FOR (SELECT * from view2);
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
#采用视图
DROP VIEW IF EXISTS view2;
#动态sql
SET @strSql = "CREATE VIEW view2 as SELECT column_1, count(*) as col_count FROM table1 group by column_1";
IF num IS NOT NULL THEN
SET @strSql = CONCAT(@strSql, " HAVING col_count > ", num);
END IF;
PREPARE stmt FROM @strSql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
OPEN cur;
loop2 : LOOP
FETCH cur INTO field_1, field_2;
IF done THEN LEAVE loop2;
END IF;
call p_test(field_1);
END LOOP loop2;
CLOSE cur;
END //
delimiter; //
call p_test2(10000);
Incorrect number of FETCH variables,出现在游标select字段与fetch into不匹配,改为field_1, field_2之后成功运行
至此,百万数据表中根据column name实现每6条保留1条的数据清理基本实现,实际执行,清理一个万级数据的column name耗时约2分钟