MySQL存储过程删除历史数据

缑赤岩
2023-12-01

需求:MySQL百万数据的数据表table1中对应每个column_1,每10分钟存储一个数据,需改为每1小时存储一个数据,冗余的删掉,即每6条保留1条数据。

初版存储过程如下:

DELIMITER //
DROP PROCEDURE IF EXISTS p_test //
CREATE PROCEDURE p_test() 
BEGIN
	DECLARE done INT DEFAULT FALSE;
	DECLARE field_1 INT (11);
	DECLARE p1 INT(11);
	DECLARE cur_list CURSOR FOR SELECT id FROM table1 WHERE column_1 = 'column name';
	DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;	
	OPEN cur_list;
	SET p1 = 1;	
	loop1 : LOOP
		FETCH cur_list INTO field_1;
		IF done THEN LEAVE loop1; 
		END IF;
		IF p1 < 7 THEN 
			SET @sql_delete = CONCAT("delete from table1 where id = ", field_1 );
			PREPARE sqlli FROM @sql_delete;
			EXECUTE sqlli;
			COMMIT;
		ELSE
			SET p1 = 1;
		END IF;
		SET p1 = p1 + 1;
	END LOOP loop1;
	CLOSE cur_list;
END //
DELIMITER; //

call p_test;

也没什么特别的,column name直接写死,做了个变量p1,根据p1在循环中的递增情况实现删除数据1-5,保留第6个,p1退回到1,再次循环。

因为column name写死很不方便,想给存储过程传参,传参是字符串String类型,游标取值和之前略有不同:

DELIMITER //
DROP PROCEDURE IF EXISTS p_test //
CREATE PROCEDURE p_test(IN column1 VARCHAR(256)) 
BEGIN
	DECLARE done INT DEFAULT FALSE;
	DECLARE field_1 INT (11);
	DECLARE p1 INT(11);
	
#定义游标	
	DECLARE cur_list CURSOR FOR (SELECT * from view_1);
	DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
	
#采用视图
	DROP VIEW IF EXISTS view_1;
#动态sql
	SET @strSql = "CREATE VIEW view_1 as SELECT id FROM table1";
	IF column1 IS NOT NULL THEN
		SET @strSql = CONCAT(@strSql, " WHERE column_1 = '", column1,"'");
  END IF;
	PREPARE stmt FROM @strSql;  
  EXECUTE stmt;  
  DEALLOCATE PREPARE stmt; 
	
	OPEN cur_list;
	SET p1 = 1;	
	loop1 : LOOP
		FETCH cur_list INTO field_1;
		IF done THEN LEAVE loop1; 
		END IF;
		IF p1 < 7 THEN 
			SET @sql_delete = CONCAT("delete from table1 where id = ", field_1);
			PREPARE sqlli FROM @sql_delete;
			EXECUTE sqlli;
			COMMIT;
		ELSE
			SET p1 = 1;
		END IF;
		SET p1 = p1 + 1;
	END LOOP loop1;
	CLOSE cur_list;
END //
DELIMITER; //
call p_test("column name");

每次创建存储过程之后,采用 show procedure status; 判断存储过程是否已创建。

...the right syntax to use near NULL at line 1, 出现在调用存储过程的时候报错,传参的读取有可能为NULL,当使用PREPARE的时候会报错,检查CONCAT拼接,完善@strSQL

因数据量过大,考虑在外层做了一个关于column_1的筛选,找出所有数据量超过10000的column name,类似于:

SELECT column_1, count(*) as col_count FROM table1 group by column_1 having col_count > 10000

然后将筛选出的具体column name,调用第一个存储过程进行数据清理,新加的存储过程如下:

delimiter //
DROP PROCEDURE IF EXISTS p_test2//
CREATE PROCEDURE p_test2(IN num INT(11)) 
BEGIN
	DECLARE done INT DEFAULT FALSE;
	DECLARE field_1 VARCHAR(256);
	DECLARE field_2 INT(11);
#定义游标	
	DECLARE cur CURSOR FOR (SELECT * from view2);
	DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
#采用视图
	DROP VIEW IF EXISTS view2;
#动态sql
	SET @strSql = "CREATE VIEW view2 as SELECT column_1, count(*) as col_count FROM table1 group by column_1";
	IF num IS NOT NULL THEN
		SET @strSql = CONCAT(@strSql, " HAVING col_count > ", num);
  END IF;
	PREPARE stmt FROM @strSql;  
  EXECUTE stmt;  
  DEALLOCATE PREPARE stmt; 
	
	OPEN cur;
	loop2 : LOOP
		FETCH cur INTO field_1, field_2;
		IF done THEN LEAVE loop2; 
		END IF;
		call p_test(field_1);
	END LOOP loop2;
	CLOSE cur;
END //
delimiter; //
call p_test2(10000);

Incorrect number of FETCH variables,出现在游标select字段与fetch into不匹配,改为field_1, field_2之后成功运行

至此,百万数据表中根据column name实现每6条保留1条的数据清理基本实现,实际执行,清理一个万级数据的column name耗时约2分钟

 类似资料: