AWS RDS MySQL Troubleshooting 存储空间占用分析

孙海
2023-12-01

前言

不论是我们自建的MySQL数据库还是使用云数据库,都会遇到存储空间的相关问题,当存储满了之后,数据库可能会hang住甚至是crash。因此监控存储空间是非常有必要的。对于自建的数据库我们可以登陆到实例上进行排查,对于云数据库来说,由于用户无法接触到底层实例,因此在SQL层面进行排查是非常必要的。

数据库实例上的空间用于以下方面

用户创建的数据库[create database csdn]

binlog

general log & slow log & error log

redo &  undo

系统表空间

临时表空间

2. 分析解决

1. 查询数据库大小

查看每个库的大小

SELECT
	SUBSTRING_INDEX( it.NAME, '/', 1 ) AS table_schema,
	ROUND( SUM( its.allocated_size ) / 1024 / 1024 / 1024, 2 ) "size in GB",
	ROUND( SUM( t.data_free ) / 1024 / 1024 / 1024, 2 ) "fragmented size in GB" 
FROM
	information_schema.innodb_sys_tables it
	INNER JOIN information_schema.innodb_sys_tablespaces its ON it.space = its.space
	INNER JOIN information_schema.innodb_sys_tablestats istat ON istat.table_id = it.table_id
	INNER JOIN information_schema.TABLES t ON t.table_schema = SUBSTRING_INDEX( it.NAME, '/', 1 ) 
	AND t.table_name = SUBSTRING_INDEX( it.NAME, '/', - 1 ) 
GROUP BY
	1 
ORDER BY
	2 DESC;

更精确的,查看每个库中每个表的大小

SELECT
	SUBSTRING_INDEX( it.NAME, '/', 1 ) AS table_schema,
	t.table_name,
	ROUND( its.allocated_size / 1024 / 1024 / 1024, 2 ) "size in GB",
	ROUND( t.data_free / 1024 / 1024 / 1024, 2 ) "fragmented
	size in GB" 
FROM
	information_schema.innodb_sys_tables it
	INNER JOIN information_schema.innodb_sys_tablespaces its ON it.space = its.space
	INNER JOIN information_schema.innodb_sys_tablestats istat ON istat.table_id = it.table_id
	INNER JOIN information_schema.TABLES t ON t.table_schema = SUBSTRING_INDEX( it.NAME, '/', 1 ) 
	AND t.table_name = SUBSTRING_INDEX( it.NAME, '/', - 1 ) 
WHERE
	t.table_schema NOT IN ( 'performance_schema', 'mysql', 'information_schema' ) 
ORDER BY
	4 DESC;

2. 简要的查看binlog的大小

mysql> show master logs;
+----------------------------+-----------+-----------+
| Log_name                   | File_size | Encrypted |
+----------------------------+-----------+-----------+
| mysql-bin-changelog.002001 |       589 | No        |
| mysql-bin-changelog.002002 |       589 | No        |
| mysql-bin-changelog.002003 |       155 | No        |
+----------------------------+-----------+-----------+
3 rows in set (0.00 sec)

3. 查询临时表

SELECT
	* 
FROM
	information_schema.innodb_sys_tables 
WHERE
	NAME LIKE '%#%';

4. 查看共享表空间(innodb 表空间)

SELECT
	file_name,
	tablespace_name,
	table_name,
	ENGINE,
	index_length,
	total_extents,
	extent_size 
FROM
	information_schema.files 
WHERE
	file_name LIKE '%ibdata%';

结果如下:
+------------------------------+-----------------+------------+--------+--------------+---------------+-------------+
| file_name                    | tablespace_name | table_name | ENGINE | index_length | total_extents | extent_size |
+------------------------------+-----------------+------------+--------+--------------+---------------+-------------+
| /rdsdbdata/db/innodb/ibdata1 | innodb_system   | NULL       | InnoDB |         NULL |           466 |     1048576 |
+------------------------------+-----------------+------------+--------+--------------+---------------+-------------+
1 row in set (0.01 sec)

total_extents * 1048576= 共享表空间的总大小

5. 查看临时表空间的大小

SELECT
	file_name,
	tablespace_name,
	table_name,
	ENGINE,
	index_length,
	total_extents,
	extent_size 
FROM
	information_schema.files 
WHERE
	file_name LIKE '%ibtmp%';

6. general log& slow log

如果我们设置了参数

general log=1
slow_query_log=1

但是log_output=table(默认值)

那么会存储到 mysql.slow_log和mysql.general_log中。由于slow_log这张表的存储引擎为CSV,因此在使用SQL语句统计时可能得不到精确值

CREATE TABLE `slow_log` (
  `start_time` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6),
  `user_host` mediumtext NOT NULL,
  `query_time` time(6) NOT NULL,
  `lock_time` time(6) NOT NULL,
  `rows_sent` int(11) NOT NULL,
  `rows_examined` int(11) NOT NULL,
  `db` varchar(512) NOT NULL,
  `last_insert_id` int(11) NOT NULL,
  `insert_id` int(11) NOT NULL,
  `server_id` int(10) unsigned NOT NULL,
  `sql_text` mediumblob NOT NULL,
  `thread_id` bigint(21) unsigned NOT NULL
) ENGINE=CSV DEFAULT CHARSET=utf8 COMMENT='Slow log'

7. 查看relay log的大小(如果是只读副本)

show slave status\G 

需要重点关注Relay_Log_Space

后续的操作

1. 如果临时表(ibtemp1)过大

这意味着SQL可能不够优化,在运行过程中产生了大量的临时文件

在5.7中,隐式临时表是创建在ibtmp文件中的,SQL结束后,会标记删除,但是空间依然不会返还给操作系统,如果需要返还,则需要重启数据库

To reclaim disk space occupied by a temporary tablespace data file, restart the MySQL server. Restarting the server removes and recreates the temporary tablespace data file according to the attributes defined by innodb_temp_data_file_path.

文档如下:

https://dev.mysql.com/doc/refman/5.7/en/innodb-temporary-tablespace.html

同时还遇见一个问题,由于存储空间满了,mysqld crash,加了存储空间之后,数据库恢复正常,但是这里也有一个reboot的过程,但临时表空间并没有释放掉

这是由于:

The temporary tablespace is removed on normal shutdown or on an aborted initialization, and is recreated each time the server is started. The temporary tablespace receives a dynamically generated space ID when it is created. Startup is refused if the temporary tablespace cannot be created. The temporary tablespace is not removed if the server halts unexpectedly. In this case, a database administrator can remove the temporary tablespace manually or restart the server, which removes and recreates the temporary tablespace automatically.

所以这是算是预期行为,只有normal shutdown的时候删除文件 再启动的时候创建

2. binlog占用过大

有两种情况

  1. 在一定时间内由于大量的DML操作产生了大量的binlog日志
  2. 通常,如果没有设置binlog保留期的话,AWS RDS 会在短时间内清除binlog

查看binlog保留期

mysql> CALL mysql.rds_show_configuration;
+------------------------+-------+-----------------------------------------------------------------------------------------------------------+
| name                   | value | description                                                                                               |
+------------------------+-------+-----------------------------------------------------------------------------------------------------------+
| binlog retention hours | 24  | binlog retention hours specifies the duration in hours before binary logs are automatically deleted.      |
| source delay           | 0     | source delay specifies replication delay in seconds between current instance and its master.              |
| target delay           | 0     | target delay specifies replication delay in seconds between current instance and its future read-replica. |
+------------------------+-------+-----------------------------------------------------------------------------------------------------------+

NULL是默认值, 24代表保留24小时的binlog

因此如果想缩短保留期的话,需要运行如下语句:

mysql> call mysql.rds_set_configuration('binlog retention hours', 10);

3. 如果general log & slow_log占用过大

运行如下语句各两次:这是由于运行一次会轮换到备份表,第二次才会进行truncate

mysql> CALL mysql.rds_rotate_slow_log;
mysql> CALL mysql.rds_rotate_general_log;

4. 如果是数据库中表碎片太多

我们可以使用如下语句来进行表整理, 建议在业务低峰期进行

mysql> OPTIMIZE TABLE <table_name>;

mysql> ALTER TABLE <table_name> ENGINE=INNODB;

5.如果可以抓到现场,我们可以运行如下语句来进行定位

show full processlist  查看当前运行的语句

show engine innodb status 查看long running transaction

FAQ

1. RDS存储空间是否可以在线缩容

答:RDS目前还不支持存储在线缩容,如果想缩小磁盘得话,需要使用mysqldump 或者是DMS 将数据库导入到新的数据库实例中。

 类似资料: