在开发过程中,发现网页突然开不了,突然断线,诡异的断开又重连.终端连接服务器后,通过输入
df -h
[root@ackh-office-srv ~]# df -h
文件系统 容量 已用 可用 已用% 挂载点
/dev/mapper/cl-root 50G 50G 683M 99% /
devtmpfs 48G 0 48G 0% /dev
tmpfs 48G 84K 48G 1% /dev/shm
tmpfs 48G 98M 48G 1% /run
tmpfs 48G 0 48G 0% /sys/fs/cgroup
/dev/md126p2 1021M 176M 846M 18% /boot
/dev/mapper/cl-home 5.2T 1.3T 3.9T 25% /home
tmpfs 9.5G 4.0K 9.5G 1% /run/user/0
tmpfs 9.5G 16K 9.5G 1% /run/user/42
overlay 50G 50G 683M 99% /var/lib/docker/overlay2/7388d1c81c4cd4a42b629d9a1f156c022474444436c4683396d8b2d6ed0436c2/merged
shm 64M 0 64M 0% /var/lib/docker/containers/1cd6335f1b67041fcd4a43617288cfd9fd0c534b0066de2156eadda00d433bb5/shm
overlay 50G 50G 683M 99% /var/lib/docker/overlay2/883ab66c0922719519e6a17e409428e703b009cd4e2f6566ac1a9fb480eed671/merged
shm 64M 0 64M 0% /var/lib/docker/containers/00da0f419c08f9163ead3ad8b29f964a9679996dc3d9453a41679c05eb8b9840/shm
overlay 50G 50G 683M 99% /var/lib/docker/overlay2/b51eb5f1f20b4636693e590af65c9187a55ad6ea5485bfc88b49af1f2e9e8577/merged
shm 64M 8.0K 64M 1% /var/lib/docker/containers/9306555ac084a7fa5529c2216d906543db215d910ca74715b8ebb61c7d19bac1/shm
overlay 50G 50G 683M 99% /var/lib/docker/overlay2/8cf90078979cbf73dc6d21f47a57af9dc80e4ce4461898ad2d70870667d58615/merged
shm 64M 0 64M 0% /var/lib/docker/containers/10d1a1ea1506619a1b27305e5c6a709f389a8a5cf3dac69d7288472a2bec7e07/shm
通过上图可观察,挂载根目录的/dev/mapper/cl-root 和overlay特别占磁盘.
cd /
//切换根目录
du -sh *
//在根目录底下搜寻大文件
[root@ackh-office-srv /]# du -sh *
0 1
0 bin
143M boot
84K dev
148K dump.rdb
50M etc
1.3T home
0 lib
0 lib64
0 media
0 mnt
3.5G opt
du: 无法访问"proc/870/task/870/fd/4": 没有那个文件或目录
du: 无法访问"proc/870/task/870/fdinfo/4": 没有那个文件或目录
du: 无法访问"proc/870/fd/4": 没有那个文件或目录
du: 无法访问"proc/870/fdinfo/4": 没有那个文件或目录
0 proc
2.1G root
du: 无法访问"run/user/42/gvfs": 权限不够
98M run
0 sbin
0 srv
4.0K stop.sh
0 sys
2.7M tmp
7.4G usr
21G var
红色是挂载nfs的也就是挂载根目录的var(docker)和usr和opt最多就30G,咋占了50G.发现当前系统任然存在大量可以使用的空间。大量剩余的磁盘空间不清楚怎么丢失了…
如果发现是因为磁盘目录满了,可以查询该目录下哪些大文件,然后依次rm删除无用的占用内存的文件.
参考:https://blog.csdn.net/qq_34246965/article/details/110038468
[root@ackh-office-srv docker]# df -i
文件系统 Inode 已用(I) 可用(I) 已用(I)% 挂载点
/dev/mapper/cl-root 2826616 1329815 1496801 48% /
devtmpfs 12347387 515 12346872 1% /dev
tmpfs 12351426 6 12351420 1% /dev/shm
tmpfs 12351426 853 12350573 1% /run
tmpfs 12351426 16 12351410 1% /sys/fs/cgroup
/dev/md126p2 524032 374 523658 1% /boot
/dev/mapper/cl-home 550984064 964487 550019577 1% /home
tmpfs 12351426 4 12351422 1% /run/user/0
tmpfs 12351426 17 12351409 1% /run/user/42
overlay 2826616 1329815 1496801 48% /var/lib/docker/overlay2/7388d1c81c4cd4a42b629d9a1f156c022474444436c4683396d8b2d6ed0436c2/merged
shm 12351426 1 12351425 1% /var/lib/docker/containers/1cd6335f1b67041fcd4a43617288cfd9fd0c534b0066de2156eadda00d433bb5/shm
overlay 2826616 1329815 1496801 48% /var/lib/docker/overlay2/883ab66c0922719519e6a17e409428e703b009cd4e2f6566ac1a9fb480eed671/merged
shm 12351426 1 12351425 1% /var/lib/docker/containers/00da0f419c08f9163ead3ad8b29f964a9679996dc3d9453a41679c05eb8b9840/shm
overlay 2826616 1329815 1496801 48% /var/lib/docker/overlay2/b51eb5f1f20b4636693e590af65c9187a55ad6ea5485bfc88b49af1f2e9e8577/merged
shm 12351426 2 12351424 1% /var/lib/docker/containers/9306555ac084a7fa5529c2216d906543db215d910ca74715b8ebb61c7d19bac1/shm
overlay 2826616 1329815 1496801 48% /var/lib/docker/overlay2/8cf90078979cbf73dc6d21f47a57af9dc80e4ce4461898ad2d70870667d58615/merged
shm 12351426 1 12351425 1% /var/lib/docker/containers/10d1a1ea1506619a1b27305e5c6a709f389a8a5cf3dac69d7288472a2bec7e07/shm
rm -rf XXXX.json.log以及rm -rf nohuo.out
,发现磁盘空间没有减少,也就是没有变化.后来我找了资料,这叫僵尸文件,大意说文件虽然删除,但是进程还在,占的磁盘是不会释放的.且干掉该(deleted)
进程有可能造成程序崩溃,停止运行,因为产生僵尸文件的进程是webapp应用,不能被kill,kill后,将会影响生产环境业务,但是磁盘也已经满了lsof |grep delete
COMMAND:进程的名称
PID:进程标识符
PPID:父进程标识符(需要指定-R参数)
USER:进程所有者
PGID:进程所属组
FD:文件描述符,应用程序通过文件描述符识别该文件。
[root@ackh-office-srv docker]# lsof |grep delete
command PID PPID USER FD type DEVICE SIZE NODE NAME
dockerd 1786 30961 root 38w REG 253,0 4803128280 78891062 /var/lib/docker/containers/00da0f419c08f9163ead3ad8b29f964a9679996dc3d9453a41679c05eb8b9840/00da0f419c08f9163ead3ad8b29f964a9679996dc3d9453a41679c05eb8b9840-json.log (deleted)
java 3754 jenkins 19r REG 253,0 1484022 117579852 /tmp/jna4559948622136446323jar (deleted)
java 3754 jenkins 22r REG 253,0 2283188 117579857 /tmp/winstone4139804245682449191.jar (deleted)
qtp739498 3754 1610 jenkins 19r REG 253,0 1484022 117579852 /tmp/jna4559948622136446323jar (deleted)
docker-co 4828 root 21u FIFO 0,19 0t0 101036059 /run/docker/containerd/1cd6335f1b67041fcd4a43617288cfd9fd0c534b0066de2156eadda00d433bb5/ff2708634883552f82185c36787788edf39efd14478bcac94da3d284fd90ca08-stdin (deleted)
PM2 6434 root cwd DIR 253,0 6 20573326 /root/workspace/ackh_dcc/nuxt_dcc (deleted)
node 6434 6435 root cwd DIR 253,0 6 20573326 /root/workspace/ackh_dcc/nuxt_dcc (deleted)
tail 7559 root 3r REG 253,0 480157 111922606 /var/log/elasticsearch/mm-cluster-2020-09-06-1.log (deleted)
java 11332 root cwd DIR 253,0 6 714168 /opt/online-read (deleted)
java 11332 root 1w REG 253,0 4296573 714170 /opt/online-read/nohup.out (deleted)
java 11332 root 7r REG 253,0 108837596 1371544 /opt/online-read/kkFileView-2.2.1-SNAPSHOT.jar (deleted)
java 11332 11333 root 7r REG 253,0 108837596 1371544 /opt/online-read/kkFileView-2.2.1-SNAPSHOT.jar (deleted)
Service 11332 11368 root 1w REG 253,0 4296573 714170 /opt/online-read/nohup.out (deleted)
)
PM2 16490 root cwd DIR 253,0 6 20573326 /root/workspace/ackh_dcc/nuxt_dcc (deleted)
node 16490 16491 root cwd DIR 253,0 6 20573326 /root/workspace/ackh_dcc/nuxt_dcc (deleted)
java 18590 root 1w REG 253,0 5013822064 36434113 /opt/dcc_back/nohup.out (deleted)
java 18590 root 57r REG 253,0 1058776 33851535 /tmp/tomcat.4477088410858248213.8090/work/Tomcat/localhost/ROOT/upload_ece08897_119e_4133_9eb9_81dad51ec380_00000016.tmp (deleted)
http-nio- 18590 2570 root 2w REG 253,0 5013822064 36434113 /opt/dcc_back/nohup.out (deleted)
http-nio- 18590 2570 root 57r REG 253,0 1058776 33851535 /tmp/tomcat.4477088410858248213.8090/work/Tomcat/localhost/ROOT/upload_ece08897_119e_4133_9eb9_81dad51ec380_00000016.tmp (deleted)
java 19617 root cwd DIR 253,0 6 20573326 /root/workspace
C1 19617 19652 root 7r REG 253,0 108837686 714112 /opt/online-read/kkFileView-2.2.1-SNAPSHOT.jar (deleted)
VM 19617 19654 root 1w REG 253,0 23307 21372120 /root/workspace/ackh_dcc/nuxt_dcc/nohup.out (deleted)
PM2 16490 16498 root cwd DIR 253,0 6 20573326 /root/workspace/ackh_dcc/nohup.out (deleted)
/opt/dcc_back/nohup.out
以及/var/lib/docker/containers/00da0f419c08f9163ead3ad8b29f964a9679996dc3d9453a41679c05eb8b9840-json.log (deleted)
可以粗暴使用kill -9 pid
kill -9 1786
可以粗暴使用kill -9 1786
,但注意可能会影响进程的运行这个内容中只有日志,不会被它处调用,所以直接删除进程是可行的。(产生nohup.out以及一些读写报告的僵尸文件的进程是webapp应用,kill后,再次启动就好,这将会影响暂时生产环境业务)
也可以这样说delete状态的文件指向很多不同端口监听,并且有几十个已建立的连接,要是kill掉,父进程可能运行异常
占用数据盘资源的是一个运行中的jar包,这个jar包以nohup形式运行,之前运维没有关闭nohup.out的输出导致出现了这么大的僵死文件。
我的做法是将其kill后再次启动,将结果送到回收站,以后可以避免出现这些问题。
再次启动时候可以用
nohup java -jar XXXX.jar >/dev/null 2>&1 &
/dev/null 表示将标准输出信息重定向到"黑洞"
2>&1 表示将标准错误重定向到标准输出(由于标准输出已经定向到“黑洞”了,即:标准输出此时也是"黑洞",再将标准错误输出定向到标准输出,相当于错误输出也被定向至“黑洞”)
完后问题解决完毕,磁盘占用恢复正常。
例如 /root/workspace/ackh_dcc/nohup.out (deleted)的进程号为16490
进入虚拟文件系统对应进程目录(cd /proc/16490/fd),将僵尸文件清空
cd /proc/16490/fd
ll |grep nohup.out
//查看该进程虚拟文件路径下是否含有该文件echo '' > 1
//写一个空的1文件进去,重置空间df -h
//查看磁盘空间