A few days ago a colleague was filtering a program execution log to pull out the start and end times of the various jobs and email them to the people concerned. I happened to notice that the filtering step took quite a long time to run. The colleague's script is very simple, roughly as follows:
echo 'Report update start!'
grep -i 'begin sale_data' process.entire.budget.log
grep -i 'finish sale_data' process.entire.budget.log
grep -i 'begin sale_profit' process.entire.budget.log
grep -i 'finish sale_profit' process.entire.budget.log
grep -i 'process_cost_data begin' process.entire.budget.log
grep -i 'process_cost_data has finished' process.entire.budget.log
grep -i 'process_po_data' process.entire.budget.log
grep -i 'purchase_order_data has finished' process.entire.budget.log
grep -i 'begin manufacture_profit' process.entire.budget.log
grep -i 'finish manufacture_profit' process.entire.budget.log
grep -i 'begin inprocess_sfb' process.entire.budget.log
grep -i 'finish inprocess_sfb' process.entire.budget.log
grep -i 'begin stock_data' process.entire.budget.log
grep -i 'finish stock_data' process.entire.budget.log
echo 'Report update over!'
The process.entire.budget.log file has about 270,000 lines and is roughly 24 MB. Leaving other issues aside, the script could obviously be optimized along the following lines:
egrep -i "begin sale_data|...|finish stock_data" process.entire.budget.log >temp.log
grep -i 'begin sale_data' temp.log
...
grep -i 'finish stock_data' temp.log
This way process.entire.budget.log is scanned only once, and the subsequent per-marker filtering runs against the much smaller temporary file, which speeds up execution.
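Spelled out, the single-pass version would look roughly like this (a sketch; the combined pattern is simply all of the begin/finish strings from the original script joined with |, and the middle greps are elided):
# scan the 270k-line log only once, keeping every line that matches any marker
egrep -i 'begin sale_data|finish sale_data|begin sale_profit|finish sale_profit|process_cost_data begin|process_cost_data has finished|process_po_data|purchase_order_data has finished|begin manufacture_profit|finish manufacture_profit|begin inprocess_sfb|finish inprocess_sfb|begin stock_data|finish stock_data' process.entire.budget.log > temp.log
# the per-marker greps then run against the small temporary file
grep -i 'begin sale_data' temp.log
grep -i 'finish sale_data' temp.log
# ... remaining patterns, same order as the original script, but against temp.log
grep -i 'finish stock_data' temp.log
rm -f temp.log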
My colleague's version is admittedly simple, but it took about 23 seconds to run, and according to time almost all of that was user-mode CPU time, with only 0.13 s spent in kernel mode. That didn't feel right to me. So I ran the same shell script under Cygwin on my own PC, and the result was surprising: it finished in only 1.2 seconds, with 0.6 s of user CPU time and 0.4 s of system CPU time. On an older IBM server (used for the test system, running Linux kernel 5.3.18-57) it took 0.9 seconds, 0.76 s user and 0.14 s system. The production server has much stronger hardware, so why did it take longer?
My first thought was that the production server has more active processes and a heavier load, and that for scheduling reasons the CPU might be losing cache contents, hurting performance. But on second thought, the production server has plenty of CPU headroom, and even if there were some effect it should not cause a difference this large. So I wrote a simple Python loop to check that guess.
#!/bin/env python
# -*- coding:utf-8 -*-
import time
# Compare execution times on 10.0.20.106 and 10.0.20.168; they do differ,
# but clearly not in the same way as yesterday's observation
# lscpu on 10.0.20.106:
# Model name: Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz
# crpdb:~ # time python3 testcpu.py
# 11.831689008
#
# real 0m11.855s
# user 0m11.848s
# sys 0m0.008s
# crpdb:~ #
#
# lscpu on 10.0.20.168:
# Model name: Intel(R) Xeon(R) CPU E7- 4860 @ 2.27GHz
# erpdb:~ # time python3 testcpu.py
# 20.799113684
#
# real 0m20.854s
# user 0m20.833s
# sys 0m0.016s
# erpdb:~ #
start_proc=time.process_time()
for i in range(10000):
    z=0
    for y in range(10000):
        z+=1
end_proc=time.process_time()
print(end_proc-start_proc)
This simple script showed that, for the same number of loop iterations, the production server actually takes less time. Now I was really puzzled: how to explain what we were seeing with my colleague's script? Then it suddenly occurred to me that, while running the Python test, I had noticed the python3 versions on the two servers were different (I once had a program where the same code showed nearly a twofold performance difference between python2 and python3), so perhaps the difference between the two machines was related to the grep version. I checked the grep versions first:
crpdb:~ # grep --version
grep (GNU grep) 2.16
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
crpdb:~ #
erpdb:~ # grep --version
grep (GNU grep) 3.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
erpdb:~ #
The versions were indeed different. I also took a quick look at the shared library dependencies with ldd:
erpdb:~ # ldd /usr/bin/grep
linux-vdso.so.1 (0x00007ffc813b8000)
libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007f45003a9000)
libc.so.6 => /lib64/libc.so.6 (0x00007f44fffd4000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4500865000)
erpdb:~ #
crpdb:~ # ldd /usr/bin/grep
linux-vdso.so.1 (0x00007ffc48bb1000)
libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007f4ac4b6c000)
libc.so.6 => /lib64/libc.so.6 (0x00007f4ac47c5000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4ac45a8000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4ac4dd2000)
crpdb:~ #
So I copied the grep binary from the test server into /root on the production server, modified the script to call /root/grep instead of grep, and ran the test again.
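The concrete steps were roughly the following (a minimal sketch; the scp/sed commands and the host name are my reconstruction, not the actual session):
# pull the grep 3.1 binary over from the test server (host name assumed) and make it executable
scp erpdb:/usr/bin/grep /root/grep
chmod +x /root/grep
# make every grep call in the report script use the copied binary
sed -i 's|^grep |/root/grep |' report.updata.sh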
crpdb:~ # ./grep --version
grep (GNU grep) 3.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
crpdb:~ #
crpdb:~ # time ./report.updata.sh
Report update start!
2022-10-27 03:00:42.562417 pid 19386 begin sale_data ...
2022-10-27 03:53:39.899914 pid 19386 finish sale_data ...
2022-10-27 03:00:42.564633 pid 19387 sales_profit.py begin sale_profit ...
2022-10-27 05:08:52.994804 pid 19387 sales_profit.py finish sale_profit ...
2022-10-27 03:11:04.038264 pid 19391 process_cost_data begin
2022-10-27 04:15:51.831861 pid 19391 process_cost_data has finished
2022-10-27 03:00:42.168866 pid 19401 process_po_data
2022-10-27 03:05:10.438073 pid 19401 purchase_order_data has finished
2022-10-27 03:00:42.562461 pid 19388 manufacture_profit.py begin manufacture_profit ...
2022-10-27 03:15:40.951847 pid 19388 manufacture_profit.py finish manufacture_profit ...
2022-10-27 03:00:42.139355 pid 19397 inprocess_sfb.py begin inprocess_sfb ...
2022-10-27 03:03:31.313 pid 19397 inprocess_sfb.py <module> finish inprocess_sfb
2022-10-27 03:03:31.314129 pid 19397 inprocess_sfb.py finish inprocess_sfb ...
2022-10-27 03:00:42.139382 pid 19389 stock_data.py begin stock_data ...
2022-10-27 03:08:42.790 pid 19389 stock_data.py <module> finish stock_data
2022-10-27 03:08:42.791174 pid 19389 stock_data.py finish stock_data ...
Report update over!
real 0m0.598s
user 0m0.500s
sys 0m0.060s
crpdb:~ #
At this point the cause was clear: the difference we had been seeing came from the different grep versions, and the newer grep is dramatically more efficient for this workload.
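For completeness, the gap can be quantified on the production server by timing the two binaries side by side against the same log, for example (a sketch using two of the patterns from the script):
# compare the system grep 2.16 with the copied grep 3.1 on the same 270k-line file
for p in 'begin sale_data' 'finish stock_data'; do
    echo "== $p =="
    echo '-- /usr/bin/grep 2.16 --'
    time /usr/bin/grep -i "$p" process.entire.budget.log > /dev/null
    echo '-- /root/grep 3.1 --'
    time /root/grep -i "$p" process.entire.budget.log > /dev/null
done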