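A Python backend service had just gone out in a canary release when its memory usage started climbing rapidly. Rolling back did not bring the memory back down.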
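Common causes of this kind of memory growth in Python:
1. Objects are referenced by global variables, so they live as long as the process;
2. Allocated objects are kept referenced for a long time;
3. The garbage collector has been disabled.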
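Cause 3 is easy to reproduce in isolation. The following is a minimal sketch, not code from the service: the `Node` class and `make_cycle` helper are hypothetical. It shows that in CPython a reference cycle survives while the collector is disabled, because reference counting alone cannot free it, and that it is released once `gc.collect()` runs.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.ref = None


def make_cycle():
    # Build two objects that reference each other, then drop our own
    # references; only a weak reference is returned as a probe.
    a, b = Node(), Node()
    a.ref, b.ref = b, a
    return weakref.ref(a)


print("collector enabled at start:", gc.isenabled())

gc.disable()
probe = make_cycle()
# The cycle keeps both reference counts above zero, so nothing is freed.
alive_while_disabled = probe() is not None

gc.enable()
gc.collect()  # a full collection finds and frees the unreachable cycle
alive_after_collect = probe() is not None

print("alive while disabled:", alive_while_disabled)
print("alive after collect:", alive_after_collect)
```

In a running service, `gc.isenabled()` is a quick way to rule this cause in or out.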
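Tools used for the investigation:
pyrasite (strongly recommended): attaches to a running Python process and lets you run commands inside it to inspect its state.
tracemalloc: built into Python 3; shows which lines of code allocated the memory that is currently in use.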
First install pyrasite: pip install pyrasite
To inspect memory, install guppy: pip install guppy. If the target program runs on Python 3, install guppy3 instead: pip3 install guppy3
step1: attach to the running process with pyrasite-shell <pid>
# pyrasite-shell 25366
step2: inspect memory usage with guppy
>>> from guppy import hpy
>>> h = hpy()
>>> h.heap()
Partition of a set of 481966 objects. Total size = 54689660 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 71903 15 6604592 12 6604592 12 str
1 61921 13 4845392 9 11449984 21 tuple
2 11948 2 4815600 9 16265584 30 dict (no owner)
3 549 0 2345328 4 18610912 34 _io.BufferedRandom
4 3451 1 2217464 4 20828376 38 collections.deque
5 2177 0 2099512 4 22927888 42 type
6 14974 3 2036464 4 24964352 46 function
7 27607 6 2027937 4 26992289 49 bytes
8 18020 4 2018240 4 29010529 53 dict of xxxxxx.pkg.model.stat.Bucket
9 18020 4 2018240 4 31028769 57 dict of xxxxxx.pkg.model.stat.BucketMetric
step3: run h.heap() again later and compare the two results
>>> h.heap()
Partition of a set of 4091119 objects. Total size = 339875892 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 570204 14 63862848 19 63862848 19 dict of xxxxxx.pkg.model.stat.Bucket
1 570204 14 63862848 19 127725696 38 dict of xxxxxx.pkg.model.stat.BucketMetric
2 685195 17 32889360 10 160615056 47 _thread.RLock
3 570204 14 31931424 9 192546480 57 xxxxxx.pkg.model.stat.Bucket
4 570204 14 31931424 9 224477904 66 xxxxxx.pkg.model.stat.BucketMetric
5 71597 2 12831184 4 237309088 70 list
6 13521 0 6735136 2 244044224 72 dict (no owner)
7 72862 2 6654832 2 250699056 74 str
8 57024 1 6386688 2 257085744 76 dict of xxxxxx.pkg.model.service.Instance
9 57022 1 6386464 2 263472208 78 dict of xxxxxx.pkg.model.service.CircuitBreakerStatus
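Between the two samples the number of Bucket and BucketMetric instances grew from 18,020 to 570,204, and the total heap from about 54.7 MB to about 339.9 MB. The leak was traced to a singleton class that was being initialized repeatedly.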
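The same before/after comparison can be approximated with the standard library alone, for example when guppy cannot be installed on the target machine. This is a rough sketch rather than a replacement for `h.heap()`: `count_by_type` is a hypothetical helper, the `Bucket` class here is only a stand-in, and `gc.get_objects()` reports only objects tracked by the collector (containers and class instances, not plain `str` or `bytes`), with no size information.

```python
import gc
from collections import Counter


def count_by_type():
    """Count objects tracked by the garbage collector, grouped by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Bucket:
    """Stand-in for a class whose instances are leaking."""

    def __init__(self):
        self.success = 0


before = count_by_type()
leaked = [Bucket() for _ in range(1000)]  # simulate the leaking code path
after = count_by_type()

# Counter subtraction keeps only the types whose count went up.
growth = after - before
for type_name, delta in growth.most_common(5):
    print(type_name, delta)
```

Types whose count keeps rising across several samples are the ones worth tracing back to their owner.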
One error came up during the investigation:
Traceback (most recent call last):
  File "/usr/bin/pyrasite-shell", line 9, in <module>
    load_entry_point('pyrasite==2.0', 'console_scripts', 'pyrasite-shell')()
  File "/usr/lib/python2.7/site-packages/pyrasite/tools/shell.py", line 61, in shell
    payload = ipc.recv()
  File "/usr/lib/python2.7/site-packages/pyrasite/ipc.py", line 174, in recv
    header_data = self.recv_bytes(4)
  File "/usr/lib/python2.7/site-packages/pyrasite/ipc.py", line 187, in recv_bytes
    chunk = self.sock.recv(n - len(data))
socket.timeout: timed out
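The receive call timed out. The workaround was to edit the library by hand (vi /usr/lib/python2.7/site-packages/pyrasite/ipc.py) and raise the settimeout value from its default of 5 seconds to 50: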
def wait(self):
    """Wait for the injected payload to connect back to us"""
    (clientsocket, address) = self.server_sock.accept()
    self.sock = clientsocket
    self.sock.settimeout(50)
    self.address = address
For tracemalloc, see the Python documentation: https://docs.python.org/3/library/tracemalloc.html
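Reports in the "Top 10 lines" format can be produced with a small helper modelled on the "pretty top" recipe in that documentation. The sketch below is an adaptation, not the code used in the service: `format_top` is a hypothetical name, the list comprehension stands in for real application work, and file names are printed in full rather than shortened.

```python
import linecache
import tracemalloc


def format_top(snapshot, key_type="lineno", limit=10):
    """Return the report as a list of text lines, largest allocations first."""
    # Hide import machinery noise, as the documentation recipe does.
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    lines = ["Top %d lines" % limit]
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        lines.append("#%d: %s:%d: %.1f KiB"
                     % (index, frame.filename, frame.lineno, stat.size / 1024))
        source = linecache.getline(frame.filename, frame.lineno).strip()
        if source:
            lines.append("    " + source)

    other = top_stats[limit:]
    if other:
        other_size = sum(stat.size for stat in other)
        lines.append("%d other: %.1f KiB" % (len(other), other_size / 1024))
    total = sum(stat.size for stat in top_stats)
    lines.append("Total allocated size: %.1f KiB" % (total / 1024))
    return lines


tracemalloc.start()  # must be called before the allocations of interest
payload = [str(i) * 10 for i in range(10000)]  # stand-in for application work
report = format_top(tracemalloc.take_snapshot())
tracemalloc.stop()

print("\n".join(report))
```

In a long-running service, one option is to call `tracemalloc.start()` at startup and log such a report periodically, so that successive reports can be compared.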
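Unlike pyrasite, tracemalloc requires changing the program's code. First, a normal case, shown as two successive reports: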
Top 10 lines
#1: app/xxxx.py:46: 92532.5 KiB
for l in s.split('\n'):
#2: app/xxxx.py:23: 8643.4 KiB
xxxx.extend(urllist)
#3: json/decoder.py:355: 6102.5 KiB
obj, end = self.scan_once(s, idx)
#4: app/xxxx.py:33: 2841.1 KiB
xxxx = {}
#5: server.py:219: 1812.2 KiB
xxxx.xxxx = list(xxxx)
#6: tornado/httputil.py:209: 1584.3 KiB
self.add(name, value.strip())
#7: python3.6/uuid.py:229: 1583.4 KiB
hex[:8], hex[8:12], hex[12:16], hex[16:20], hex[20:])
#8: app/xxxx.py:124: 1490.6 KiB
self.xxxx = []
#9: <string>:14: 1436.5 KiB
#10: tornado/web.py:1665: 1354.1 KiB
(k, self.decode_argument(v, name=k)) for (k, v) in kwargs.items()
1562 other: 14890.8 KiB
Total allocated size: 134271.3 KiB
Top 10 lines
#1: app/xxxx.py:46: 139575.9 KiB
for l in s.split('\n'):
#2: app/xxxx.py:23: 13036.9 KiB
xxxx.extend(urllist)
#3: json/decoder.py:355: 6162.6 KiB
obj, end = self.scan_once(s, idx)
#4: app/xxxx.py:33: 4293.0 KiB
xxxx = {}
#5: server.py:219: 2734.7 KiB
xxxx.xxxx = list(xxxx)
#6: tornado/httputil.py:209: 2389.1 KiB
self.add(name, value.strip())
#7: python3.6/uuid.py:229: 2388.3 KiB
hex[:8], hex[8:12], hex[12:16], hex[16:20], hex[20:])
#8: app/xxxx.py:124: 2248.2 KiB
self.xxxx = []
#9: <string>:14: 2168.7 KiB
#10: tornado/web.py:1665: 2044.5 KiB
(k, self.decode_argument(v, name=k)) for (k, v) in kwargs.items()
1556 other: 20667.8 KiB
Total allocated size: 197709.6 KiB
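Here the memory growth turned out to be expected: the program holds an in-memory queue, and its entries were not being consumed fast enough.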
Abnormal case, again as two successive reports:
Top 10 lines
#1: /usr/lib64/python3.6/json/decoder.py:355: 6079.0 KiB
obj, end = self.scan_once(s, idx)
#2: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:40: 1622.3 KiB
self.success = 0
#3: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:47: 1530.6 KiB
self.lock = threading.RLock()
#4: /usr/lib64/python3.6/threading.py:85: 1235.1 KiB
return _CRLock(*args, **kwargs)
#5: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:84: 1202.6 KiB
self.win_chain: List[Bucket] = [Bucket() for n in range(0, bucket_count)]
#6: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:50: 720.1 KiB
self.metric = BucketMetric()
#7: /usr/local/xxx_server/svr_3.0.1.release/app/xxxx.py:73: 484.0 KiB
xxx.setdefault(host, []).extend(valid_pids)
#8: /usr/local/xxx_server/svr_3.0.1.release/app/xxxx.py:54: 423.3 KiB
xxx.setdefault(host, []).append(item['xx'])
#9: /usr/local/xxx_server/svr_3.0.1.release/app/xxxx.py:65: 364.9 KiB
xxxx.setdefault(item['buid'], []).append(item['xxx'])
#10: <frozen importlib._bootstrap_external>:487: 330.4 KiB
2441 other: 3730.2 KiB
Total allocated size: 17722.5 KiB
Top 10 lines
#1: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:40: 8223.6 KiB
self.success = 0
#2: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:47: 7777.9 KiB
self.lock = threading.RLock()
#3: /usr/lib64/python3.6/threading.py:85: 6299.8 KiB
return _CRLock(*args, **kwargs)
#4: /usr/lib64/python3.6/json/decoder.py:355: 6086.9 KiB
obj, end = self.scan_once(s, idx)
#5: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:84: 5934.5 KiB
self.win_chain: List[Bucket] = [Bucket() for n in range(0, bucket_count)]
#6: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:50: 3655.5 KiB
self.metric = BucketMetric()
#7: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/plugin/localregistry/filecache.py:31: 1578.8 KiB
self.lock_file = open(lock_file_path, 'w+')
#8: /usr/lib64/python3.6/threading.py:846: 1264.4 KiB
_start_new_thread(self._bootstrap, ())
#9: /usr/local/xxx_server/svr_env/lib64/python3.6/linecache.py:137: 790.6 KiB
lines = fp.readlines()
#10: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/service.py:190: 579.8 KiB
return self.instance.id.value
2807 other: 18430.1 KiB
Total allocated size: 60621.9 KiB
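In the abnormal reports above, the lines in stat.py grow several-fold between snapshots (stat.py:40, for example, goes from 1622.3 KiB to 8223.6 KiB). Rather than comparing two reports by eye, tracemalloc can compute the difference directly with `Snapshot.compare_to`. A minimal sketch follows, in which the `bytearray` list is only a stand-in for the leaking code path:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for the code path suspected of leaking.
leaked = [bytearray(1024) for _ in range(2000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# One StatisticDiff per source line, sorted with the largest change first.
diffs = after.compare_to(before, "lineno")
for diff in diffs[:5]:
    print(diff)

# Net growth in bytes between the two snapshots.
growth = sum(diff.size_diff for diff in diffs)
print("net growth: %.1f KiB" % (growth / 1024))
```

Lines that stay at the top of this diff across several intervals are the most likely leak sites.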
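Root cause: a global singleton class was being initialized over and over, which leaked memory.
Tooling: pyrasite + guppy made the leak quicker and easier to locate. It needs no code changes, and it produces heap information faster than tracemalloc.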