
Locating a Python memory leak, plus a fix for the pyrasite "timed out" error

罗鸿福
2023-12-01

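I. Background and approach

A Python backend service had just gone out in a canary release when its memory usage started climbing rapidly. After the release was rolled back, memory usage did not come back down.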

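Memory growth usually comes down to one of these cases:

1. Objects are referenced by global variables and therefore live for a long time;

2. Allocated objects are held by references for a long time;

3. The garbage collector has been disabled.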

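Recommended tools:

pyrasite (strongly recommended): attaches to a Python program that is already running and lets you run commands inside it to inspect its state.

tracemalloc: built into Python 3; makes it easy to see which allocations, grouped by source line, are taking up memory.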

II. Investigation

1) pyrasite:

First install pyrasite: pip install pyrasite;

To inspect memory, also install guppy: pip install guppy. If the target program runs on Python 3, use pip3 install guppy3 instead.

step1: pyrasite-shell <pid>

#pyrasite-shell 25366

step2: inspect memory with guppy

>>> from guppy import hpy
>>> h = hpy()
>>> h.heap()
Partition of a set of 481966 objects. Total size = 54689660 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  71903  15  6604592  12   6604592  12 str
     1  61921  13  4845392   9  11449984  21 tuple
     2  11948   2  4815600   9  16265584  30 dict (no owner)
     3    549   0  2345328   4  18610912  34 _io.BufferedRandom
     4   3451   1  2217464   4  20828376  38 collections.deque
     5   2177   0  2099512   4  22927888  42 type
     6  14974   3  2036464   4  24964352  46 function
     7  27607   6  2027937   4  26992289  49 bytes
     8  18020   4  2018240   4  29010529  53 dict of xxxxxx.pkg.model.stat.Bucket
     9  18020   4  2018240   4  31028769  57 dict of xxxxxx.pkg.model.stat.BucketMetric

step3: call h.heap() again and compare it with the first result

>>> h.heap()
Partition of a set of 4091119 objects. Total size = 339875892 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 570204  14 63862848  19  63862848  19 dict of xxxxxx.pkg.model.stat.Bucket
     1 570204  14 63862848  19 127725696  38 dict of xxxxxx.pkg.model.stat.BucketMetric
     2 685195  17 32889360  10 160615056  47 _thread.RLock
     3 570204  14 31931424   9 192546480  57 xxxxxx.pkg.model.stat.Bucket
     4 570204  14 31931424   9 224477904  66 xxxxxx.pkg.model.stat.BucketMetric
     5  71597   2 12831184   4 237309088  70 list
     6  13521   0  6735136   2 244044224  72 dict (no owner)
     7  72862   2  6654832   2 250699056  74 str
     8  57024   1  6386688   2 257085744  76 dict of xxxxxx.pkg.model.service.Instance
     9  57022   1  6386464   2 263472208  78 dict of xxxxxx.pkg.model.service.CircuitBreakerStatus

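Between the two snapshots, the number of Bucket and BucketMetric instances grew from 18020 to 570204 each, along with a similar number of _thread.RLock objects. This pinpointed the leak: a singleton class was being initialized over and over.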
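The same before/after comparison can be approximated without guppy. The sketch below uses only the standard-library gc module and counts live objects by type name; it is a substitute technique rather than what guppy does internally, and the `Bucket` class here is a hypothetical stand-in for the real one.

```python
import gc
from collections import Counter


def type_counts():
    """Count live gc-tracked objects by type name."""
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(before, after, top=10):
    """Return the types whose instance count grew the most."""
    diff = after - before  # Counter subtraction keeps only positive counts
    return diff.most_common(top)


class Bucket:  # hypothetical stand-in for the leaking class
    def __init__(self):
        self.success = 0


before = type_counts()
leaked = [Bucket() for _ in range(5000)]
after = type_counts()
top_growth = growth(before, after)
print(top_growth[:3])
```

Unlike h.heap(), this reports counts only, not sizes, and it sees only objects tracked by the garbage collector, so plain strings and integers are not included.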

Handling the pyrasite timeout

The following error came up during the investigation:

Traceback (most recent call last):
  File "/usr/bin/pyrasite-shell", line 9, in <module>
    load_entry_point('pyrasite==2.0', 'console_scripts', 'pyrasite-shell')()
  File "/usr/lib/python2.7/site-packages/pyrasite/tools/shell.py", line 61, in shell
    payload = ipc.recv()
  File "/usr/lib/python2.7/site-packages/pyrasite/ipc.py", line 174, in recv
    header_data = self.recv_bytes(4)
  File "/usr/lib/python2.7/site-packages/pyrasite/ipc.py", line 187, in recv_bytes
    chunk = self.sock.recv(n - len(data))
socket.timeout: timed out

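The receive call timed out. The workaround was to edit the installed source by hand (vi /usr/lib/python2.7/site-packages/pyrasite/ipc.py) and raise the settimeout value from the default 5 seconds to 50: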

    def wait(self):
        """Wait for the injected payload to connect back to us"""
        (clientsocket, address) = self.server_sock.accept()
        self.sock = clientsocket
        self.sock.settimeout(50)
        self.address = address
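To see why the traceback ends in socket.timeout, here is a small standalone sketch of the same mechanism. It does not use pyrasite at all: `recv_with_timeout` is a hypothetical helper that reads from a socket whose peer never sends anything, much as the shell waits for a target process that is slow to respond.

```python
import socket


def recv_with_timeout(timeout_seconds):
    """Try to read 4 bytes from a peer that never sends anything."""
    reader, silent_peer = socket.socketpair()
    try:
        # Same call that ipc.py makes on the accepted client socket.
        reader.settimeout(timeout_seconds)
        try:
            reader.recv(4)
            return "received"
        except socket.timeout:
            return "timed out"
    finally:
        reader.close()
        silent_peer.close()


result = recv_with_timeout(0.2)
print(result)
```

A larger timeout only gives the peer more time to answer; if the target process never responds, the call still fails, just later.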

2) tracemalloc:

See the Python documentation at https://docs.python.org/3/library/tracemalloc.html
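A "Top 10 lines" report like the ones in this section can be produced with a few lines of tracemalloc code. This is a minimal sketch modeled on the general pattern in the tracemalloc documentation, not the code used in the service; `top_lines` is a hypothetical helper, the list comprehension stands in for a real workload, and unlike the reports here it does not print the source text of each line.

```python
import tracemalloc


def top_lines(snapshot, limit=10):
    """Format the biggest allocation sites in a snapshot, grouped by line."""
    stats = snapshot.statistics("lineno")
    lines = ["Top %d lines" % limit]
    for index, stat in enumerate(stats[:limit], 1):
        frame = stat.traceback[0]
        lines.append("#%d: %s:%d: %.1f KiB"
                     % (index, frame.filename, frame.lineno, stat.size / 1024))
    other = stats[limit:]
    if other:
        lines.append("%d other: %.1f KiB"
                     % (len(other), sum(s.size for s in other) / 1024))
    lines.append("Total allocated size: %.1f KiB"
                 % (sum(s.size for s in stats) / 1024))
    return "\n".join(lines)


tracemalloc.start()
data = [str(i) * 10 for i in range(10000)]  # hypothetical workload
report = top_lines(tracemalloc.take_snapshot())
tracemalloc.stop()
print(report)
```

In a long-running service, tracemalloc.start() would be called once at startup and the report printed periodically or on demand.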

Using it requires modifying the code. Normal case:

Top 10 lines
#1: app/xxxx.py:46: 92532.5 KiB
   for l in s.split('\n'):
#2: app/xxxx.py:23: 8643.4 KiB
   xxxx.extend(urllist)
#3: json/decoder.py:355: 6102.5 KiB
   obj, end = self.scan_once(s, idx)
#4: app/xxxx.py:33: 2841.1 KiB
   xxxx = {}
#5: server.py:219: 1812.2 KiB
   xxxx.xxxx = list(xxxx)
#6: tornado/httputil.py:209: 1584.3 KiB
   self.add(name, value.strip())
#7: python3.6/uuid.py:229: 1583.4 KiB
   hex[:8], hex[8:12], hex[12:16], hex[16:20], hex[20:])
#8: app/xxxx.py:124: 1490.6 KiB
   self.xxxx = []
#9: <string>:14: 1436.5 KiB
#10: tornado/web.py:1665: 1354.1 KiB
   (k, self.decode_argument(v, name=k)) for (k, v) in kwargs.items()
1562 other: 14890.8 KiB
Total allocated size: 134271.3 KiB
Top 10 lines
#1: app/xxxx.py:46: 139575.9 KiB
   for l in s.split('\n'):
#2: app/xxxx.py:23: 13036.9 KiB
   xxxx.extend(urllist)
#3: json/decoder.py:355: 6162.6 KiB
   obj, end = self.scan_once(s, idx)
#4: app/xxxx.py:33: 4293.0 KiB
   xxxx = {}
#5: server.py:219: 2734.7 KiB
   xxxx.xxxx = list(xxxx)
#6: tornado/httputil.py:209: 2389.1 KiB
   self.add(name, value.strip())
#7: python3.6/uuid.py:229: 2388.3 KiB
   hex[:8], hex[8:12], hex[12:16], hex[16:20], hex[20:])
#8: app/xxxx.py:124: 2248.2 KiB
   self.xxxx = []
#9: <string>:14: 2168.7 KiB
#10: tornado/web.py:1665: 2044.5 KiB
   (k, self.decode_argument(v, name=k)) for (k, v) in kwargs.items()
1556 other: 20667.8 KiB
Total allocated size: 197709.6 KiB

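In this case the memory growth turned out to be expected: the service has an internal in-memory queue, and its items were not being consumed fast enough.

Abnormal case: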

 Top 10 lines
#1: /usr/lib64/python3.6/json/decoder.py:355: 6079.0 KiB
   obj, end = self.scan_once(s, idx)
#2: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:40: 1622.3 KiB
   self.success = 0
#3: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:47: 1530.6 KiB
   self.lock = threading.RLock()
#4: /usr/lib64/python3.6/threading.py:85: 1235.1 KiB
   return _CRLock(*args, **kwargs)
#5: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:84: 1202.6 KiB
   self.win_chain: List[Bucket] = [Bucket() for n in range(0, bucket_count)]
#6: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:50: 720.1 KiB
   self.metric = BucketMetric()
#7: /usr/local/xxx_server/svr_3.0.1.release/app/xxxx.py:73: 484.0 KiB
   xxx.setdefault(host, []).extend(valid_pids)
#8: /usr/local/xxx_server/svr_3.0.1.release/app/xxxx.py:54: 423.3 KiB
   xxx.setdefault(host, []).append(item['xx'])
#9: /usr/local/xxx_server/svr_3.0.1.release/app/xxxx.py:65: 364.9 KiB
   xxxx.setdefault(item['buid'], []).append(item['xxx'])
#10: <frozen importlib._bootstrap_external>:487: 330.4 KiB
2441 other: 3730.2 KiB
Total allocated size: 17722.5 KiB

 Top 10 lines
#1: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:40: 8223.6 KiB
   self.success = 0
#2: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:47: 7777.9 KiB
   self.lock = threading.RLock()
#3: /usr/lib64/python3.6/threading.py:85: 6299.8 KiB
   return _CRLock(*args, **kwargs)
#4: /usr/lib64/python3.6/json/decoder.py:355: 6086.9 KiB
   obj, end = self.scan_once(s, idx)
#5: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:84: 5934.5 KiB
   self.win_chain: List[Bucket] = [Bucket() for n in range(0, bucket_count)]
#6: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/stat.py:50: 3655.5 KiB
   self.metric = BucketMetric()
#7: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/plugin/localregistry/filecache.py:31: 1578.8 KiB
   self.lock_file = open(lock_file_path, 'w+')
#8: /usr/lib64/python3.6/threading.py:846: 1264.4 KiB
   _start_new_thread(self._bootstrap, ())
#9: /usr/local/xxx_server/svr_env/lib64/python3.6/linecache.py:137: 790.6 KiB
   lines = fp.readlines()
#10: /usr/local/xxx_server/svr_env/lib/python3.6/site-packages/xxxxxx_app/pkg/model/service.py:190: 579.8 KiB
   return self.instance.id.value
2807 other: 18430.1 KiB
Total allocated size: 60621.9 KiB
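Instead of reading two reports side by side, tracemalloc can diff two snapshots directly with Snapshot.compare_to. The sketch below is an illustration under assumptions: the `Bucket` class is a hypothetical stand-in, and the list comprehension simulates the leak between the two snapshots.

```python
import tracemalloc


class Bucket:  # hypothetical stand-in for the leaking class
    def __init__(self):
        self.success = 0
        self.samples = [0] * 16


tracemalloc.start()
first = tracemalloc.take_snapshot()
leaked = [Bucket() for _ in range(20000)]  # simulated leak
second = tracemalloc.take_snapshot()
tracemalloc.stop()

# Entries are sorted by the absolute size difference, largest first.
diff = second.compare_to(first, "lineno")
biggest = diff[0]
print(biggest.traceback[0].lineno, "%+.1f KiB" % (biggest.size_diff / 1024))
```

Lines whose size_diff keeps growing from one comparison to the next, such as the stat.py lines above, are the likely leak sites.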

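III. Conclusions

Program issue: a global singleton class was being initialized repeatedly, which caused the memory leak;

Tooling: pyrasite + guppy made the investigation quicker and more convenient. No code changes are needed, and the heap information is available faster than with tracemalloc.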
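The program issue can be illustrated with a small sketch. The source does not show the offending code, so everything here is hypothetical: `LeakyStats` models a class meant to be a singleton whose every construction is kept alive by a class-level list, and `SingletonStats` shows one common way to guard construction so it happens once.

```python
import threading


class LeakyStats:
    """Intended as a singleton, but every call builds a new instance."""
    instances = []  # class-level list keeps every instance alive

    def __init__(self, bucket_count=10):
        self.lock = threading.RLock()
        self.buckets = [{"success": 0} for _ in range(bucket_count)]
        LeakyStats.instances.append(self)


class SingletonStats:
    """Same data, but construction is guarded so it happens only once."""
    _instance = None
    _guard = threading.Lock()

    def __new__(cls, bucket_count=10):
        with cls._guard:
            if cls._instance is None:
                inst = super().__new__(cls)
                inst.lock = threading.RLock()
                inst.buckets = [{"success": 0} for _ in range(bucket_count)]
                cls._instance = inst
        return cls._instance


for _ in range(1000):
    LeakyStats()
    SingletonStats()

leaky_count = len(LeakyStats.instances)
same_object = SingletonStats() is SingletonStats()
print(leaky_count, same_object)
```

With the leaky version, each call adds another lock and another set of buckets, which matches the kind of growth seen in the heap snapshots; with the guarded version, repeated calls return the same object.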
