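In OpenStack, restoring disk data from a snapshot kept failing.
Reading the source code eventually showed that the Ceph keyring files must be named exactly as expected, character for character.
The reason is that the Cinder-related code (os_brick) hard-codes the keyring file name.
Our operations engineers had used non-standard keyring file names when they deployed Ceph. The system ran for a long time without trouble, and the bug only surfaced once we used the snapshot restore feature.
First, enable debug mode to get more detailed logs: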
vi /etc/cinder/cinder.conf
debug=true
View the log:
tail -f /var/log/cinder/cinder-volume.log
The log shows the following warning:
2020-04-14 11:32:01.241 1403591 WARNING cinder.context [req-c72b8fca-7563-48c2-b4f7-5e67bd6b2dc4 daa48f0af437437f8abe6aa47102e7e5 f02d23f04448412781122a365621e218 - 154925c4fbe74cc2ae2b98b6bd0aea5f 154925c4fbe74cc2ae2b98b6bd0aea5f] Unable to get internal tenant context: Missing required config parameters.
Locate the code that emits this message.
File: /usr/lib/python3/dist-packages/cinder/context.py
The source is as follows:
def get_internal_tenant_context():
    """Build and return the Cinder internal tenant context object
    This request context will only work for internal Cinder operations. It will
    not be able to make requests to remote services. To do so it will need to
    use the keystone client to get an auth_token.
    """
    project_id = CONF.cinder_internal_tenant_project_id
    user_id = CONF.cinder_internal_tenant_user_id
    if project_id and user_id:
        return RequestContext(user_id=user_id,
                              project_id=project_id,
                              is_admin=True,
                              overwrite=False)
    else:
        LOG.warning('Unable to get internal tenant context: Missing '
                    'required config parameters.')
        return None
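Edit cinder.conf and add the project ID and user ID of the Cinder internal tenant.
As it turned out in the end, this step is not required: the warning does not affect restoring disk data from a snapshot.
File: /etc/cinder/cinder.conf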
[DEFAULT]
cinder_internal_tenant_project_id = 84e68a2d256c466cbd2796472769f037
cinder_internal_tenant_user_id = 90b78005e1824e91b912c50b045faa5e
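To confirm that both options are actually set, a minimal sketch using Python's standard configparser is shown here. The helper name missing_internal_tenant_options is hypothetical, and the sketch assumes the options live in [DEFAULT] as in the snippet above. It only checks that the values are present, not that the IDs exist in Keystone.

```python
import configparser

# The two options that get_internal_tenant_context() reads from CONF.
REQUIRED = (
    "cinder_internal_tenant_project_id",
    "cinder_internal_tenant_user_id",
)


def missing_internal_tenant_options(conf_text):
    """Return the required option names that are absent or empty in [DEFAULT]."""
    parser = configparser.ConfigParser(
        interpolation=None,   # cinder.conf values may contain '%'
        strict=False,         # tolerate repeated options
        allow_no_value=True,  # tolerate bare option names
    )
    parser.read_string(conf_text)
    defaults = parser.defaults()
    return [name for name in REQUIRED if not (defaults.get(name) or "").strip()]
```

On a real host the text would come from reading /etc/cinder/cinder.conf; an empty result means the warning above should no longer appear.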
The next error in the log is:
2020-04-14 17:44:36.073 64800 ERROR os_brick.initiator.linuxrbd File "/usr/lib/python3/dist-packages/os_brick/initiator/linuxrbd.py", line 70, in connect
2020-04-14 17:44:36.073 64800 ERROR os_brick.initiator.linuxrbd client.connect()
2020-04-14 17:44:36.073 64800 ERROR os_brick.initiator.linuxrbd File "rados.pyx", line 893, in rados.Rados.connect
2020-04-14 17:44:36.073 64800 ERROR os_brick.initiator.linuxrbd rados.PermissionError: [errno 1] error connecting to the cluster
2020-04-14 17:44:36.073 64800 ERROR os_brick.initiator.linuxrbd
Locate the code that raises this error.
File: /usr/lib/python3/dist-packages/os_brick/initiator/linuxrbd.py
The source is as follows:
class RBDClient(object):
    def __init__(self, user, pool, *args, **kwargs):
        self.rbd_user = user
        self.rbd_pool = pool
        for attr in ['rbd_user', 'rbd_pool']:
            val = getattr(self, attr)
            if val is not None:
                setattr(self, attr, utils.convert_str(val))
        # allow these to be overridden for testing
        self.rados = kwargs.get('rados', rados)
        self.rbd = kwargs.get('rbd', rbd)
        if self.rados is None:
            raise exception.InvalidParameterValue(
                err=_('rados module required'))
        if self.rbd is None:
            raise exception.InvalidParameterValue(
                err=_('rbd module required'))
        self.rbd_conf = kwargs.get('conffile', '/etc/ceph/ceph.conf')
        self.rbd_cluster_name = kwargs.get('rbd_cluster_name', 'ceph')
        self.rados_connect_timeout = kwargs.get('rados_connect_timeout', -1)
        # ---- debug logging we added by hand: start
        cat_res = os.popen("cat %s" % self.rbd_conf)
        for ll in cat_res.readlines():
            LOG.debug("=================%s======================"% ll)
        LOG.debug("=+++++++++++++++++++user:{},pool:{},rbd:{},conf:{},cluster:{},timeout:{}+++++++++++++++++++++++++".format(self.rbd_user,self.rbd_pool,self.rbd,self.rbd_conf,self.rbd_cluster_name,self.rados_connect_timeout))
        # ---- debug logging we added by hand: end
        self.client, self.ioctx = self.connect()
    def __enter__(self):
        return self
    def __exit__(self, type_, value, traceback):
        self.disconnect()
    def connect(self):
        LOG.debug("opening connection to ceph cluster (timeout=%s).",
                  self.rados_connect_timeout)
        client = self.rados.Rados(rados_id=self.rbd_user,
                                  clustername=self.rbd_cluster_name,
                                  conffile=self.rbd_conf)
        try:
            if self.rados_connect_timeout >= 0:
                client.connect(
                    timeout=self.rados_connect_timeout)
            else:
                client.connect()
            ioctx = client.open_ioctx(self.rbd_pool)
            return client, ioctx
        except self.rados.Error:
            msg = _("Error connecting to ceph cluster.")
            LOG.exception(msg)
            # shutdown cannot raise an exception
            client.shutdown()
            raise exception.BrickException(message=msg)
The added logging gives two key findings.
1. Finding one
From these lines:
cat_res = os.popen("cat %s" % self.rbd_conf)
for ll in cat_res.readlines():
    LOG.debug("=================%s======================"% ll)
we get the following output:
=================mon_host = 172.1.1.11:6789,172.1.1.12:6789,172.1.1.13:6789======================
=================[client.cinder]======================
=================key = AQBJ6FteAhQZIhAA+R9uYw8NmCLSIiVmEYSnqQ======================
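The key shown here, key = AQBJ6FteAhQZIhAA+R9uYw8NmCLSIiVmEYSnqQ== (its trailing == runs into the = padding of the log line), is not the correct key from the cinder keyring.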
2. Finding two
From this line:
LOG.debug("=+++++++++++++++++++user:{},pool:{},rbd:{},conf:{},cluster:{},timeout:{}+++++++++++++++++++++++++".format(self.rbd_user,self.rbd_pool,self.rbd,self.rbd_conf,self.rbd_cluster_name,self.rados_connect_timeout))
we get the following output:
conf:/tmp/brickrbd_zsp449_n
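The comparison behind finding one (the key in the generated conf versus the key in the real keyring) can be sketched in a few lines of Python. The helper names extract_key and keys_match are hypothetical, and the sketch assumes both files contain a line of the form "key = <value>", as in the log output above.

```python
import re

# Matches a line such as "key = AQ...==" but not "keyring = /path".
_KEY_RE = re.compile(r"^\s*key\s*=\s*(\S+)\s*$", re.MULTILINE)


def extract_key(text):
    """Return the first 'key = ...' value found in the text, or None."""
    match = _KEY_RE.search(text)
    return match.group(1) if match else None


def keys_match(generated_conf_text, keyring_text):
    """True only if both texts contain a key and the two keys are identical."""
    generated = extract_key(generated_conf_text)
    expected = extract_key(keyring_text)
    return generated is not None and generated == expected
```

In practice the first argument would be the contents of the temporary /tmp/brickrbd_* file and the second the contents of the cinder keyring file exported from Ceph.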
Next, search the installed packages for brickrbd:
find /usr/lib/python3/dist-packages/ -name "*brickrbd*"
cd /usr/lib/python3/dist-packages/
grep -rl "brickrbd_" *
The search turns up the following file:
/usr/lib/python3/dist-packages/os_brick/initiator/connectors/rbd.py
Its source, which implements the keyring lookup, is as follows:
def _check_or_get_keyring_contents(self, keyring, cluster_name, user):
    try:
        if keyring is None:
            if user:
                # The path /etc/ceph/%s.client.%s.keyring is hard-coded here,
                # so the keyring file cannot be given a different name.
                keyring_path = ("/etc/ceph/%s.client.%s.keyring" %
                                (cluster_name, user))
                with open(keyring_path, 'r') as keyring_file:
                    keyring = keyring_file.read()
            else:
                keyring = ''
        return keyring
    except IOError:
        msg = (_("Keyring path %s is not readable.") % (keyring_path))
        raise exception.BrickException(msg=msg)
def _create_ceph_conf(self, monitor_ips, monitor_ports,
                      cluster_name, user, keyring):
    monitors = ["%s:%s" % (ip, port) for ip, port in
                zip(self._sanitize_mon_hosts(monitor_ips), monitor_ports)]
    mon_hosts = "mon_host = %s" % (','.join(monitors))
    keyring = self._check_or_get_keyring_contents(keyring, cluster_name, user)
    try:
        fd, ceph_conf_path = tempfile.mkstemp(prefix="brickrbd_")
        with os.fdopen(fd, 'w') as conf_file:
            conf_file.writelines([mon_hosts, "\n", keyring, "\n"])
        return ceph_conf_path
    except IOError:
        msg = (_("Failed to write data to %s.") % (ceph_conf_path))
        raise exception.BrickException(msg=msg)
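The source shows that the path /etc/ceph/%s.client.%s.keyring is hard-coded, where the two placeholders are the cluster name and the user, so the keyring file cannot be renamed.
Create the keyring files in /etc/ceph/ with the following names: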
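The naming rule can be reproduced outside os_brick to check a host before relying on it. This is a minimal sketch: the helper names expected_keyring_path and keyring_is_readable are hypothetical, and the base_dir parameter exists only so the check can be tried against a scratch directory. The format string itself is the one from the source above.

```python
import os


def expected_keyring_path(cluster_name, user, base_dir="/etc/ceph"):
    """Build the keyring path using the same format string as os_brick."""
    return os.path.join(base_dir, "%s.client.%s.keyring" % (cluster_name, user))


def keyring_is_readable(cluster_name, user, base_dir="/etc/ceph"):
    """True if the expected keyring exists as a file the current user can read."""
    path = expected_keyring_path(cluster_name, user, base_dir)
    return os.path.isfile(path) and os.access(path, os.R_OK)
```

For a cluster named ceph and the user cinder, expected_keyring_path returns /etc/ceph/ceph.client.cinder.keyring. The readability check is only meaningful when run as the same user as the cinder-volume service.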
ceph.client.cinder.keyring
ceph.client.cinder-backup.keyring
Edit ceph.conf:
[client.cinder]
keyring = /etc/ceph/ceph.client.cinder.keyring
[client.cinder-backup]
keyring = /etc/ceph/ceph.client.cinder-backup.keyring
Change the owner of the files:
chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring
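After the chown, ownership and permissions can be verified with a short script. This is a sketch under assumptions: the helper name keyring_problems is hypothetical, and it checks only the file type, the owner, and the owner read bit, not the contents of the keyring.

```python
import os
import stat


def keyring_problems(path, expected_uid):
    """Return a list of problems found for the keyring file; empty if none."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return ["%s does not exist" % path]
    problems = []
    if not stat.S_ISREG(info.st_mode):
        problems.append("%s is not a regular file" % path)
    if info.st_uid != expected_uid:
        problems.append("%s is owned by uid %d, expected %d"
                        % (path, info.st_uid, expected_uid))
    if not info.st_mode & stat.S_IRUSR:
        problems.append("%s is not readable by its owner" % path)
    return problems
```

On the Cinder host the expected uid can be looked up with pwd.getpwnam("cinder").pw_uid from the standard library, and the function would be called once for each of the two keyring files.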