
OpenStack: debugging a Cinder bug when restoring a volume from a snapshot

齐承运
2023-12-01


Symptom

When using OpenStack's restore-volume-from-snapshot feature, the restore did not succeed.

Root cause

Reading through the source code, we eventually found that the Ceph keyring file names must match the expected names exactly, character for character, because the relevant cinder/os-brick modules hardcode them.

Our operations engineers had used non-standard keyring file names back when Ceph was deployed. The system ran fine for a long time; only when the restore-volume-from-snapshot feature was finally exercised did the bug surface.

Troubleshooting

The first step was to enable debug mode, which produces much more detailed logs:

vi /etc/cinder/cinder.conf

debug=true

(restart the cinder services afterwards so the change takes effect)

Then watch the log:

tail -f /var/log/cinder/cinder-volume.log

Problem 1: missing required config parameters

The error log:

2020-04-14 11:32:01.241 1403591 WARNING cinder.context [req-c72b8fca-7563-48c2-b4f7-5e67bd6b2dc4 daa48f0af437437f8abe6aa47102e7e5 f02d23f04448412781122a365621e218 - 154925c4fbe74cc2ae2b98b6bd0aea5f 154925c4fbe74cc2ae2b98b6bd0aea5f] Unable to get internal tenant context: Missing required config parameters.

Locating the code that raised the error

File: /usr/lib/python3/dist-packages/cinder/context.py

Source:

def get_internal_tenant_context():
    """Build and return the Cinder internal tenant context object
    This request context will only work for internal Cinder operations. It will
    not be able to make requests to remote services. To do so it will need to
    use the keystone client to get an auth_token.
    """
    project_id = CONF.cinder_internal_tenant_project_id
    user_id = CONF.cinder_internal_tenant_user_id

    if project_id and user_id:
        return RequestContext(user_id=user_id,
                              project_id=project_id,
                              is_admin=True,
                              overwrite=False)
    else:
        LOG.warning('Unable to get internal tenant context: Missing '
                    'required config parameters.')
        return None
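Stripped of the OpenStack plumbing, the guard is simple: both options must be set, or the function logs the exact warning we saw and returns None. A minimal standalone sketch (config option names as in the source; the real RequestContext is replaced by a plain dict purely for illustration):

```python
import logging

logging.basicConfig(level=logging.WARNING)
LOG = logging.getLogger("cinder.context")


def get_internal_tenant_context(conf):
    """Mimic cinder's guard: return a context only when both
    cinder_internal_tenant_project_id and cinder_internal_tenant_user_id
    are configured; otherwise warn and return None."""
    project_id = conf.get("cinder_internal_tenant_project_id")
    user_id = conf.get("cinder_internal_tenant_user_id")
    if project_id and user_id:
        return {"user_id": user_id, "project_id": project_id, "is_admin": True}
    LOG.warning("Unable to get internal tenant context: Missing "
                "required config parameters.")
    return None


# With neither option set we get None plus the warning from cinder-volume.log
print(get_internal_tenant_context({}))  # None
# With both set we get a usable admin context
ctx = get_internal_tenant_context({
    "cinder_internal_tenant_project_id": "84e68a2d256c466cbd2796472769f037",
    "cinder_internal_tenant_user_id": "90b78005e1824e91b912c50b045faa5e",
})
print(ctx["is_admin"])  # True
```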

Fix: add cinder's internal tenant project ID and user ID to cinder.conf.

(As it turned out in the end, this step is not strictly necessary; it does not affect restoring volume data from a snapshot.)

File: /etc/cinder/cinder.conf

[DEFAULT]
cinder_internal_tenant_project_id = 84e68a2d256c466cbd2796472769f037
cinder_internal_tenant_user_id = 90b78005e1824e91b912c50b045faa5e

Problem 2: a permission error while connecting to the cluster, suggesting the connection parameters were being assembled incorrectly

The error log:

2020-04-14 17:44:36.073 64800 ERROR os_brick.initiator.linuxrbd File "/usr/lib/python3/dist-packages/os_brick/initiator/linuxrbd.py", line 70, in connect

2020-04-14 17:44:36.073 64800 ERROR os_brick.initiator.linuxrbd client.connect()

2020-04-14 17:44:36.073 64800 ERROR os_brick.initiator.linuxrbd File "rados.pyx", line 893, in rados.Rados.connect

2020-04-14 17:44:36.073 64800 ERROR os_brick.initiator.linuxrbd rados.PermissionError: [errno 1] error connecting to the cluster

2020-04-14 17:44:36.073 64800 ERROR os_brick.initiator.linuxrbd

Locating the code that raised the error

File: /usr/lib/python3/dist-packages/os_brick/initiator/linuxrbd.py

Source:

class RBDClient(object):

    def __init__(self, user, pool, *args, **kwargs):

        self.rbd_user = user
        self.rbd_pool = pool

        for attr in ['rbd_user', 'rbd_pool']:
            val = getattr(self, attr)
            if val is not None:
                setattr(self, attr, utils.convert_str(val))

        # allow these to be overridden for testing
        self.rados = kwargs.get('rados', rados)
        self.rbd = kwargs.get('rbd', rbd)

        if self.rados is None:
            raise exception.InvalidParameterValue(
                err=_('rados module required'))
        if self.rbd is None:
            raise exception.InvalidParameterValue(
                err=_('rbd module required'))

        self.rbd_conf = kwargs.get('conffile', '/etc/ceph/ceph.conf')
        self.rbd_cluster_name = kwargs.get('rbd_cluster_name', 'ceph')
        self.rados_connect_timeout = kwargs.get('rados_connect_timeout', -1)
#---- debug logging we added by hand -- begin
        cat_res = os.popen("cat %s" % self.rbd_conf)
        for ll in cat_res.readlines():
            LOG.debug("=================%s======================" % ll)
        LOG.debug("=+++++++++++++++++++user:{},pool:{},rbd:{},conf:{},cluster:{},timeout:{}+++++++++++++++++++++++++".format(self.rbd_user,self.rbd_pool,self.rbd,self.rbd_conf,self.rbd_cluster_name,self.rados_connect_timeout))
#---- debug logging we added by hand -- end
        self.client, self.ioctx = self.connect()

    def __enter__(self):
        return self

    def __exit__(self, type_, value, traceback):
        self.disconnect()

    def connect(self):
        LOG.debug("opening connection to ceph cluster (timeout=%s).",
                  self.rados_connect_timeout)
        client = self.rados.Rados(rados_id=self.rbd_user,
                                  clustername=self.rbd_cluster_name,
                                  conffile=self.rbd_conf)

        try:
            if self.rados_connect_timeout >= 0:
                client.connect(
                    timeout=self.rados_connect_timeout)
            else:
                client.connect()
            ioctx = client.open_ioctx(self.rbd_pool)
            return client, ioctx
        except self.rados.Error:
            msg = _("Error connecting to ceph cluster.")
            LOG.exception(msg)
            # shutdown cannot raise an exception
            client.shutdown()
            raise exception.BrickException(message=msg)	

This yielded two key pieces of information.

1. Key finding 1

From the added logging

cat_res = os.popen("cat %s" % self.rbd_conf)

for ll in cat_res.readlines():

LOG.debug("=================%s======================"% ll)

we obtained:

=================mon_host = 172.1.1.11:6789,172.1.1.12:6789,172.1.1.13:6789======================

=================[client.cinder]======================

=================key = AQBJ6FteAhQZIhAA+R9uYw8NmCLSIiVmEYSnqQ======================

This showed that key = AQBJ6FteAhQZIhAA+R9uYw8NmCLSIiVmEYSnqQ== is not the correct key for the cinder user.
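A quick way to check which key a keyring section actually carries is to parse it with configparser and compare against what Ceph reports (e.g. the output of `ceph auth get-key client.cinder`). A hypothetical sketch; the inline contents and the expected key below are illustrative placeholders, not real credentials:

```python
import configparser

# Illustrative contents; in practice read /etc/ceph/ceph.client.cinder.keyring
keyring_text = """
[client.cinder]
key = AQBJ6FteAhQZIhAA+R9uYw8NmCLSIiVmEYSnqQ==
"""

cp = configparser.ConfigParser()
cp.read_string(keyring_text)
key_in_file = cp["client.cinder"]["key"]
print(key_in_file)

# Compare with the key the cluster actually expects, e.g. from:
#   ceph auth get-key client.cinder
expected_key = "AQ...the-real-key...=="  # placeholder
print("match:", key_in_file == expected_key)
```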

2. Key finding 2

From

LOG.debug("=+++++++++++++++++++user:{},pool:{},rbd:{},conf:{},cluster:{},timeout:{}+++++++++++++++++++++++++".format(self.rbd_user,self.rbd_pool,self.rbd,self.rbd_conf,self.rbd_cluster_name,self.rados_connect_timeout))

we obtained:

conf:/tmp/brickrbd_zsp449_n
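That odd-looking path is what `tempfile.mkstemp(prefix="brickrbd_")` produces, which is exactly why grepping for the prefix (next step) leads to the right file. A quick sketch of the naming behavior:

```python
import os
import tempfile

# mkstemp creates a file with the given prefix plus a random suffix,
# e.g. /tmp/brickrbd_zsp449_n-style names
fd, path = tempfile.mkstemp(prefix="brickrbd_")
os.close(fd)
print(path)  # random suffix varies per call
os.unlink(path)
```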

So we searched for brickrbd:

find /usr/lib/python3/dist-packages/ -name "*brickrbd*"

cd /usr/lib/python3/dist-packages/

grep -rl "brickrbd_" *

which turned up the following file:

/usr/lib/python3/dist-packages/os_brick/initiator/connectors/rbd.py

File: /usr/lib/python3/dist-packages/os_brick/initiator/connectors/rbd.py

Source (these methods fetch the keyring contents and build the temporary Ceph config file):

def _check_or_get_keyring_contents(self, keyring, cluster_name, user):
    try:
        if keyring is None:
            if user:
                # The path /etc/ceph/%s.client.%s.keyring is hardcoded here,
                # so the keyring file cannot be renamed
                keyring_path = ("/etc/ceph/%s.client.%s.keyring" %
                                (cluster_name, user))
                with open(keyring_path, 'r') as keyring_file:
                    keyring = keyring_file.read()
            else:
                keyring = ''
        return keyring
    except IOError:
        msg = (_("Keyring path %s is not readable.") % (keyring_path))
        raise exception.BrickException(msg=msg)


def _create_ceph_conf(self, monitor_ips, monitor_ports,
                      cluster_name, user, keyring):
    monitors = ["%s:%s" % (ip, port) for ip, port in
                zip(self._sanitize_mon_hosts(monitor_ips), monitor_ports)]
    mon_hosts = "mon_host = %s" % (','.join(monitors))

    keyring = self._check_or_get_keyring_contents(keyring, cluster_name, user)

    try:
        fd, ceph_conf_path = tempfile.mkstemp(prefix="brickrbd_")
        with os.fdopen(fd, 'w') as conf_file:
            conf_file.writelines([mon_hosts, "\n", keyring, "\n"])
        return ceph_conf_path

    except IOError:
        msg = (_("Failed to write data to %s.") % (ceph_conf_path))
        raise exception.BrickException(msg=msg)

The source shows that the path /etc/ceph/%s.client.%s.keyring is hardcoded, so the keyring files must keep exactly that name.
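The fatal line is the path template itself: with the default cluster name "ceph" and the configured RBD user "cinder", os-brick opens exactly one path, and no `keyring =` setting elsewhere changes it. A sketch of the same construction:

```python
def expected_keyring_path(cluster_name, user):
    """Same format string os-brick hardcodes; no config option overrides it."""
    return "/etc/ceph/%s.client.%s.keyring" % (cluster_name, user)


print(expected_keyring_path("ceph", "cinder"))
# /etc/ceph/ceph.client.cinder.keyring
```

Any other file name in /etc/ceph/, however valid as far as Ceph itself is concerned, raises the IOError branch and the restore fails.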

Final solution

Create the keyring files in /etc/ceph/ with exactly these names:

ceph.client.cinder.keyring

ceph.client.cinder-backup.keyring

Update ceph.conf:

[client.cinder]

keyring = /etc/ceph/ceph.client.cinder.keyring

[client.cinder-backup]

keyring = /etc/ceph/ceph.client.cinder-backup.keyring

Fix the file ownership:

chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring

chown cinder:cinder /etc/ceph/ceph.client.cinder-backup.keyring
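A quick sanity check after the fix: the files exist at the hardcoded paths and are readable. A small sketch (the two paths assume the default cluster name "ceph"; on other installs the owner may differ):

```python
import os
import pwd


def check_keyring(path):
    """Return a short status string for one keyring path."""
    if not os.path.exists(path):
        return "missing"
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    mode = "readable" if os.access(path, os.R_OK) else "unreadable"
    return "owner=%s, %s" % (owner, mode)


# The exact paths os-brick will try to open
for p in ("/etc/ceph/ceph.client.cinder.keyring",
          "/etc/ceph/ceph.client.cinder-backup.keyring"):
    print(p, "->", check_keyring(p))
```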
