2021SC@SDUSC
Nova is the OpenStack project that provides compute instances (a.k.a. virtual servers). Nova supports creating virtual machines and has limited support for system containers. It runs as a set of daemons on top of existing Linux servers to provide this service.
Nova works together with other OpenStack services (Keystone, Glance, Neutron, Placement) to provide its basic functionality:
(Nova architecture diagram: https://docs.openstack.org/nova/xena/_images/architecture.svg)
Nova can be configured to emit notifications over RPC.
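As a hedged illustration of what that means in practice, the sketch below uses oslo.messaging to listen for Nova's versioned notifications; the topic name and the transport configuration are assumptions based on common default settings, not something taken from this post's deployment.

import oslo_messaging
from oslo_config import cfg


class NotificationEndpoint(object):
    # Called for notifications emitted at "info" priority.
    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        print('%s from %s: %s' % (event_type, publisher_id, payload))


# Assumes a transport_url (e.g. RabbitMQ) is configured in cfg.CONF.
transport = oslo_messaging.get_notification_transport(cfg.CONF)
targets = [oslo_messaging.Target(topic='versioned_notifications')]
listener = oslo_messaging.get_notification_listener(
    transport, targets, [NotificationEndpoint()], executor='threading')
listener.start()
listener.wait()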
When a user issues a new request, it is first handled by nova-api. nova-api runs a series of checks on the request, such as whether it is valid and whether the quota is sufficient. Once the checks pass, nova-api assigns the request a unique instance ID and creates a corresponding entry in the database to record the instance's state; it then hands the request over to nova-conductor.
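For context, this is roughly what such a request looks like from the client side. A minimal sketch using openstacksdk, where the cloud name and the image/flavor/network IDs are placeholders:

import openstack

conn = openstack.connect(cloud='mycloud')   # credentials come from clouds.yaml
server = conn.compute.create_server(
    name='demo-vm',
    image_id='IMAGE_UUID',
    flavor_id='FLAVOR_UUID',
    networks=[{'uuid': 'NETWORK_UUID'}])
# nova-api answers immediately; the instance stays in BUILD until nova-compute
# finishes. wait_for_server polls until it becomes ACTIVE (or errors out).
server = conn.compute.wait_for_server(server)
print(server.status)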
nova-conductor mainly manages communication between services and coordinates task handling. After receiving the request, it builds a RequestSpec object that wraps all scheduling-related information for nova-scheduler, and then calls the select_destinations interface of the nova-scheduler service.
From the received RequestSpec, nova-scheduler first builds a ResourceRequest object and sends it to Placement for a round of pre-filtering; it then makes a scheduling decision based on the latest system state in the database and tells nova-conductor which compute node the request should be scheduled to.
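The Placement pre-filtering step is an HTTP call: the ResourceRequest ends up as a GET /allocation_candidates query. A rough sketch of that query is shown below, using raw requests with a placeholder endpoint and token; Nova itself goes through a keystoneauth session and its Placement report client rather than calling requests directly.

import requests

resp = requests.get(
    'http://placement.example.com/allocation_candidates',
    params={'resources': 'VCPU:2,MEMORY_MB:4096,DISK_GB:20'},
    headers={'X-Auth-Token': 'TOKEN',
             'OpenStack-API-Version': 'placement 1.36'})
# Each allocation request names compute-node resource providers that can
# satisfy the requested resources; the scheduler then filters and weighs
# only those candidates.
for rq in resp.json()['allocation_requests']:
    print(list(rq['allocations'].keys()))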
Once the scheduling decision is known, nova-conductor sends the request to the corresponding nova-compute service.
Every nova-compute service has its own Resource Tracker that monitors local host resource usage. When the compute node receives the request, the Resource Tracker checks whether the host has enough resources.
If resources are sufficient, the requested instance is started, its state is updated in the database, and the latest host resource usage is written back to the database as well.
If the current host cannot satisfy the requested resources, nova-compute rejects the request and sends it back to nova-conductor, which retries the whole scheduling process.
api: accepts and serves API calls
cmd: entry points for the various Nova services
compute: the daemon that creates and terminates virtual machines and manages communication between the hypervisor and the VMs
conf: configuration options
conductor: handles requests that need coordination; the intermediary between api, scheduler and compute
console: console service
db: database access wrappers
hacking: coding style checks
image: image operation wrappers
keymgr: key manager
locale: internationalization
network: network service
notification: notifications
object: wraps database operations so that code elsewhere never touches the database directly
pci: PCI/SR-IOV support
scheduler: scheduler service
servicegroup: membership service / service groups
storage: Ceph storage
tests: unit tests
virt: the supported hypervisor drivers
volume: volume service wrappers, an abstraction over the Cinder API
novncproxy: the noVNC proxy service that lets users reach an instance's VNC console from a browser
__init__.py
availability_zones.py # availability-zone helper functions
baserpc.py # base RPC client/server implementation
block_device.py # block device mappings
cache_utils.py # oslo_cache wrappers
config.py # command-line argument parsing
context.py # the request context carried through all of Nova
crypto.py # wrappers around standard cryptographic primitives
debugger.py # pydev debugging support
exception.py # base exception classes
exception_wrapper.py # exception wrapping
filters.py # base filter classes
i18n.py # oslo_i18n integration
loadables.py # loadable classes
manager.py # base Manager class
middleware.py # updates the default options of oslo_middleware
monkey_patch.py # eventlet monkey patching
policy.py # policy engine
profiler.py # OSProfiler integration
quota.py # per-project resource quotas
rpc.py # helper functions for RPC operations
safe_utils.py # utility functions that do not cause circular imports
service.py # generic node base class for all workers running on a host
service_auth.py # authentication plugin
test.py # base class for unit tests
utils.py # utility functions
version.py # version management
weights.py # weight plugins
wsgi.py # server class for managing WSGI applications
conductor
api.py # wraps the RPC interface
rpcapi.py # provides the RPC interface
manager.py # handles the RPC API calls
All of compute's database access has to go through the conductor as a proxy; the conductor operates on objects, and each object corresponds to one table.
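A minimal sketch of what "going through objects instead of the database" looks like, assuming an admin context and an existing instance UUID; when the object indirection API is set (as it is inside nova-compute), save() is remoted to nova-conductor instead of writing to the database directly.

from nova import context as nova_context
from nova import objects

objects.register_all()                  # register all versioned objects
ctxt = nova_context.get_admin_context()
# Load one row of the instances table as an Instance object ...
inst = objects.Instance.get_by_uuid(ctxt, 'INSTANCE_UUID')
inst.vm_state = 'active'
# ... and persist the change; with the conductor indirection API in place
# this becomes an RPC to nova-conductor rather than a direct DB write.
inst.save()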
scheduler
filters # filter implementations that weed out hosts which do not satisfy the request
weights # weigher implementations used to compute host weights and sort the candidates
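As a minimal sketch of the filters side (a toy example, not a filter shipped with Nova): a host filter subclasses BaseHostFilter and implements host_passes(); a weigher sketch appears further below where the filter_scheduler algorithm is described.

from nova.scheduler import filters


class EnoughFreeRamFilter(filters.BaseHostFilter):
    """Hypothetical filter: drop hosts with less than 1 GiB of free RAM."""

    RUN_ON_REBUILD = False   # do not re-run this filter for rebuilds

    def host_passes(self, host_state, spec_obj):
        # host_state carries the tracked resources of one compute node;
        # returning False removes it from the candidate list.
        return host_state.free_ram_mb >= 1024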
First, the create() method of the compute API (nova/compute/api.py) is invoked; it in turn calls the _create_instance() method.
def create(self, context, instance_type,
           image_href, kernel_id=None, ramdisk_id=None,
           min_count=None, max_count=None,
           display_name=None, display_description=None,
           key_name=None, key_data=None, security_groups=None,
           availability_zone=None, forced_host=None, forced_node=None,
           user_data=None, metadata=None, injected_files=None,
           admin_password=None, block_device_mapping=None,
           access_ip_v4=None, access_ip_v6=None, requested_networks=None,
           config_drive=None, auto_disk_config=None, scheduler_hints=None,
           legacy_bdm=True, shutdown_terminate=False,
           check_server_group_quota=False, tags=None,
           supports_multiattach=False, trusted_certs=None,
           supports_port_resource_request=False,
           requested_host=None, requested_hypervisor_hostname=None):
    """Provision instances, sending instance information to the
    scheduler. The scheduler will determine where the instance(s)
    go and will handle creating the DB entries.

    Returns a tuple of (instances, reservation_id)
    """
    if requested_networks and max_count is not None and max_count > 1:
        self._check_multiple_instances_with_specified_ip(
            requested_networks)
        self._check_multiple_instances_with_neutron_ports(
            requested_networks)

    if availability_zone:
        available_zones = availability_zones.\
            get_availability_zones(context.elevated(), self.host_api,
                                   get_only_available=True)
        if forced_host is None and availability_zone not in \
                available_zones:
            msg = _('The requested availability zone is not available')
            raise exception.InvalidRequest(msg)

    filter_properties = scheduler_utils.build_filter_properties(
        scheduler_hints, forced_host, forced_node, instance_type)

    return self._create_instance(
        context, instance_type,
        image_href, kernel_id, ramdisk_id,
        min_count, max_count,
        display_name, display_description,
        key_name, key_data, security_groups,
        availability_zone, user_data, metadata,
        injected_files, admin_password,
        access_ip_v4, access_ip_v6,
        requested_networks, config_drive,
        block_device_mapping, auto_disk_config,
        filter_properties=filter_properties,
        legacy_bdm=legacy_bdm,
        shutdown_terminate=shutdown_terminate,
        check_server_group_quota=check_server_group_quota,
        tags=tags, supports_multiattach=supports_multiattach,
        trusted_certs=trusted_certs,
        supports_port_resource_request=supports_port_resource_request,
        requested_host=requested_host,
        requested_hypervisor_hostname=requested_hypervisor_hostname)
_create_instance() calls compute_task_api.schedule_and_build_instances(), i.e. the schedule_and_build_instances() method in the conductor's api.py, which directly calls the method of the same name in the conductor's rpcapi.py.
def schedule_and_build_instances(self, context, build_requests,
                                 request_specs,
                                 image, admin_password, injected_files,
                                 requested_networks,
                                 block_device_mapping,
                                 tags=None):
    version = '1.17'
    kw = {'build_requests': build_requests,
          'request_specs': request_specs,
          'image': jsonutils.to_primitive(image),
          'admin_password': admin_password,
          'injected_files': injected_files,
          'requested_networks': requested_networks,
          'block_device_mapping': block_device_mapping,
          'tags': tags}

    if not self.client.can_send_version(version):
        version = '1.16'
        del kw['tags']

    cctxt = self.client.prepare(version=version)
    cctxt.cast(context, 'schedule_and_build_instances', **kw)
cast is an asynchronous RPC invocation of schedule_and_build_instances and returns immediately.
Up to this point, although the code path has gone api -> compute -> conductor, everything is still executing inside the nova-api process. Because cast is asynchronous, it returns as soon as the message is sent, without waiting for an RPC reply; nova-api's part of the job is therefore done, it responds to the user, and the instance state at this point is building.
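The cast/call distinction comes straight from oslo.messaging. Below is a standalone sketch of the two primitives; the topic and method names are placeholders, and a configured transport_url is assumed.

import oslo_messaging
from oslo_config import cfg

transport = oslo_messaging.get_rpc_transport(cfg.CONF)
target = oslo_messaging.Target(topic='demo_topic', version='1.0')
client = oslo_messaging.RPCClient(transport, target)
cctxt = client.prepare(version='1.0')

ctxt = {}  # the request context that travels with every RPC
# cast: fire-and-forget, returns as soon as the message is queued --
# this is why nova-api can answer the user while the instance is still building.
cctxt.cast(ctxt, 'do_work_async', payload=1)
# call: blocks until the server-side method returns a result --
# used later for select_destinations, where the chosen hosts must come back.
result = cctxt.call(ctxt, 'do_work_sync', payload=1)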
The request then travels through oslo.messaging to the conductor's manager.py, where schedule_and_build_instances() is invoked. It first calls _schedule_instances(), which drives the scheduler's select_destinations():
def schedule_and_build_instances(self, context, build_requests,
                                 request_specs, image,
                                 admin_password, injected_files,
                                 requested_networks, block_device_mapping,
                                 tags=None):
    # Add all the UUIDs for the instances
    instance_uuids = [spec.instance_uuid for spec in request_specs]
    try:
        host_lists = self._schedule_instances(context, request_specs[0],
                instance_uuids, return_alternates=True)
    except Exception as exc:
        LOG.exception('Failed to schedule instances')
        self._bury_in_cell0(context, request_specs[0], exc,
                            build_requests=build_requests,
                            block_device_mapping=block_device_mapping,
                            tags=tags)
        return
scheduler_client, like compute_api and compute_task_api, is a client-side wrapper around a service. The scheduler has no api.py module, though; instead it has a separate client package, implemented in the query.py module under the nova/scheduler/client directory. Its select_destinations() method simply calls select_destinations() on scheduler_rpcapi, and we are back at an RPC invocation.
The RPC wrapper is implemented in the scheduler's rpcapi.py.
cctxt = self.client.prepare(
    version=version, call_monitor_timeout=CONF.rpc_response_timeout,
    timeout=CONF.long_rpc_timeout)
return cctxt.call(ctxt, 'select_destinations', **msg_args)
call is the synchronous RPC method: the conductor blocks until the scheduler returns, and at this point the scheduler takes over the task.
rpcapi invokes the corresponding select_destinations() method in the scheduler's manager.py, which in turn calls the driver's select_destinations(). The driver here is the scheduling driver, specified by the scheduler configuration group; it defaults to filter_scheduler, implemented in the nova/scheduler/filter_scheduler.py module. The algorithm first filters out compute nodes that fail the configured filters, then computes a weight for each remaining node via the weigh methods, and finally returns the highest-weighted nodes as the candidate compute nodes.
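The weighing half works the same way as the filters shown earlier; a toy weigher (hypothetical, not shipped with Nova) subclasses BaseHostWeigher, and the scheduler normalizes the returned scores, applies the multiplier and sorts the hosts:

from nova.scheduler import weights


class FreeDiskWeigher(weights.BaseHostWeigher):
    """Hypothetical weigher: prefer hosts with more free disk."""

    def weight_multiplier(self, host_state):
        # Scales this weigher's contribution relative to the others.
        return 1.0

    def _weigh_object(self, host_state, weight_properties):
        # Higher raw score -> higher weight after normalization.
        return host_state.free_disk_mb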
Finally nova-scheduler returns the set of selected hosts and its part of the task ends. Because nova-conductor invoked this method synchronously, nova-scheduler hands the result back to the nova-conductor service.
After the scheduler returns, the conductor resumes in schedule_and_build_instances() in its manager.py.
It then calls build_and_run_instance() on compute_rpcapi:
with obj_target_cell(instance, cell) as cctxt:
    self.compute_rpcapi.build_and_run_instance(
        cctxt, instance=instance, image=image,
        request_spec=request_spec,
        filter_properties=filter_props,
        admin_password=admin_password,
        injected_files=injected_files,
        requested_networks=requested_networks,
        security_groups=legacy_secgroups,
        block_device_mapping=instance_bdms,
        host=host.service_host, node=host.nodename,
        limits=host.limits, host_list=host_list,
        accel_uuids=accel_uuids)
As before, rpcapi asynchronously invokes the compute service's method of the same name, and nova-compute takes over the task.
In compute's manager.py we find build_and_run_instance():
def build_and_run_instance(self, context, instance, image, request_spec,
                           filter_properties, accel_uuids, admin_password=None,
                           injected_files=None, requested_networks=None,
                           security_groups=None, block_device_mapping=None,
                           node=None, limits=None, host_list=None):

    @utils.synchronized(instance.uuid)
    def _locked_do_build_and_run_instance(*args, **kwargs):
        # NOTE(danms): We grab the semaphore with the instance uuid
        # locked because we could wait in line to build this instance
        # for a while and we want to make sure that nothing else tries
        # to do anything with this instance while we wait.
        with self._build_semaphore:
            try:
                result = self._do_build_and_run_instance(*args, **kwargs)
            except Exception:
                # NOTE(mriedem): This should really only happen if
                # _decode_files in _do_build_and_run_instance fails, and
                # that's before a guest is spawned so it's OK to remove
                # allocations for the instance for this node from Placement
                # below as there is no guest consuming resources anyway.
                # The _decode_files case could be handled more specifically
                # but that's left for another day.
                result = build_results.FAILED
                raise
            finally:
                if result == build_results.FAILED:
                    # Remove the allocation records from Placement for the
                    # instance if the build failed. The instance.host is
                    # likely set to None in _do_build_and_run_instance
                    # which means if the user deletes the instance, it
                    # will be deleted in the API, not the compute service.
                    # Setting the instance.host to None in
                    # _do_build_and_run_instance means that the
                    # ResourceTracker will no longer consider this instance
                    # to be claiming resources against it, so we want to
                    # reflect that same thing in Placement. No need to
                    # call this for a reschedule, as the allocations will
                    # have already been removed in
                    # self._do_build_and_run_instance().
                    self.reportclient.delete_allocation_for_instance(
                        context, instance.uuid)

                if result in (build_results.FAILED,
                              build_results.RESCHEDULED):
                    self._build_failed(node)
                else:
                    self._build_succeeded(node)

    # NOTE(danms): We spawn here to return the RPC worker thread back to
    # the pool. Since what follows could take a really long time, we don't
    # want to tie up RPC workers.
    utils.spawn_n(_locked_do_build_and_run_instance,
                  context, instance, image, request_spec,
                  filter_properties, admin_password, injected_files,
                  requested_networks, security_groups,
                  block_device_mapping, node, limits, host_list,
                  accel_uuids)
The driver here is the compute driver, specified by the compute_driver option in the compute configuration group; in this case it is libvirt.LibvirtDriver, whose code lives in nova/virt/libvirt/driver.py. Its spawn() method calls libvirt to create the virtual machine and waits for the instance state to become Active. At that point the nova-compute service is done and the whole instance-creation flow is complete.
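What "wait until the state is Active" amounts to at the hypervisor level can be illustrated with libvirt-python directly; this is a standalone sketch, with the connection URI and domain name as placeholders, not code from LibvirtDriver.

import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-00000001')   # Nova names guests instance-XXXXXXXX
state, _reason = dom.state()
if state == libvirt.VIR_DOMAIN_RUNNING:
    print('guest is running; Nova reports the instance as ACTIVE')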
There is a lot more to dig into in Nova's architecture and working model. The recurring pattern is that the different layers talk to each other over RPC and end up invoking the implementation methods in each service's manager; the concrete policies behind those methods remain to be explored.