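StarlingX can update a deployed system through a feature called "patching". It moves the system from one version to another and is mainly used for bug fixes, security patches, and feature enhancements.
Patching supports two kinds of patches: In-Service patches and Reboot-required patches. An In-Service patch does not require the host to reboot; restarting the affected service processes is enough. A Reboot-required patch takes effect only after the host reboots. To install a Reboot-required patch, first lock the host, wait until the patch is applied, then unlock the host so that the patch takes effect.
This introduction is written for developers who want to use the patching feature; it is not a product user guide. It concentrates on the core patching workflow rather than covering every aspect of patching.
In short, patching has two phases: creating a patch and applying it. Both are described in detail below.
A StarlingX patch contains one or more rpm packages needed for the system update. Before creating a patch, verify that the rpm packages are already installed on the deployed StarlingX system. The following steps help confirm this.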
system show
    controller-0:~$ . /etc/platform/openrc
    [sysadmin@controller-0 ~(keystone_admin)]$ system show
    +----------------------+--------------------------------------+
    | Property             | Value                                |
    +----------------------+--------------------------------------+
    | contact              | None                                 |
    | created_at           | 2019-10-14T03:10:50.862114+00:00     |
    | description          | None                                 |
    | https_enabled        | False                                |
    | location             | None                                 |
    | name                 | 608dfe48-9a05-4b21-afc1-ea122574caa7 |
    | region_name          | RegionOne                            |
    | sdn_enabled          | False                                |
    | security_feature     | spectre_meltdown_v1                  |
    | service_project_name | services                             |
    | software_version     | 19.09                                |
    | system_mode          | duplex                               |
    | system_type          | All-in-one                           |
    | timezone             | UTC                                  |
    | updated_at           | 2019-10-14T03:12:41.983029+00:00     |
    | uuid                 | 2639ad15-08a7-4f1b-a372-f927a5e4ab31 |
    | vswitch_type         | none                                 |
    +----------------------+--------------------------------------+
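When a build script needs the deployed release (for example to compare it with the release a patch is built for), the `system show` table can be read programmatically. The following is a minimal sketch, assuming the output has the two-column layout shown above; `parse_property_table` and `SAMPLE` are hypothetical names introduced for illustration and are not part of StarlingX.

```python
def parse_property_table(text):
    """Parse a '| Property | Value |' style table into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Property":
            continue  # skip the header row and anything unexpected
        props[cells[0]] = cells[1]
    return props


# Shortened copy of the output shown above, used as sample input.
SAMPLE = """\
+----------------------+--------------------------------------+
| Property             | Value                                |
+----------------------+--------------------------------------+
| software_version     | 19.09                                |
| system_mode          | duplex                               |
| system_type          | All-in-one                           |
+----------------------+--------------------------------------+
"""

props = parse_property_table(SAMPLE)
print(props["software_version"])
```

On a live system the text would come from running `system show` after sourcing `/etc/platform/openrc`; the sketch deliberately works on a string so it can be tried anywhere.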
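Once the rpm packages to upgrade or install have been identified, the next step is to prepare the patch build environment. For a StarlingX developer, the easiest way is to use the StarlingX build container, which needs only minor modification. The build container can be created by following the build guide.
Assume that the StarlingX source code has been downloaded and that the rpm packages to install are ready; the patch build environment can now be set up. Again, this tutorial targets developers, not product deployments.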
sudo pip install crypto pycrypto
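$MY_REPO/stx/stx-update/extras/scripts/patch_build.sh creates the patch. The script reads the PLATFORM_RELEASE parameter from the release-info.inc file and points PYTHONPATH at the cgcs-patch package in the repo, so there is no need to install cgcs-patch or to specify PLATFORM_RELEASE by hand. The following command shows the usage of the build script.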
    $ $MY_REPO/stx/stx-update/cgcs-patch/bin/patch_build --help
    Usage: patch_build [ <args> ] ... <rpm list>
    Options:
        --id <id>                             Patch ID
        --release <version>                   Platform release version
        --status <status>                     Patch Status Code (ie. O, R, V)
        --unremovable                         Marks patch as unremovable
        --reboot-required <Y|N>               Marks patch as reboot-required (default=Y)
        --summary <summary>                   Patch Summary
        --desc <description>                  Patch Description
        --warn <warnings>                     Patch Warnings
        --inst <instructions>                 Patch Install Instructions
        --req <patch_id>                      Required Patch
        --controller <rpm>                    New package for controller
        --worker <rpm>                        New package for worker node
        --worker-lowlatency <rpm>             New package for worker-lowlatency node
        --storage <rpm>                       New package for storage node
        --controller-worker <rpm>             New package for combined node
        --controller-worker-lowlatency <rpm>  New package for lowlatency combined node
        --all-nodes <rpm>                     New package for all node types
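With this script you can specify the patch ID, whether a reboot is required, the patches this one depends on, the rpm list, and so on. A package that is not yet on the system and has to be newly installed must be given a node type; for example, --controller marks a package to be newly installed on controller nodes. When the script finishes, it produces a file named "<patch-id>.patch".
Let's take a closer look at this patch file.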
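To make the option handling concrete, here is a minimal sketch that assembles a patch_build command line from the options listed in the help text. It only builds the argument list and does not run anything. The function name `build_patch_command`, the default `script` value, and the rpm file names are hypothetical; only the flags themselves come from the help output.

```python
# Node-type flags for brand-new packages, taken from the help text.
NODE_TYPE_FLAGS = {
    "controller", "worker", "worker-lowlatency", "storage",
    "controller-worker", "controller-worker-lowlatency", "all-nodes",
}


def build_patch_command(patch_id, rpms, reboot_required=True, summary=None,
                        requires=(), new_packages=None,
                        script="patch_build.sh"):
    """Return the argv list for one patch_build invocation.

    new_packages maps a node-type flag (without '--') to an rpm that is
    not yet installed on that node type.
    """
    cmd = [script, "--id", patch_id,
           "--reboot-required", "Y" if reboot_required else "N"]
    if summary:
        cmd += ["--summary", summary]
    for required_patch in requires:
        cmd += ["--req", required_patch]
    for node_type, rpm in (new_packages or {}).items():
        if node_type not in NODE_TYPE_FLAGS:
            raise ValueError("unknown node type: %s" % node_type)
        cmd += ["--" + node_type, rpm]
    cmd += list(rpms)  # the rpm list goes last, as in the usage line
    return cmd


# Hypothetical example: an In-Service patch that upgrades one rpm and
# adds one new package on controller nodes.
print(" ".join(build_patch_command(
    "001",
    ["foo-1.0-2.x86_64.rpm"],
    reboot_required=False,
    new_packages={"controller": "bar-1.0-1.x86_64.rpm"},
)))
```

Returning a list rather than a string keeps the result safe to pass to `subprocess.run` without shell quoting concerns.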
    $ file 001.patch
    001.patch: gzip compressed data, was "001.patch", last modified: Fri Aug 16 05:56:59 2019, max compression

    $ tar -xf 001.patch
    $ tree
    ├── 001.patch
    ├── metadata.tar
    ├── signature
    ├── signature.v2
    └── software.tar
$MY_REPO/build-tools/signing/ima_signing_key.priv
$MY_REPO/build-tools/signing/dev-private-key.pem
Generated from the key files above.
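The same inspection can be done without unpacking the file to disk. The sketch below uses Python's standard `tarfile` module to list the members of a gzip-compressed tar archive. Because a real patch file is not available here, it builds a small in-memory stand-in with the four member names shown above; `list_patch_members` and `make_demo_patch` are hypothetical helper names, and the member contents are dummy bytes.

```python
import io
import tarfile


def list_patch_members(fileobj):
    """Return the sorted member names of a (possibly compressed) tar archive."""
    # "r:*" lets tarfile detect gzip compression automatically.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        return sorted(member.name for member in tar.getmembers())


def make_demo_patch():
    """Build an in-memory gzip'd tar that mimics the layout shown above."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in ("metadata.tar", "signature",
                     "signature.v2", "software.tar"):
            data = b"demo"  # placeholder content, not real patch data
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


for name in list_patch_members(make_demo_patch()):
    print(name)
```

For a real patch, opening it with `open("001.patch", "rb")` and passing the file object to `list_patch_members` should work the same way, assuming the file is a gzip-compressed tar as the `file` output suggests.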
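Once a patch has been built, it can be installed manually on a target StarlingX system, through either the web interface or the command line. A patch's life cycle has four states: Available, Partial-Apply, Applied, and Partial-Remove.
To install a patch from the command line, copy it to the active controller node. StarlingX provides the client command sw-patch, through which all patch operations are performed; its subcommands include upload, apply, query, host-install, delete, remove, and others.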
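The four states can be pictured as a small state machine. The sketch below is an illustration only: the transition table is an assumption inferred from the state names and from the upload, apply, and host-install walkthrough in this document, not taken from the patching source code, and the event names `apply`, `remove`, and `hosts-installed` are hypothetical labels.

```python
# (current state, event) -> next state; an assumed, simplified model.
TRANSITIONS = {
    ("Available", "apply"): "Partial-Apply",
    ("Partial-Apply", "hosts-installed"): "Applied",
    ("Applied", "remove"): "Partial-Remove",
    ("Partial-Remove", "hosts-installed"): "Available",
}


def next_state(state, event):
    """Return the state that follows `state` when `event` happens."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError("event %r is not valid in state %r" % (event, state))


# Walk a patch through a full apply/remove cycle.
state = "Available"
for event in ("apply", "hosts-installed", "remove", "hosts-installed"):
    state = next_state(state, event)
    print(event, "->", state)
```

In this model, `hosts-installed` stands for "host-install has completed on every affected host", which is the point at which the partial states resolve.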
    controller-0:~$ sw-patch --help
    usage: sw-patch [--debug] <subcommand> ...

    Subcomands:

    upload:         Upload one or more patches to the patching system.

    upload-dir:     Upload patches from one or more directories to the
                    patching system.

    apply:          Apply one or more patches. This adds the specified
                    patches to the repository, making the update(s)
                    available to the hosts in the system. Use --all to
                    apply all available patches. Patches are specified as
                    a space-separated list of patch IDs.

    remove:         Remove one or more patches. This removes the specified
                    patches from the repository. Patches are specified as
                    a space-separated list of patch IDs.

    delete:         Delete one or more patches from the patching system.
                    Patches are specified as a space-separated list of
                    patch IDs.

    query:          Query system patches. Optionally, specify 'query
                    applied' to query only those patches that are applied,
                    or 'query available' to query those that are not.

    show:           Show details for specified patches.

    what-requires:  List patches that require the specified patches.

    query-hosts:    Query patch states for hosts in the system.

    host-install:   Trigger patch install/remove on specified host. To
                    force install on unlocked node, use the --force option.

    host-install-async:
                    Trigger patch install/remove on specified host. To
                    force install on unlocked node, use the --force option.
                    Note: This command returns immediately upon dispatching
                    installation request.

    install-local:  Trigger patch install/remove on the local host. This
                    command can only be used for patch installation prior
                    to initial configuration.

    drop-host:      Drop specified host from table.

    query-dependencies:
                    List dependencies for specified patch. Use --recursive
                    for recursive query.

    is-applied:     Query Applied state for list of patches. Returns True
                    if all are Applied, False otherwise.

    report-app-dependencies:
                    Report application patch dependencies, specifying
                    application name with --app option, plus a list of
                    patches. Reported dependencies can be dropped by
                    specifying app with no patch list.

    query-app-dependencies:
                    Display set of reported application patch dependencies.

    commit:         Commit patches to free disk space. WARNING: This
                    action is irreversible!

    --os-region-name:
                    Send the request to a specified region
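The following demonstrates installing a patch with this command. The demo patch is an In-Service patch that must be installed on all hosts, and the target is a standard 2+2+2 StarlingX environment.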
    controller-0:~$ sudo sw-patch upload 001.patch
    001 is now available

    # Check the patch state
    controller-0:~$ sudo sw-patch query
    Patch ID  RR  Release  Patch State
    ========  ==  =======  ===========
    001       N    19.09    Available

    # Check the patch state of all hosts
    controller-0:/$ sudo sw-patch query-hosts
      Hostname      IP Address    Patch Current  Reboot Required  Release  State
    ============  ==============  =============  ===============  =======  =====
    compute-0     192.178.204.7        Yes             No          19.09   idle
    compute-1     192.178.204.9        Yes             No          19.09   idle
    controller-0  192.178.204.3        Yes             No          19.09   idle
    controller-1  192.178.204.4        Yes             No          19.09   idle
    storage-0     192.178.204.12       Yes             No          19.09   idle
    storage-1     192.178.204.11       Yes             No          19.09   idle

    # "Patch Current" shows whether the host still has patches to install:
    # Yes means no patch is waiting to be installed on the host,
    # No means at least one patch still has to be installed.
    controller-0:/$ sudo sw-patch apply 001
    001 is now in the repo

    # Check the patch state
    controller-0:~$ sudo sw-patch query
    Patch ID  RR  Release   Patch State
    ========  ==  =======  =============
    001       N    19.09   Partial-Apply

    # Check the host states
    controller-0:~$ sudo sw-patch query-hosts
      Hostname      IP Address    Patch Current  Reboot Required  Release  State
    ============  ==============  =============  ===============  =======  =====
    compute-0     192.178.204.7        No              No          19.09   idle
    compute-1     192.178.204.9        No              No          19.09   idle
    controller-0  192.178.204.3        No              No          19.09   idle
    controller-1  192.178.204.4        No              No          19.09   idle
    storage-0     192.178.204.12       No              No          19.09   idle
    storage-1     192.178.204.11       No              No          19.09   idle
    controller-0:~$ sudo sw-patch host-install controller-0
    ...
    Installation was successful.

    # Check the patch state of the hosts
    controller-0:~$ sudo sw-patch query-hosts
      Hostname      IP Address    Patch Current  Reboot Required  Release  State
    ============  ==============  =============  ===============  =======  =====
    compute-0     192.178.204.7        No              No          19.09   idle
    compute-1     192.178.204.9        No              No          19.09   idle
    controller-0  192.178.204.3        Yes             No          19.09   idle
    controller-1  192.178.204.4        No              No          19.09   idle
    storage-0     192.178.204.12       No              No          19.09   idle
    storage-1     192.178.204.11       No              No          19.09   idle

    # To install the patch on every node, run the command once per node
    controller-0:~$ sudo sw-patch host-install controller-1
    ....
    Installation was successful.
    controller-0:~$ sudo sw-patch host-install compute-0
    ....
    Installation was successful.
    controller-0:~$ sudo sw-patch host-install compute-1
    ....
    Installation was successful.
    controller-0:~$ sudo sw-patch host-install storage-0
    ...
    Installation was successful.
    controller-0:~$ sudo sw-patch host-install storage-1
    ...
    Installation was successful.
    controller-0:~$ sudo sw-patch query
    Patch ID  RR  Release  Patch State
    ========  ==  =======  ===========
    001       N    19.09     Applied

    controller-0:~$ sudo sw-patch query-hosts
      Hostname      IP Address    Patch Current  Reboot Required  Release  State
    ============  ==============  =============  ===============  =======  =====
    compute-0     192.178.204.7        Yes             No          19.09   idle
    compute-1     192.178.204.9        Yes             No          19.09   idle
    controller-0  192.178.204.3        Yes             No          19.09   idle
    controller-1  192.178.204.4        Yes             No          19.09   idle
    storage-0     192.178.204.12       Yes             No          19.09   idle
    storage-1     192.178.204.11       Yes             No          19.09   idle

    # The patch installation is now complete
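Running host-install once per node is easy to script. The sketch below parses `sw-patch query-hosts` output, picks the hosts whose Patch Current column is still "No", and prints the host-install command for each one. It assumes the six-column layout shown in this walkthrough; `pending_hosts` and `SAMPLE` are hypothetical names, and the sketch prints commands instead of executing them.

```python
def pending_hosts(query_hosts_output):
    """Return hostnames whose 'Patch Current' column is 'No'."""
    pending = []
    for line in query_hosts_output.splitlines():
        fields = line.split()
        # Data rows have exactly six fields; the header row has more
        # because its column titles contain spaces.
        if len(fields) != 6:
            continue
        if set(fields[0]) == {"="}:
            continue  # the ==== separator row
        hostname, _ip, patch_current = fields[:3]
        if patch_current == "No":
            pending.append(hostname)
    return pending


# State after host-install has run on controller-0 only.
SAMPLE = """\
  Hostname      IP Address    Patch Current  Reboot Required  Release  State
============  ==============  =============  ===============  =======  =====
compute-0     192.178.204.7        No              No          19.09   idle
compute-1     192.178.204.9        No              No          19.09   idle
controller-0  192.178.204.3        Yes             No          19.09   idle
controller-1  192.178.204.4        No              No          19.09   idle
storage-0     192.178.204.12       No              No          19.09   idle
storage-1     192.178.204.11       No              No          19.09   idle
"""

for host in pending_hosts(SAMPLE):
    # Print the command; a real script could pass the list to subprocess.run.
    print(" ".join(["sudo", "sw-patch", "host-install", host]))
```

For a Reboot-required patch a plain loop like this would not be enough, since each host must be locked before and unlocked after the install.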
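Besides installing patches, StarlingX also supports rolling back and deleting them. This is done with two commands, sw-patch remove and sw-patch host-install, in a flow similar to patch installation.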
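The example above showed a patch being installed across the cluster host by host. In a large cluster, however, the whole process takes a long time. For reboot-required patches in particular, this approach is inefficient and creates a lot of work for the administrator. StarlingX therefore provides a more advanced feature, "patch orchestration", which updates the whole cluster with a few simple operations, greatly reducing the administrator's workload and the chance of mistakes. It can be used in three ways: the client CLI, the Horizon web interface, and the VIM RESTful API.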
    controller-0:~$ sw-manager patch-strategy -h
    usage: sw-manager patch-strategy [-h] ...

    optional arguments:
      -h, --help  show this help message and exit

    Software Patch Commands:
        create    Create a strategy
        delete    Delete a strategy
        apply     Apply a strategy
        abort     Abort a strategy
        show      Show a strategy

    controller-0:~$ sw-manager patch-strategy create -h
    usage: sw-manager patch-strategy create [-h]
               [--controller-apply-type {serial,ignore}]
               [--storage-apply-type {serial,parallel,ignore}]
               [--worker-apply-type {serial,parallel,ignore}]
               [--max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10,
                11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,
                28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,
                45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,
                62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
                79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,
                96,97,98,99,100}]
               [--instance-action {migrate,stop-start}]
               [--alarm-restrictions {strict,relaxed}]

    optional arguments:
      -h, --help            show this help message and exit
      --controller-apply-type {serial,ignore}
                            defaults to serial
      --storage-apply-type {serial,parallel,ignore}
                            defaults to serial
      --worker-apply-type {serial,parallel,ignore}
                            defaults to serial
      --max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,
                17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,
                37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,
                57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,
                77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
                97,98,99,100}
                            maximum worker hosts to patch in parallel
      --instance-action {migrate,stop-start}
                            defaults to stop-start
      --alarm-restrictions {strict,relaxed}
                            defaults to strict
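The create options have a small, fixed set of allowed values, so a wrapper script can check them before calling the CLI. The sketch below builds the argument list for `sw-manager patch-strategy create` and rejects values the help text does not list. The function name `build_create_command` and its keyword-argument style are hypothetical; the flags and choices are copied from the help output above.

```python
# Allowed values per flag, copied from the help output.
CHOICES = {
    "controller-apply-type": {"serial", "ignore"},
    "storage-apply-type": {"serial", "parallel", "ignore"},
    "worker-apply-type": {"serial", "parallel", "ignore"},
    "instance-action": {"migrate", "stop-start"},
    "alarm-restrictions": {"strict", "relaxed"},
}


def build_create_command(max_parallel_worker_hosts=None, **options):
    """Return argv for 'sw-manager patch-strategy create'.

    Keyword names use underscores, e.g. worker_apply_type="parallel".
    """
    cmd = ["sw-manager", "patch-strategy", "create"]
    for key in sorted(options):
        flag = key.replace("_", "-")
        value = options[key]
        if flag not in CHOICES:
            raise ValueError("unknown option: --%s" % flag)
        if value not in CHOICES[flag]:
            raise ValueError("invalid value %r for --%s" % (value, flag))
        cmd += ["--" + flag, value]
    if max_parallel_worker_hosts is not None:
        # The help text lists the integers 2 through 100.
        if not 2 <= max_parallel_worker_hosts <= 100:
            raise ValueError("max-parallel-worker-hosts must be 2..100")
        cmd += ["--max-parallel-worker-hosts", str(max_parallel_worker_hosts)]
    return cmd


print(" ".join(build_create_command(
    worker_apply_type="parallel",
    max_parallel_worker_hosts=4,
    instance_action="migrate",
)))
```

Options that are left out fall back to the defaults shown in the help text (serial, stop-start, strict).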
<http://<oam_ip>:4545>
+--------+---------------------------------------+----------------------------+
| Method | URI                                   | Description                |
+========+=======================================+============================+
| Post   | /api/orchestration/sw-update/strategy | Create a patch strategy    |
+--------+---------------------------------------+----------------------------+
| Delete | /api/orchestration/sw-update/strategy | Delete current patch       |
|        |                                       | strategy                   |
+--------+---------------------------------------+----------------------------+
| Get    | /api/orchestration/sw-update/strategy | Get detailed information of|
|        |                                       | current patch strategy     |
+--------+---------------------------------------+----------------------------+
| Post   | /api/orchestration/sw-update/strategy/| Apply or abort a patch     |
|        | actions                               | strategy                   |
+--------+---------------------------------------+----------------------------+
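The table maps directly onto HTTP requests. The sketch below builds (but does not send) a `urllib.request.Request` for each operation, using the port 4545 base URL and the URIs from the table. The operation names `create`, `delete`, `show`, and `action`, the function name `build_request`, and the example IP address are hypothetical. Request bodies and authentication are not described in this document, so they are deliberately left out.

```python
from urllib.request import Request

STRATEGY_PATH = "/api/orchestration/sw-update/strategy"

# operation name (hypothetical label) -> (HTTP method, path from the table)
ENDPOINTS = {
    "create": ("POST", STRATEGY_PATH),
    "delete": ("DELETE", STRATEGY_PATH),
    "show": ("GET", STRATEGY_PATH),
    "action": ("POST", STRATEGY_PATH + "/actions"),
}


def build_request(oam_ip, operation, port=4545):
    """Return an unsent Request object for the given operation."""
    method, path = ENDPOINTS[operation]
    url = "http://%s:%d%s" % (oam_ip, port, path)
    return Request(url, method=method)


# 10.10.10.2 is a made-up OAM address used only for illustration.
for operation in ("create", "show", "action", "delete"):
    request = build_request("10.10.10.2", operation)
    print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to log or review the calls before they touch a live cluster.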
Patch orchestration requires the cluster to be in a healthy state when patches are installed.