To ensure that your aks-engine upgrade operation runs smoothly, there are a few things you should be aware of before getting started.
You will need access to the apimodel.json that was generated by aks-engine deploy or aks-engine generate (by default this file is placed into a relative directory that looks like _output/<dnsPrefix>/). aks-engine will use the --api-model argument to introspect the apimodel.json file in order to determine the cluster's current Kubernetes version, as well as all other cluster configuration data as defined the last time that aks-engine was used to deploy, scale, or upgrade the cluster.
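For example, you can confirm the Kubernetes version recorded in an apimodel.json before upgrading. The stub file below is illustrative only; in practice you would query the real file under your _output directory:

```shell
# Create a stub apimodel for illustration; a real one is produced by
# `aks-engine deploy` or `aks-engine generate`.
cat > /tmp/apimodel.json <<'EOF'
{"properties":{"orchestratorProfile":{"orchestratorType":"Kubernetes","orchestratorVersion":"1.12.8"}}}
EOF

# The cluster's current Kubernetes version is recorded at
# properties.orchestratorProfile.orchestratorVersion:
jq -r '.properties.orchestratorProfile.orchestratorVersion' /tmp/apimodel.json
```

This is the version you would then pass to get-versions to see available upgrade paths.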
aks-engine upgrade expects a cluster configuration that conforms to the current state of the cluster. In other words, the Azure resources inside the resource group deployed by aks-engine should be in the same state as when they were originally created by aks-engine. If you perform manual operations on your Azure IaaS resources (other than aks-engine scale and aks-engine upgrade), DO NOT use aks-engine upgrade, as the aks-engine-generated ARM template won't be reconcilable against the state of the Azure resources that reside in the resource group. Manual operations such as renaming, deleting, or resizing resources in the resource group will prevent upgrade from working successfully.
To get the list of all available Kubernetes versions and upgrades, run the get-versions command:
./bin/aks-engine get-versions
To get the versions of Kubernetes that your particular cluster version is upgradable to, provide its current Kubernetes version via the --version argument:
./bin/aks-engine get-versions --version 1.12.8
aks-engine upgrade relies upon a working connection to the cluster control plane during upgrade, both (1) to validate successful upgrade progress, and (2) to cordon and drain nodes before upgrading them, in order to minimize operational downtime of any running cluster workloads. If you are upgrading a private cluster, you must run aks-engine upgrade from a host VM that has network access to the control plane, for example a jumpbox VM that resides in the same VNET as the master VMs. For more information on private clusters refer to this documentation.
If using aks-engine upgrade in production, it is recommended to stage an upgrade test on a cluster that was built to the same specifications (built with the same cluster configuration + the same version of the aks-engine binary) as your production cluster before performing the upgrade, especially if the cluster configuration is "interesting", or in other words differs significantly from defaults. The reason for this is that AKS Engine supports many different cluster configurations and the extent of E2E testing that the AKS Engine team runs cannot practically cover every possible configuration. Therefore, it is recommended that you ensure in a staging environment that your specific cluster configuration is upgradable using aks-engine upgrade before attempting this potentially destructive operation on your production cluster.
aks-engine upgrade is backwards compatible. If you deployed with aks-engine version 0.27.x, you can run upgrade with version 0.29.y. In fact, it is recommended that you use the latest available aks-engine version when running an upgrade operation. This will ensure that you get the latest available software and bug fixes in your upgraded cluster.
aks-engine upgrade will automatically re-generate your cluster configuration to best pair with the desired new version of Kubernetes, and/or the version of AKS Engine that is used to execute aks-engine upgrade. To use an example of both:
When you upgrade to (for example) Kubernetes 1.14 from 1.13, AKS Engine will automatically change your control plane configuration (e.g., coredns, metrics-server, kube-proxy) so that the cluster component configurations have a close, known-working affinity with 1.14.
When you perform an upgrade with a newer version of AKS Engine, even a Kubernetes patch release upgrade such as 1.14.1 to 1.14.2, a newer version of etcd (for example) may have been validated and configured as the default since the version of AKS Engine originally used to build the cluster was released. In that case, without any explicit user direction, the newly upgraded cluster will now be running etcd v3.2.26 instead of v3.2.25. This is by design.
In summary, using aks-engine upgrade means you will freshen and re-pave the entire stack that underlies Kubernetes to reflect the best-known, recent implementation of Azure IaaS + OS + OS config + Kubernetes config.
During the upgrade, aks-engine successively visits virtual machines that constitute the cluster (first the master nodes, then the agent nodes) and performs the following operations:
Master nodes:
cordon the node and drain existing workloads
delete the VM
create new VM and install desired Kubernetes version
add the new VM to the cluster (custom annotations, labels and taints etc are retained automatically)
Agent nodes:
create new VM and install desired Kubernetes version
add the new VM to the cluster
evict any pods that might be scheduled onto this node by Kubernetes before copying custom node properties
copy the custom annotations, labels and taints of the old node to the new node
cordon the node and drain existing workloads
delete the VM
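The cordon and drain steps above map onto standard kubectl operations. The sketch below uses a placeholder node name (k8s-master-12345678-0); aks-engine performs the equivalent of these steps via the Kubernetes API rather than literally shelling out to kubectl:

```shell
# Mark the node unschedulable so no new pods land on it:
kubectl cordon k8s-master-12345678-0

# Evict running workloads; DaemonSet-managed pods are skipped because
# they would immediately be rescheduled onto the same node:
kubectl drain k8s-master-12345678-0 --ignore-daemonsets

# ...the backing VM is then deleted and recreated at the target Kubernetes
# version, and the replacement node rejoins the cluster with its custom
# annotations, labels, and taints restored.
```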
wget https://github.com/Azure/aks-engine/releases/download/v0.55.4/aks-engine-v0.55.4-linux-amd64.tar.gz
tar -xvf aks-engine-v0.55.4-linux-amd64.tar.gz
cd aks-engine-v0.55.4-linux-amd64
azcopy copy "https://{{storageaccount}}.blob.core.chinacloudapi.cn/aks-engine/{{location}}/akse?{{SAS Token}}" "/home/vmadmin/akse" --recursive=true
cd /home/vmadmin/akse
kubectl edit cm kube-flannel-cfg -n kube-system
# (add "cniVersion": "0.2.0" to the cni-conf.json section of the ConfigMap)
aks-engine upgrade --azure-env AzureChinaCloud \
--api-model _output/{{Resource Group}}/apimodel.json \
--location chinanorth2 \
--resource-group {{Resource Group}} \
--subscription-id {{subscription-id}} \
--upgrade-version 1.16.14 \
--client-id {{client-id}} \
--client-secret {{client-secret}}
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.16.14/bin/linux/amd64/kubectl
chmod +x kubectl
sudo cp kubectl /usr/bin/kubectl
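Once the upgrade finishes and the matching kubectl client is installed, a quick sanity check can confirm the result (these commands require connectivity to the upgraded cluster; node names will differ in your environment):

```shell
# Every node should report the target version (v1.16.14) in the VERSION column:
kubectl get nodes -o wide

# Control plane and system components should all be Running:
kubectl get pods -n kube-system
```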