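Author: Zhang Hua (张华)  Published: 2023-03-01
Copyright notice: this article may be freely reproduced, provided the reproduction includes a hyperlink to the original source, the author information, and this copyright notice.
There is no external network access, so I set up a local custom image mirror and pointed container-image-metadata-url at it, but when Juju creates an LXD container it still reports that no image can be found.
The code that handles container-image-metadata-url is here: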
https://github.com/juju/juju/pull/8578
https://github.com/juju/juju/blob/juju-2.9.35/container/lxd/manager.go#L282-L284
There is also a useful forum thread:
https://discourse.charmhub.io/t/local-lxd-image-server/3929/5
1. Use Juju to create a focal machine 0, then deploy a xenial LXD container on machine 0.
juju add-model test
juju add-machine --series focal
juju model-config logging-config="<root>=DEBUG"
juju remove-application ceph-radosgw && juju deploy ceph-radosgw --series=xenial --to="lxd:0"
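2. On the Juju controller (juju ssh -m controller 0) and on machine 0, run the iptables rules below to simulate losing connectivity to cloud-images.ubuntu.com. While doing this I noticed the following log lines: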
2023-03-01 07:58:21 INFO juju.cloudconfig userdatacfg_unix.go:613 Fetching agent: curl -sSf --connect-timeout 20 --noproxy "*" --insecure -o $bin/tools.tar.gz <[https://10.5.0.31:17070/model/deb85179-10a6-4877-88f7-012ef768d726/tools/2.9.38-ubuntu-amd64 https://252.0.31.1:17070/model/deb85179-10a6-4877-88f7-012ef768d726/tools/2.9.38-ubuntu-amd64]>
2023-03-01 07:59:03 INFO juju.container.lxd container.go:256 starting new container "juju-68d726-0-lxd-2" (image "ubuntu-16.04-server-cloudimg-amd64-lxd.tar.xz")
2023-03-01 07:59:03 DEBUG juju.container.lxd container.go:257 new container has profiles [default]
2023-03-01 07:59:42 DEBUG juju.container.lxd container.go:286 created container "juju-68d726-0-lxd-2", waiting for start...
dig cloud-images.ubuntu.com #185.125.190.37 and 185.125.190.40
juju ssh -m controller 0 -- sudo iptables -A OUTPUT -d 185.125.190.37 -j DROP
juju ssh -m controller 0 -- sudo iptables -A OUTPUT -d 185.125.190.40 -j DROP
cat << EOF |tee test.yaml
cloudinit-userdata: |
  postruncmd:
    - bash -c 'echo 10.5.0.126 quqi.com >> /etc/hosts'
    - bash -c 'iptables -A OUTPUT -d 185.125.190.37 -j DROP'
    - bash -c 'iptables -A OUTPUT -d 185.125.190.40 -j DROP'
EOF
juju model-config ./test.yaml
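The two addresses above are hardcoded from one dig run and may change. A minimal sketch of generating the DROP rules from whatever dig returns, assuming a hypothetical helper name drop_rules (it only prints commands, it does not run them):

```shell
#!/bin/sh
# Hypothetical helper: turn IPv4 addresses (one per line on stdin) into the
# iptables commands used above. It only prints the commands.
drop_rules() {
    while read -r ip; do
        case $ip in
            # skip blank lines and anything that is not digits and dots
            # (for example CNAME lines printed by dig +short)
            ''|*[!0-9.]*) continue ;;
        esac
        printf 'iptables -A OUTPUT -d %s -j DROP\n' "$ip"
    done
}

# Demonstration with the two addresses seen in this post.
printf '185.125.190.37\n185.125.190.40\n' | drop_rules
```

With network access, the same helper could be fed from dig and piped to a root shell: dig +short cloud-images.ubuntu.com | drop_rules | sudo sh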
3. On the bastion, run sstream-mirror to mirror the xenial amd64 images from cloud-images.ubuntu.com.
sudo apt -y install simplestreams
workdir=/home/ubuntu/simplestreams2
sudo sstream-mirror --keyring=/usr/share/keyrings/ubuntu-cloudimage-keyring.gpg --progress --max=1 --path=streams/v1/index.json https://cloud-images.ubuntu.com/releases/ $workdir 'arch=amd64' 'release~(xenial)' 'ftype~(lxd.tar.xz|squashfs|root.tar.xz|root.tar.gz|disk1.img|.json|.sjson)'
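Then serve it over HTTPS with nginx (the certificate files below are generated in ~/ca, the path the later commands refer to):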
#https://goharbor.io/docs/2.6.0/install-config/configure-https/
openssl genrsa -out ca.key 4096
openssl req -x509 -new -nodes -sha512 -days 3650 -subj "/C=CN/ST=Beijing/L=Beijing/O=example/OU=Personal/CN=quqi.com" -key ca.key -out ca.crt
openssl genrsa -out quqi.com.key 4096
openssl req -sha512 -new -subj "/C=CN/ST=Beijing/L=Beijing/O=example/OU=Personal/CN=quqi.com" -key quqi.com.key -out quqi.com.csr
#complies with the Subject Alternative Name (SAN) and x509 v3 extension requirements to avoid 'x509: certificate relies on legacy Common Name field, use SANs instead'
cat > v3.ext <<-EOF
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1=quqi.com
DNS.2=quqi
DNS.3=hostname
EOF
openssl x509 -req -sha512 -days 3650 -extfile v3.ext -CA ca.crt -CAkey ca.key -CAcreateserial -in quqi.com.csr -out quqi.com.crt
#for docker, the Docker daemon interprets .crt files as CA certificates and .cert files as client certificates.
openssl x509 -inform PEM -in quqi.com.crt -out quqi.com.cert
curl --resolve quqi.com:443:10.5.0.126 --cacert ~/ca/ca.crt https://quqi.com:443/streams/v1/index.json
sudo cp ~/ca/ca.crt /usr/local/share/ca-certificates/ca.crt
sudo chmod 644 /usr/local/share/ca-certificates/ca.crt
sudo update-ca-certificates --fresh
curl --resolve quqi.com:443:10.5.0.126 https://quqi.com:443/streams/v1/index.json
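Since Go-based clients such as Juju reject certificates that only carry a Common Name, it is worth confirming that the signed certificate really contains the SAN entries from v3.ext. A minimal sketch, assuming a hypothetical helper san_has_dns that reads the text dump of a certificate on stdin:

```shell
#!/bin/sh
# Hypothetical helper: read `openssl x509 -noout -text` output on stdin and
# succeed only if the given host name is listed as a DNS subjectAltName.
san_has_dns() {
    grep -A1 'X509v3 Subject Alternative Name' |   # header line plus the SAN list
        tr ',' '\n' |                              # one SAN entry per line
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -Fxq "DNS:$1"                         # exact, literal match
}

# Usage against the certificate generated above:
#   openssl x509 -in quqi.com.crt -noout -text | san_has_dns quqi.com && echo "SAN ok"
```

If this check fails for the certificate nginx is actually serving, the "certificate relies on legacy Common Name field" error is expected regardless of what the CA bundle contains.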
$ cat /etc/nginx/sites-available/default
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name quqi.com;
    ssl_certificate /home/ubuntu/ca/quqi.com.crt;
    ssl_certificate_key /home/ubuntu/ca/quqi.com.key;
    #ssl_protocols TLSv1.2;
    ssl_prefer_server_ciphers on;
    location / {
        root /home/ubuntu/simplestreams2;
        index index.html;
    }
}
# Note: because a new directory, /home/ubuntu/simplestreams2, is used as root above, add 'user root;' to /etc/nginx/nginx.conf to avoid permission problems
#curl --resolve quqi.com:443:10.5.0.126 --cacert ~/ca/ca.crt https://quqi.com:443/images/streams/v1/index.json
curl --resolve quqi.com:443:10.5.0.126 --cacert ~/ca/ca.crt https://quqi.com:443/streams/v1/index.json
4. Point container-image-metadata-url in Juju at the HTTPS-based local image mirror above
juju model-config container-image-metadata-url=https://quqi.com:443
juju model-config image-metadata-url=https://quqi.com:443
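Judging only from the log lines in this post, Juju appears to append streams/v1/index.json (and the signed index.sjson) to whatever base URL is configured, so a base URL that already ends in streams/v1 ends up with the path doubled. A small sketch of that assumption, using a hypothetical helper index_url to compute the URL to test with curl before handing the base URL to Juju:

```shell
#!/bin/sh
# Hypothetical helper: build the index URL that Juju is assumed to request
# for a given metadata base URL (inferred from the logs in this post, not
# taken from the Juju source).
index_url() {
    base=${1%/}                                  # drop one trailing slash, if any
    printf '%s/streams/v1/index.json\n' "$base"
}

index_url https://quqi.com:443
# prints https://quqi.com:443/streams/v1/index.json
index_url https://quqi.com:443/images/streams/v1
# prints https://quqi.com:443/images/streams/v1/streams/v1/index.json
```

A quick reachability check could then be: curl -sSf --cacert ~/ca/ca.crt "$(index_url https://quqi.com:443)" >/dev/null && echo reachable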
5. Because the Juju controller accesses the local image mirror, add the /etc/hosts entry and the CA certificate on it
echo '10.5.0.126 quqi.com' | sudo tee -a /etc/hosts
curl --resolve quqi.com:443:10.5.0.126 --cacert ~/ca/ca.crt https://quqi.com:443/streams/v1/index.json
sudo cp ~/ca/ca.crt /usr/local/share/ca-certificates/ca.crt
sudo chmod 644 /usr/local/share/ca-certificates/ca.crt
sudo update-ca-certificates --fresh
curl --resolve quqi.com:443:10.5.0.126 https://quqi.com:443/streams/v1/index.json
6. Remember to delete the image cache on machine 0 before retesting
juju ssh 0 -- sudo lxc image delete juju/xenial/amd64
juju remove-application ceph-radosgw
7. Retest
juju deploy ceph-radosgw --series=xenial --to="lxd:0"
sudo tail -f /var/log/juju/machine-0.log
The following log lines can be observed in /var/log/juju/machine-0.log on machine 0:
2023-03-01 08:26:45 WARNING juju.worker.lxdprovisioner provisioner_task.go:1371 machine 0/lxd/3 failed to start: acquiring LXD image: no matching image found
2023-03-01 08:26:45 WARNING juju.worker.lxdprovisioner provisioner_task.go:1410 failed to start machine 0/lxd/3 (acquiring LXD image: no matching image found), retrying in 10s (10 more attempts)
On the Juju controller, quqi sometimes shows up in the logs and sometimes does not, which is odd.
2023-02-23 07:33:52 WARNING juju.apiserver.provisioner provisioninginfo.go:801 encountered "https://quqi.com:443/images/streams/v1/streams/v1/index.json": Get "https://quqi.com:443/images/streams/v1/streams/v1/index.json": dial tcp 49.234.171.74:443: i/o timeout while getting published images metadata from image-metadata-url
2023-03-01 08:52:56 WARNING juju.environs.simplestreams datasource.go:184 Got error requesting "https://quqi.com:443/streams/v1/index.json": Get "https://quqi.com:443/streams/v1/index.json": x509: certificate relies on legacy Common Name field, use SANs instead
cloud-images.ubuntu.com still shows up on the Juju controller:
2023-03-01 08:34:54 WARNING juju.apiserver.provisioner provisioninginfo.go:801 encountered "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": Get "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": dial tcp 185.125.190.37:80: i/o timeout while getting published images metadata from default ubuntu cloud images
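To see at a glance which metadata sources Juju actually tried, the failing URLs can be pulled out of the log. A rough sketch, assuming the two message shapes seen in this post ("Got error requesting" and "encountered") and a hypothetical helper name failed_urls:

```shell
#!/bin/sh
# Hypothetical helper: read Juju log lines on stdin and print the distinct
# URLs that failed, based on the two message formats quoted in this post.
failed_urls() {
    sed -n \
        -e 's/.*Got error requesting "\([^"]*\)".*/\1/p' \
        -e 's/.* encountered "\([^"]*\)".*/\1/p' |
        sort -u
}

# Usage on machine 0 or on the controller, for example:
#   sudo cat /var/log/juju/machine-0.log | failed_urls
```

If the local mirror never appears in this list and never appears in a success message either, that suggests the configured URL is not being consulted at all on that host.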
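The above used sstream-mirror to provide the simplestreams data. Now switch to simplestreams metadata generated from a Glance image and keep testing (I am not sure whether this approach only applies to creating the Juju controller or can also be used for VM/LXD creation, so let's try it)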
mkdir -p ~/simplestreams/images
IMAGE_ID=26751c0e-4282-415e-b8dc-a7a21d2f781d
SERIES=xenial
juju metadata generate-image -d ~/simplestreams -i $IMAGE_ID -s $SERIES -r RegionOne -u $OS_AUTH_URL
Then edit /etc/nginx/sites-available/default to change the /home/ubuntu/simplestreams2 used in the test above to /home/ubuntu/simplestreams, restart nginx, and set container-image-metadata-url (note: the URL now has an extra /images at the end)
juju model-config container-image-metadata-url=https://quqi.com:443/images
juju model-config image-metadata-url=https://quqi.com:443/images
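# Note: the config below was not produced by the two commands above; it was created by manually running an lxc command (lxc remote add xxx), but even with it things do not work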
root@juju-4e4d8f-test-0:~# cat ~/snap/lxd/common/config/config.yml
default-remote: local
remotes:
  images:
    addr: https://images.linuxcontainers.org
    protocol: simplestreams
    public: true
  local:
    addr: unix://
    public: false
  test:
    addr: https://quqi.com:443
    protocol: simplestreams
    public: true
aliases: {}
To keep the test environment clean, I also ran the following on the controller and on machine 0:
systemctl restart jujud-machine-0.service
After repeating the test the problem persisted. The controller showed the following logs:
2023-03-01 10:29:32 WARNING juju.environs.simplestreams datasource.go:184 Got error requesting "https://streams.canonical.com/juju/tools/streams/v1/index.sjson": Get "https://streams.canonical.com/juju/tools/streams/v1/index.sjson": dial tcp 185.125.190.37:443: i/o timeout
2023-03-01 10:29:36 INFO juju.state addmachine.go:505 new machine "0/lxd/11" has preferred addresses: private "", public ""
2023-03-01 10:29:37 WARNING juju.apiserver.instancemutater lxdprofilewatcher.go:206 unit ceph-radosgw/11 has no machine id, start watching when machine id assigned.
2023-03-01 10:29:41 WARNING juju.apiserver.provisioner provisioninginfo.go:801 encountered index file has no data for cloud {stsstack http://10.230.19.53:5000/v3} not found while getting published images metadata from image-metadata-url
2023-03-01 10:30:11 WARNING juju.environs.simplestreams datasource.go:184 Got error requesting "http://cloud-images.ubuntu.com/releases/streams/v1/index2.sjson": Get "http://cloud-images.ubuntu.com/releases/streams/v1/index2.sjson": dial tcp 185.125.190.40:80: i/o timeout
2023-03-01 10:30:41 WARNING juju.environs.simplestreams datasource.go:184 Got error requesting "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": Get "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": dial tcp 185.125.190.37:80: i/o timeout
2023-03-01 10:30:41 WARNING juju.apiserver.provisioner provisioninginfo.go:801 encountered "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": Get "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson": dial tcp 185.125.190.37:80: i/o timeout while getting published images metadata from default ubuntu cloud images
So it looks unrelated to the type of simplestreams data.
Since it is unrelated to the type of simplestreams data, point nginx back at the earlier /home/ubuntu/simplestreams2
juju model-config container-image-metadata-url=https://quqi.com:443/
juju model-config image-metadata-url=https://quqi.com:443/
Then test cloudinit-userdata. This works and can serve as a workaround
cat << EOF |tee cloudinit-userdata.yaml
cloudinit-userdata: |
  postruncmd:
    - echo '10.5.0.126 quqi.com' >> /etc/hosts
    - if hostname |grep -qv lxd; then wget --tries=15 --retry-connrefused --timeout=15 --random-wait=on -O /home/ubuntu/ubuntu-16.04-server-cloudimg-amd64-lxd.tar.xz https://quqi.com:443/server/releases/xenial/release-20211001/ubuntu-16.04-server-cloudimg-amd64-lxd.tar.xz --no-check-certificate; wget --tries=15 --retry-connrefused --timeout=15 --random-wait=on -O /home/ubuntu/ubuntu-16.04-server-cloudimg-amd64.squashfs https://quqi.com:443/server/releases/xenial/release-20211001/ubuntu-16.04-server-cloudimg-amd64.squashfs --no-check-certificate; fi
    - sleep 30
    - if hostname |grep -qv lxd; then lxc image import /home/ubuntu/ubuntu-16.04-server-cloudimg-amd64-lxd.tar.xz /home/ubuntu/ubuntu-16.04-server-cloudimg-amd64.squashfs --alias juju/xenial/amd64; fi
EOF
juju model-config ./cloudinit-userdata.yaml
juju model-config cloudinit-userdata --format yaml
#juju model-config --reset cloudinit-userdata
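Note: the reason it kept failing earlier was that a | had been added after postruncmd:. How I found the answer is described in the cloud-init debugging steps below.
In the end I found that the following does not work: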
cat << EOF |tee test.yaml
cloudinit-userdata: |
postruncmd: |
- echo '10.5.0.126 quqi.com' >> /etc/hosts
- echo 'test' > /home/ubuntu/cloud-init.txt
EOF
It needs to be changed to the following:
cat << EOF |tee test.yaml
cloudinit-userdata: |
  postruncmd:
    - bash -c 'echo 10.5.0.126 quqi.com >> /etc/hosts'
    - bash -c 'echo test > /home/ubuntu/cloud-init.txt'
EOF
The following does not work either:
cat << EOF |tee test.yaml
cloudinit-userdata: |
postruncmd: |
bash -c 'echo 10.5.0.126 quqi.com >> /etc/hosts'
bash -c 'echo test > /home/ubuntu/cloud-init.txt'
EOF
The following fails even more directly, reporting: ERROR json: unsupported type: map[interface {}]interface {}
cat << EOF |tee test.yaml
cloudinit-userdata: |
postruncmd:
bash -c 'echo 10.5.0.126 quqi.com >> /etc/hosts'
bash -c 'echo test > /home/ubuntu/cloud-init.txt'
EOF
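The variants above differ only in small YAML details that are easy to miss. A minimal sketch of a pre-flight check, assuming files shaped like the ones in this post and a hypothetical helper name check_postruncmd; it accepts the file only when postruncmd: has nothing after the colon and the first non-blank line after it is a "- " list item:

```shell
#!/bin/sh
# Hypothetical lint for the cloudinit-userdata files used in this post.
# Rejects "postruncmd: |" (or anything else after the colon) and rejects
# bodies whose first entry is not a YAML list item.
check_postruncmd() {
    if awk '
        found && NF { ok = ($1 == "-"); exit }              # first non-blank line after the key
        /^[[:space:]]*postruncmd:[[:space:]]*$/ { found = 1 }
        END { exit ok ? 0 : 1 }
    ' "$1"; then
        return 0
    fi
    echo "$1: postruncmd must be followed by a plain YAML list" >&2
    return 1
}

# Usage:
#   check_postruncmd test.yaml && juju model-config ./test.yaml
```

This is only a text-level check, not a YAML parser, so it catches the three failing shapes shown above but nothing more subtle.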
Other ways to debug cloud-init:
juju add-model test
juju model-config ./test.yaml
juju model-config cloudinit-userdata --format yaml
juju model-config ssl-hostname-verification=false
juju add-machine --series focal
1, check cloud-init log: cloud-init collect-logs && tar -xf cloud-init.tar.gz
2, check cloud-init config: /etc/cloud/cloud.cfg
3, cloud-init is enabled: systemctl list-unit-files | grep cloud
4, /var/lib/cloud/instances/af2d721e-e38e-4937-81ad-7cc72a49c184/cloud-config.txt
Trying to rule out https://bugs.launchpad.net/juju/+bug/1797168
juju add-model test2
juju model-config container-image-metadata-url=https://quqi.com:443/
juju model-config image-metadata-url=https://quqi.com:443/
juju model-config logging-config="<root>=DEBUG"
juju model-config ssl-hostname-verification=false
juju add-machine --series xenial
# Be sure to copy ca.crt to machine 0 (not to controller 0)
juju scp -m m ~/ca/ca.crt 0:~/
juju ssh -m m 0 -- sudo cp /home/ubuntu/ca.crt /usr/local/share/ca-certificates/ca.crt
juju ssh -m m 0 -- sudo update-ca-certificates --fresh
juju add-machine --series xenial lxd:0
#juju remove-application ceph-radosgw && juju deploy ceph-radosgw --series=xenial --to="lxd:0"
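NOTE: one reason it kept failing was that ca.crt had been copied to controller 0, whereas it should be copied to machine 0.
On the LXD side, cloud-images.ubuntu.com is the default remote, and this default cannot be replaced: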
# lxc remote list |grep releases
| ubuntu | https://cloud-images.ubuntu.com/releases | simplestreams | none | YES | YES | NO |
root@juju-4e4d8f-test-7:~# lxc remote set-url ubuntu https://quqi.com:443
Error: Remote ubuntu is static and cannot be modified
Remotes can only be added, so I added my own and then re-added it as a public remote:
lxc remote add test https://quqi.com:443 --protocol=simplestreams
lxc remote remove test && lxc remote add test https://quqi.com:443 --protocol=simplestreams --public
sudo snap set lxd daemon.debug=true
sudo systemctl reload snap.lxd.daemon
LXD_INSECURE_TLS=true also had to be set on the machine (otherwise: remote error: tls: protocol version not supported). In short, I made sure that launching from the test mirror (lxc launch test:16.04 i1) works.
vim /etc/systemd/system/snap.lxd.daemon.service
Environment=LXD_INSECURE_TLS=true
# Alternatively, removing ssl_protocols TLSv1.2 from the nginx config also works
But the Juju test still fails. The machine side shows these logs:
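Editing the unit file generated for the snap directly may be lost when the unit is regenerated, so a systemd drop-in is a safer place for the variable. A minimal sketch, assuming the standard drop-in directory for this unit and a hypothetical helper name write_lxd_dropin (the directory argument exists only so the helper can be tried out in a scratch location):

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that sets LXD_INSECURE_TLS
# for the snap.lxd.daemon unit. Pass a directory to write somewhere else.
write_lxd_dropin() {
    dir=${1:-/etc/systemd/system/snap.lxd.daemon.service.d}
    mkdir -p "$dir" &&
        printf '[Service]\nEnvironment=LXD_INSECURE_TLS=true\n' > "$dir/override.conf"
}

# Usage as root on the machine hosting the containers:
#   write_lxd_dropin
#   systemctl daemon-reload
#   systemctl restart snap.lxd.daemon
```

A restart rather than a reload is used in the usage note on the assumption that environment changes only take effect when the daemon process is started again.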
2023-03-02 03:26:30 DEBUG juju.container.lxd manager.go:283 checking default image metadata sources
2023-03-02 03:27:51 WARNING juju.worker.lxdprovisioner provisioner_task.go:1371 machine 7/lxd/3 failed to start: acquiring LXD image: no matching image found
2023-03-02 03:27:51 WARNING juju.worker.lxdprovisioner provisioner_task.go:1410 failed to start machine 7/lxd/3 (acquiring LXD image: no matching image found), retrying in 10s (10 more attempts)
## Filing a bug
In the end I filed a Launchpad bug - https://bugs.launchpad.net/juju/+bug/2008993