
nvidia-docker Installation Guide

叶冥夜
2023-12-01

This article is reposted from:
  https://docs.docker.com/engine/install/ubuntu/
  https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian

1. Install Docker (see https://docs.docker.com/engine/install/ubuntu/)

1.1 Install packages that allow apt to use repositories over HTTPS

sudo apt-get update && sudo apt-get install -y \
  apt-transport-https ca-certificates curl software-properties-common gnupg2

1.2 Add Docker's official GPG key (apt-key still works here, but it is deprecated on newer Ubuntu releases)

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key --keyring /etc/apt/trusted.gpg.d/docker.gpg add -

1.3 Add the Docker apt repository

sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
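The repository line above hard-codes arch=amd64. As a sketch, the hypothetical function docker_repo_line below builds the same line from an architecture and an Ubuntu codename, and can optionally add a signed-by=<keyring> option, which is how Docker's current Ubuntu instructions tie the key to this one repository instead of using apt-key. Use its output instead of, not in addition to, the add-apt-repository command, otherwise apt sees the repository twice.

```shell
# Hypothetical helper: print the Docker apt source line for an architecture ($1)
# and an Ubuntu codename ($2). If a keyring path is given as $3, add signed-by=<path>.
docker_repo_line() {
  local arch=$1 codename=$2 keyring=${3:-}
  local opts="arch=$arch"
  if [ -n "$keyring" ]; then
    opts="$opts signed-by=$keyring"
  fi
  echo "deb [$opts] https://download.docker.com/linux/ubuntu $codename stable"
}

# Example (not run here): store the key in its own keyring and write the source
# line using the host's real architecture.
#   sudo install -m 0755 -d /etc/apt/keyrings
#   sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
#   docker_repo_line "$(dpkg --print-architecture)" "$(lsb_release -cs)" /etc/apt/keyrings/docker.asc \
#     | sudo tee /etc/apt/sources.list.d/docker.list
```

Deriving the architecture from dpkg keeps the same steps usable on arm64 hosts.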

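1.4 Install Docker CE

The versions below are pinned. Packages for these exact versions may not be published for newer Ubuntu releases; in that case, adjust the version strings.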

sudo apt-get update && sudo apt-get install -y \
containerd.io=1.2.13-2 \
docker-ce=5:19.03.11~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.11~3-0~ubuntu-$(lsb_release -cs)
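Whether the pinned versions can be installed depends on what the repository publishes for your release. `apt-cache madison docker-ce` lists those versions; the hypothetical helpers below extract the version column (the same `awk '{ print $3 }'` approach Docker's documentation uses) and test for an exact match before you pin.

```shell
# Hypothetical helper: read `apt-cache madison <package>` output on stdin and
# print only the version column (third whitespace-separated field).
docker_versions() {
  awk '{ print $3 }'
}

# Hypothetical helper: succeed if the exact version string in $1 appears in the
# madison output read from stdin.
version_available() {
  docker_versions | grep -Fqx -- "$1"
}

# Example (not run here):
#   apt-cache madison docker-ce \
#     | version_available "5:19.03.11~3-0~ubuntu-$(lsb_release -cs)" \
#     || echo "docker-ce 19.03.11 is not published for this release"
```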

1.5 Create /etc/docker (with -p, this does not fail if the directory already exists)

sudo mkdir -p /etc/docker

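1.6 Configure the Docker daemon

This makes nvidia the default container runtime and rotates container logs (json-file driver, at most 3 files of 200 MB each). The nvidia-container-runtime binary it refers to is installed in part 2.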

cat <<EOF | sudo tee /etc/docker/daemon.json
{
   "default-runtime": "nvidia",
   "runtimes":{
       "nvidia":{
           "path":"nvidia-container-runtime",
           "runtimeArgs":[]
       }
   },
   "log-driver":"json-file",
   "log-opts":{
       "max-size":"200m",
       "max-file":"3"
   }
}
EOF
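dockerd will not start if /etc/docker/daemon.json is not valid JSON, so it can be worth checking the file before restarting Docker. The hypothetical check_daemon_json below assumes python3 is installed; it parses the file and verifies that the default-runtime it names, if any, is declared under runtimes (runc is built in and exempt).

```shell
# Hypothetical helper: validate a daemon.json file ($1, default /etc/docker/daemon.json).
# Exits non-zero if the JSON is malformed or default-runtime has no "runtimes" entry.
check_daemon_json() {
  python3 - "${1:-/etc/docker/daemon.json}" <<'PY'
import json
import sys

with open(sys.argv[1]) as f:
    cfg = json.load(f)  # raises, giving a non-zero exit, if the file is not valid JSON

runtime = cfg.get("default-runtime")
# runc is built into dockerd and needs no entry under "runtimes".
if runtime not in (None, "runc") and runtime not in cfg.get("runtimes", {}):
    sys.exit(f"default-runtime {runtime!r} is not defined under 'runtimes'")
PY
}

# Example (not run here):
#   check_daemon_json && sudo systemctl restart docker
```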

1.7 Create docker.service.d, the systemd drop-in directory for docker.service overrides

sudo mkdir -p /etc/systemd/system/docker.service.d

1.8 Restart Docker

sudo systemctl daemon-reload
sudo systemctl restart docker

1.9 Enable Docker at boot

sudo systemctl enable docker

2. Install nvidia-docker

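2.1 Add the following entries to /etc/hosts

These are the GitHub Pages addresses that serve nvidia.github.io; pinning them helps if the name does not resolve reliably from your network.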

# nvidia.github.io
185.199.108.153 nvidia.github.io
185.199.109.153 nvidia.github.io
185.199.110.153 nvidia.github.io
185.199.111.153 nvidia.github.io
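Repeating step 2.1 appends the same lines again. A minimal sketch, with a hypothetical function name: print only the entries that are not already present as an exact `IP nvidia.github.io` line, then append that output with sudo tee -a.

```shell
# Hypothetical helper: print the nvidia.github.io entries from step 2.1 that are
# missing from the hosts file ($1, default /etc/hosts). Only exact lines of the
# form "IP nvidia.github.io" (single space) count as already present.
missing_hosts_lines() {
  local hosts_file=${1:-/etc/hosts} ip
  for ip in 185.199.108.153 185.199.109.153 185.199.110.153 185.199.111.153; do
    grep -qxF "$ip nvidia.github.io" "$hosts_file" || echo "$ip nvidia.github.io"
  done
}

# Example (not run here): append only what is missing, so re-running is harmless.
#   missing_hosts_lines | sudo tee -a /etc/hosts
```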

2.2 Set up the repository and GPG key

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
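To see what $distribution evaluates to: on Ubuntu 20.04, /etc/os-release contains ID=ubuntu and VERSION_ID="20.04", so the value is ubuntu20.04 and the list URL ends in /ubuntu20.04/nvidia-docker.list. The hypothetical helpers below reproduce that computation from any os-release file, which also makes it easy to check the URL with curl before adding it to apt.

```shell
# Hypothetical helper: print $ID$VERSION_ID from an os-release file
# ($1, default /etc/os-release), exactly as step 2.2 computes $distribution.
nvidia_distribution() {
  # Source the file in a subshell so ID and VERSION_ID do not leak into the caller.
  ( . "${1:-/etc/os-release}" && echo "$ID$VERSION_ID" )
}

# Hypothetical helper: print the nvidia-docker.list URL used in step 2.2.
nvidia_list_url() {
  echo "https://nvidia.github.io/nvidia-docker/$(nvidia_distribution "${1:-/etc/os-release}")/nvidia-docker.list"
}

# Example (not run here): confirm the list file exists before writing it to apt.
#   curl -fsSL -o /dev/null "$(nvidia_list_url)" && echo "repository list found"
```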

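2.3 Add the experimental repository (optional)

Run this in the same shell session as 2.2: it reuses the $distribution variable set there.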

curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list

2.4 Update the package list

sudo apt-get update

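2.5 Install nvidia-docker2

sudo apt-get install -y nvidia-docker2

The nvidia-docker2 package ships its own /etc/docker/daemon.json. If apt asks whether to replace the existing file, keep your current version so the settings from step 1.6 are preserved.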

2.6 Restart Docker

sudo systemctl restart docker
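After the restart you can confirm that Docker picked up the configuration. This sketch assumes the plain-text layout of `docker info`, which prints `Runtimes:` and `Default Runtime:` lines; the helper names are hypothetical. With the daemon.json from step 1.6, the default runtime should be nvidia.

```shell
# Hypothetical helper: read `docker info` output on stdin and print the value
# of the "Default Runtime:" line.
default_runtime() {
  sed -n 's/^ *Default Runtime: *//p'
}

# Hypothetical helper: succeed if the runtime named in $1 appears on the
# "Runtimes:" line of `docker info` output read from stdin.
has_runtime() {
  sed -n 's/^ *Runtimes: *//p' | tr ' ' '\n' | grep -qxF -- "$1"
}

# Example (not run here):
#   sudo docker info 2>/dev/null | default_runtime
#   sudo docker info 2>/dev/null | has_runtime nvidia && echo "nvidia runtime registered"
```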

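2.7 Run a basic CUDA container to verify the setup

If the nvidia/cuda:11.0-base tag is no longer available on Docker Hub, substitute a currently published nvidia/cuda tag.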

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
  • The console output should look similar to the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
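To pull the key facts out of this output, the hypothetical helper below extracts the driver version and the CUDA version from the nvidia-smi header line. Note that the CUDA Version shown there is the highest CUDA version the host driver supports, not necessarily the CUDA version installed in the image.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the driver
# version and CUDA version from its header line as "driver=X cuda=Y".
# Lines without both fields produce no output.
smi_versions() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}

# Example (not run here):
#   sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi | smi_versions
```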