This installation guide is reproduced from NVIDIA's official documentation:
Installation Guide — NVIDIA Cloud Native Technologies documentation
Installing on Ubuntu and Debian: the following steps set up the NVIDIA Container Toolkit on Ubuntu LTS 16.04, 18.04 and 20.04, and on Debian Stretch and Buster.
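Before starting, it may help to confirm that the machine is one of the releases listed above. The sketch below rests on a few assumptions. It treats those five releases as the only targets, with Debian Stretch and Buster mapped to `debian9` and `debian10`. Both function names are hypothetical. It builds the same `$ID$VERSION_ID` string that the repository setup step below stores in `$distribution`.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical, and the supported list
# assumes Ubuntu 16.04/18.04/20.04 and Debian 9 (Stretch) / 10 (Buster).

# Print "$ID$VERSION_ID" (e.g. ubuntu20.04) from an os-release file.
os_release_distribution() {
  local file="${1:-/etc/os-release}"
  # Source the file in a subshell so ID/VERSION_ID do not leak into the caller.
  ( . "$file" && echo "${ID}${VERSION_ID}" )
}

# Succeed only for the distribution strings this guide covers.
is_supported_distribution() {
  case "$1" in
    ubuntu16.04 | ubuntu18.04 | ubuntu20.04 | debian9 | debian10) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   d=$(os_release_distribution)
#   is_supported_distribution "$d" || echo "not covered by this guide: $d" >&2
```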
Setting up Docker
Note: this step may fail with a "handshake error". If it does, changing https to http in the URLs used below is enough to get past it.
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker
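The handshake error may be intermittent, for example on an unstable network. In that case, retrying over https is an alternative to try before switching to http. Below is a minimal sketch of a retry helper; `retry` is a hypothetical name and is not part of Docker or NVIDIA tooling. The example downloads the script to a file first, so a failed transfer is never piped into `sh`.

```shell
#!/usr/bin/env bash
# Sketch: run a command up to ATTEMPTS times, sleeping DELAY seconds between failures.
# `retry` is a hypothetical helper, not part of Docker or NVIDIA tooling.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${i}/${attempts} failed: $*" >&2
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example: download first, then run the script and enable the daemon as above.
#   retry 3 5 curl -fsSL https://get.docker.com -o get-docker.sh \
#     && sh get-docker.sh \
#     && sudo systemctl --now enable docker
```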
Setting up the NVIDIA Container Toolkit
Set up the package repository and the GPG key:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
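The `sed` stage in the command above adds a `signed-by` option to every `deb https://` line. With that option, apt accepts this repository's packages only when they are signed by the dearmored NVIDIA key, instead of trusting the key for all repositories. The same rewrite can be pulled out as a reusable filter for previewing the list before writing it to /etc/apt. This is a sketch; `add_signed_by` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of the sed stage above as a filter (hypothetical name `add_signed_by`).
# Reads a .list file on stdin and writes the rewritten lines to stdout.
add_signed_by() {
  local keyring="$1"
  # '#' is the sed delimiter, so the slashes in the path and URL need no escaping.
  sed "s#deb https://#deb [signed-by=${keyring}] https://#g"
}

# Preview the rewritten list without touching /etc/apt:
#   curl -s -L "https://nvidia.github.io/libnvidia-container/${distribution}/libnvidia-container.list" \
#     | add_signed_by /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
```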
To get access to experimental features and release candidates, you may want to add the experimental branch to the repository list:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
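The two repository commands differ only in the list URL. Both write to the same file through `tee`, which overwrites rather than appends, so running the experimental variant replaces what the first command wrote. A small sketch makes the URL difference explicit. The function name and the `stable` channel label are hypothetical; they are this sketch's own names, not NVIDIA's.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the list URL used by the default ("stable", this
# sketch's own label) or the experimental variant of the repository command.
nvidia_list_url() {
  local distribution="$1" channel="${2:-stable}"
  local base="https://nvidia.github.io/libnvidia-container"
  case "$channel" in
    stable) echo "${base}/${distribution}/libnvidia-container.list" ;;
    experimental) echo "${base}/experimental/${distribution}/libnvidia-container.list" ;;
    *) echo "unknown channel: ${channel}" >&2; return 1 ;;
  esac
}
```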
Update the package list, then install the nvidia-docker2 package (and its dependencies):
sudo apt-get update
sudo apt-get install -y nvidia-docker2
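To confirm the package actually landed before moving on, `dpkg-query` can report its status. This is a minimal sketch; `pkg_installed` is a hypothetical name. It relies on `dpkg-query` being present, which it is on Ubuntu and Debian.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if dpkg reports the package as fully installed.
# `pkg_installed` is a hypothetical helper name.
pkg_installed() {
  local status
  # ${Status} is a dpkg-query format field; single quotes stop the shell from expanding it.
  status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || return 1
  [ "$status" = "install ok installed" ]
}

# Example:
#   pkg_installed nvidia-docker2 && dpkg-query -W -f='${Version}\n' nvidia-docker2
```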
After setting the default runtime, restart the Docker daemon to complete the installation:
sudo systemctl restart docker
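The nvidia-docker2 package is expected to register an `nvidia` runtime in /etc/docker/daemon.json. A default runtime is chosen in the same file with the `default-runtime` key. The sketch below is a rough, line-oriented check of that file. It assumes the default path and that `grep`/`sed` are good enough; it is not a JSON parser, and `jq` would be sturdier if installed. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Rough check of Docker's daemon.json (hypothetical helper names).
# Line-oriented grep/sed, not a real JSON parser.
DAEMON_JSON="${DAEMON_JSON:-/etc/docker/daemon.json}"

# Succeed if an "nvidia" key appears in the file (e.g. under "runtimes").
has_nvidia_runtime() {
  grep -q '"nvidia"[[:space:]]*:' "${1:-$DAEMON_JSON}"
}

# Print the value of "default-runtime", or nothing if it is not set.
default_runtime() {
  sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "${1:-$DAEMON_JSON}"
}
```

After the restart, `sudo docker info | grep -i runtime` should also list `nvidia` among the available runtimes.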
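At this point, a working setup can be tested by running a base CUDA container (the options used in the command are explained further below):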
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
This should produce console output similar to the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
If you see output like this, the installation succeeded.
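To check the result in a script rather than by eye, the version numbers can be pulled out of the nvidia-smi header. This is a minimal sketch; `smi_field` is a hypothetical name. It assumes the header layout shown above, and the label is used inside a `sed` regex, so it should stay plain words. The "CUDA Version" in this header is the highest CUDA version the driver supports, not necessarily the toolkit inside the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version number that follows LABEL
# (e.g. "Driver Version" or "CUDA Version") in nvidia-smi output read from stdin.
smi_field() {
  local label="$1"
  sed -n "s/.*${label}: *\([0-9.]*\).*/\1/p" | head -n 1
}

# Example with the test command above:
#   sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi \
#     | smi_field "Driver Version"
```

For anything beyond a quick check, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` is less fragile than parsing the table.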
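To find other CUDA images and tags, search for "nvidia/cuda" on Docker Hub. A container with GPU access can then be started with a command of this form: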
docker run --gpus all -v /home/xxx/:/DOCKER_PATH --name NAME -it nvidia/IMAGE_NAME bash
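--gpus: which GPUs to expose to the container; usually all, which makes every GPU available inside the container
-v: mounts a host directory into the container, in the form -v HOST_DIR:CONTAINER_DIR
--name: the name of the container
-it: runs the container interactively with a terminal (-i keeps STDIN open, -t allocates a pseudo-TTY)
nvidia/IMAGE_NAME: the image to start, e.g. nvidia/cuda:11.0.3-base-ubuntu20.04 from the test above
bash: the command to run in the container, here a Bash shell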
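The `docker run` command above can be wrapped in a small function so the placeholders become arguments. This is a sketch under stated assumptions; `gpu_run` and the `DRY_RUN` switch are hypothetical. The sketch rejects non-absolute host paths, because Docker treats a bare name such as `data` after `-v` as a named volume rather than a host directory.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the docker run command above.
# Usage: gpu_run HOST_DIR CONTAINER_DIR NAME IMAGE
# With DRY_RUN=1 the command is printed instead of executed.
gpu_run() {
  if [ "$#" -ne 4 ]; then
    echo "usage: gpu_run HOST_DIR CONTAINER_DIR NAME IMAGE" >&2
    return 2
  fi
  local host_dir="$1" container_dir="$2" name="$3" image="$4"
  # A bare name such as "data" would be treated as a named volume, not a host directory.
  case "$host_dir" in
    /*) ;;
    *) echo "gpu_run: HOST_DIR must be an absolute path: ${host_dir}" >&2; return 2 ;;
  esac
  local cmd=(docker run --gpus all -v "${host_dir}:${container_dir}" --name "$name" -it "$image" bash)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example:
#   gpu_run /home/xxx/ /DOCKER_PATH NAME nvidia/cuda:11.0.3-base-ubuntu20.04
```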
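Other commonly used commands:
docker exec -it NAME bash  # open a shell in the running container NAME (run docker start NAME first if it has stopped)
docker ps -a  # list all containers, including stopped ones
docker images  # list local images
docker rm NAME  # remove a container
docker rmi IMAGE_NAME  # remove an image (containers using it must be removed first)
sudo systemctl start/stop/restart docker  # use one of start, stop or restart to control the Docker service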
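`docker run --name NAME` fails if a container with that name already exists. A script can therefore check the name first and either reuse the existing container or create a new one. This is a minimal sketch; `container_exists` is a hypothetical name, and it reads container names one per line on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if NAME is an exact line among the container
# names read from stdin (one per line).
container_exists() {
  # -x: whole-line match, -F: literal string, -q: exit status only.
  grep -qxF -- "$1"
}

# Example: reuse the container if it exists, otherwise create it.
#   if docker ps -a --format '{{.Names}}' | container_exists NAME; then
#     docker start NAME >/dev/null && docker exec -it NAME bash
#   else
#     docker run --gpus all -v /home/xxx/:/DOCKER_PATH --name NAME -it nvidia/IMAGE_NAME bash
#   fi
```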