
nVidia-docker: Installation and Creating Containers

左仰岳
2023-12-01

1. Installing nVidia-docker

The installation steps below are taken from the official documentation:

Installation Guide — NVIDIA Cloud Native Technologies documentation

Installing on Ubuntu and Debian: the following steps set up the NVIDIA Container Toolkit on Ubuntu LTS 16.04, 18.04, and 20.04, and on Debian Stretch and Buster.

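Set up Docker

        Note: this step may fail with a "handshake error". A workaround is to change https to http in the URLs used below; be aware that doing so gives up transport encryption for the download.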

curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker

Set up the nVidia-docker toolkit

        Set up the package repository and the GPG key:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
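
The pipeline above does two things that are easy to miss: it builds a distribution string such as ubuntu20.04 from /etc/os-release, and it uses sed to add a signed-by option to every repository entry so that apt verifies packages against the downloaded keyring. The following is a minimal sketch of those two steps in isolation. The helper names detect_distribution and add_signed_by are hypothetical, and the sample repository line is illustrative rather than fetched from the network.

```shell
#!/bin/sh
# Print "<ID><VERSION_ID>" (e.g. ubuntu20.04) from an os-release file.
# The optional file argument is a hypothetical convenience; the default
# is the real /etc/os-release.
detect_distribution() {
    # Source inside a subshell so ID and VERSION_ID do not leak into the caller.
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Read repository entries on stdin and add a signed-by option that points
# at the keyring file given as the first argument.
add_signed_by() {
    sed "s#deb https://#deb [signed-by=$1] https://#g"
}

keyring=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
# Illustrative entry only; the real list file is downloaded by curl.
sample='deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /'
printf '%s\n' "$sample" | add_signed_by "$keyring"
```

Running the sketch prints the sample entry with [signed-by=...] inserted after "deb", which is the form written to /etc/apt/sources.list.d/nvidia-container-toolkit.list.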

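        To get access to experimental features and release candidates, you may want to add the experimental branch to the repository list instead: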

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
         sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
         sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

        Update the package list, then install the nvidia-docker2 package (and its dependencies):

sudo apt-get update
sudo apt-get install -y nvidia-docker2

         With the runtime configured, restart the Docker daemon to complete the installation:

sudo systemctl restart docker
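
Before or after the restart, it can be useful to confirm that the NVIDIA runtime is actually registered with Docker. The sketch below assumes that the package registers a runtime named "nvidia" in /etc/docker/daemon.json; both the helper name has_nvidia_runtime and that file location are assumptions to verify on your own system.

```shell
#!/bin/sh
# Succeed if the given Docker daemon config mentions an "nvidia" runtime.
# Assumption: the runtime is registered under the name "nvidia" and the
# config lives at /etc/docker/daemon.json unless another path is given.
has_nvidia_runtime() {
    config=${1:-/etc/docker/daemon.json}
    [ -r "$config" ] && grep -q '"nvidia"' "$config"
}

if has_nvidia_runtime; then
    echo "nvidia runtime is registered"
else
    echo "nvidia runtime not found in daemon.json"
fi
```

This is only a text search, so it does not prove that the runtime works; the CUDA container test remains the real check.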

        At this point, you can test the setup by running a basic CUDA container (the options are explained in section 3):

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

        This should produce console output similar to the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

 If you see this table, the installation succeeded.
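
If you want to check the result in a script rather than by eye, one option is to pull the driver version out of the nvidia-smi header. This is a small sketch, assuming the header keeps the "Driver Version: X.Y.Z" wording shown above; the helper name driver_version is hypothetical.

```shell
#!/bin/sh
# Read nvidia-smi output on stdin and print the driver version, if any.
# Assumes the header contains "Driver Version: <digits and dots>".
driver_version() {
    sed -n 's/.*Driver Version: \([0-9][0-9.]*\).*/\1/p'
}

# Header line copied from the sample output above.
header='| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |'
printf '%s\n' "$header" | driver_version   # prints 450.51.06
```

In practice you would pipe the real command into it, for example: sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi | driver_version. Empty output means no driver line was found.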

2. Downloading images

Search for "nvidia/cuda" on Docker Hub.

3. Common commands

Create a container:

docker run --gpus all -v /home/xxx/:/DOCKER_PATH --name NAME -it nvidia/IMAGE_NAME bash

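--gpus exposes GPUs to the container; it is usually set to all, which makes every GPU available inside the container

-v maps a local directory into the container, in the form -v HOST_DIR:CONTAINER_DIR

--name sets the container name

-it runs the container interactively with a terminal attached (-i keeps STDIN open, -t allocates a pseudo-TTY)

bash is the command to run inside the container, here an interactive shell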
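
To see how the placeholders in the command above get filled in, here is a minimal sketch that only prints the resulting command line instead of executing it. The function name gpu_run_cmd, the host path /home/xxx/data, the container path /workspace, and the container name cuda-dev are all placeholders chosen for illustration; the image tag is the one used in the installation test.

```shell
#!/bin/sh
# Print (do not execute) a docker run command line for a GPU container.
# Arguments: host directory, container directory, container name, image.
gpu_run_cmd() {
    host_dir=$1 container_dir=$2 name=$3 image=$4
    printf 'docker run --gpus all -v %s:%s --name %s -it %s bash\n' \
        "$host_dir" "$container_dir" "$name" "$image"
}

# Example values are illustrative only.
gpu_run_cmd /home/xxx/data /workspace cuda-dev nvidia/cuda:11.0.3-base-ubuntu20.04
```

Printing first makes it easy to review the volume mapping and name before actually creating the container.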

Re-enter a container that is already running

docker exec -it NAME bash
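
Note that docker exec only works on a container that is currently running; a stopped container has to be started first. The following sketch wraps both steps. The function name enter_container is hypothetical, and it assumes the image provides bash.

```shell
#!/bin/sh
# Open a bash shell in a container, starting the container first if needed.
enter_container() {
    name=$1
    # Prints "true" or "false"; fails if the container does not exist.
    running=$(docker inspect -f '{{.State.Running}}' "$name") || return 1
    if [ "$running" != "true" ]; then
        docker start "$name" >/dev/null || return 1
    fi
    docker exec -it "$name" bash
}
```

Usage: enter_container NAME, where NAME is the value passed to --name when the container was created.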

List all containers (including stopped ones)

docker ps -a

List Docker images

docker images

Remove a Docker container

docker rm DOCKER_NAME

Remove a Docker image

docker rmi IMAGE_NAME

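Start, stop, or restart the Docker service (this acts on the daemon, not on a single container; for one container, use docker start, docker stop, or docker restart followed by NAME)

sudo systemctl restart docker    # likewise: start, stop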
