
Ubuntu 18.04 + docker + nvidia-docker + deepo: installation and pitfalls

陈德泽
2023-12-01
Installing docker on Ubuntu 18.04: pitfalls
(following the official docker documentation)

1. Update the apt package index:
$ sudo apt-get update

2. Install packages to allow apt to use a repository over HTTPS:
$ sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
software-properties-common

3. Add Docker's official GPG key:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
[Note]: If this does not print OK, switch networks. In my case a phone hotspot made it print OK.

Verify that you now have the key with the fingerprint:
$ sudo apt-key fingerprint 0EBFCD88
Output:
pub   rsa4096 2017-02-22 [SCEA]
      9DC8 5822 9FC7 DD38 854A  E2D8 8D81 803C 0EBF CD88
uid           [ unknown] Docker Release (CE deb) <docker@docker.com>
sub   rsa4096 2017-02-22 [S]
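
Comparing the fingerprint by eye is easy to get wrong, so here is a minimal sketch of doing it in the shell. The helper name has_docker_fingerprint is made up for this post, and the expected value is simply the fingerprint line above with the spaces removed.

```shell
# Expected Docker key fingerprint: the line shown above, spaces removed.
expected_fpr="9DC858229FC7DD38854AE2D88D81803C0EBFCD88"

# has_docker_fingerprint (hypothetical helper): reads `apt-key fingerprint`
# output on stdin and succeeds if some line, with spaces and tabs stripped,
# is exactly the expected fingerprint.
has_docker_fingerprint() {
    tr -d ' \t' | grep -qx "$expected_fpr"
}
```

It could be used as: sudo apt-key fingerprint 0EBFCD88 | has_docker_fingerprint && echo "key OK"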

4. Use the following command to set up the stable repository:
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"

[Note]:
If the official docker repository cannot be reached in step 3 or step 4 above, with an error like W:Failed to fetch https://apt.dockerproject.org/repo/dists/ubun
then run
$ sudo gedit /etc/apt/sources.list
delete the official docker source, switch to the Aliyun mirror, and run:
$ curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
	"deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu \
	$(lsb_release -cs) \
	stable"
If these three steps are skipped, every apt update reports the error:
Failed to fetch https://apt.dockerproject.org/repo/dists/ubun
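
The official repository and the Aliyun mirror differ only in the base URL, so the source line can be built by one small function. This is just a sketch. docker_repo_line is a name I made up, and it only prints the line and changes nothing on the system.

```shell
# docker_repo_line (hypothetical helper): print the apt source line for a
# docker-ce repository.
#   $1 = base URL of the repository (official site or a mirror)
#   $2 = Ubuntu codename, e.g. the output of `lsb_release -cs`
docker_repo_line() {
    printf 'deb [arch=amd64] %s %s stable\n' "$1" "$2"
}
```

For example: sudo add-apt-repository "$(docker_repo_line https://mirrors.aliyun.com/docker-ce/linux/ubuntu "$(lsb_release -cs)")"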

5. Update the apt package index:
$ sudo apt-get update
[Note]: If this stays very slow, switching to a phone hotspot fixed it for me.

6. Install the latest version of Docker CE, or go to the next step to install a specific version:
$ sudo apt-get install docker-ce
[Note]: If the output contains:
update-alternatives: using /usr/bin/dockerd-ce to provide /usr/bin/dockerd (dockerd) in auto mode
docker.service is a disabled or a static unit, not starting it.
then the next step will very likely fail. If these two lines do not appear, there is nothing to worry about.

7. Verify that Docker CE is installed correctly by running the hello-world image:
$ sudo docker run hello-world
If this prints:
Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/
then docker runs normally and the installation succeeded.
If instead it prints:
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See 'docker run --help'.
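then the cause is the problem mentioned in step 6. docker itself is installed correctly, but the docker service has not been started, so the daemon's socket cannot be found under /var/run/. Starting the docker service resolves the problem.
The fix is:
$ sudo service docker start
Then run
$ sudo docker run hello-world
which prints: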
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
d1725b59e92d: Pull complete 
Digest: sha256:0add3ace90ecb4adbf7777e9aacf18357296e799f81cabc9fde470971e499788
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/
At this point the installation is complete!
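
The "Cannot connect to the Docker daemon" error in step 7 names a socket path, so a quick look at that path tells you whether the service was ever started. A rough sketch follows. docker_socket_status is a made-up helper, and a socket that exists does not by itself prove the daemon is answering.

```shell
# docker_socket_status (hypothetical helper): report whether the daemon's
# Unix socket exists. $1 is the socket path, defaulting to the path named
# in the error message.
docker_socket_status() {
    sock="${1:-/var/run/docker.sock}"
    if [ -S "$sock" ]; then
        echo "socket present: $sock"
    else
        echo "socket missing: $sock (try: sudo service docker start)"
        return 1
    fi
}
```

Calling docker_socket_status with no argument checks the default path.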

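8. From now on every docker command would need sudo, so continue with the Linux post-install steps, which let non-privileged users run docker commands (along with other optional configuration):
There are two methods:
Method 1:
$ sudo groupadd docker	# create the docker group
$ sudo gpasswd -a $USER docker	# add the logged-in user to the docker group
$ newgrp docker	# apply the new group membership in the current shell

Method 2:
$ sudo groupadd docker	# create the docker group
$ sudo usermod -aG docker $USER	# add the logged-in user to the docker group
Then log out and log back in.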
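
With either method, the group change only takes effect in a shell that has picked up the new membership. Here is a small sketch for checking that. list_has_group is a hypothetical helper that does a whole-word match on a space-separated list.

```shell
# list_has_group (hypothetical helper): succeed if the space-separated
# group list in $1 contains the group name in $2 as a whole word.
list_has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: list_has_group "$(id -nG)" docker && echo "docker group is active in this shell"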



Installing NVIDIA-docker on Ubuntu 18.04: pitfalls

1. Add the nvidia-docker package repository and update the apt package index:
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
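
The ubuntu18.04 part of the list URL is hard-coded above. If you would rather derive it from the machine, something like the following sketch may work. distribution_id is a name I made up, and it assumes the os-release file defines ID and VERSION_ID.

```shell
# distribution_id (hypothetical helper): print ID followed by VERSION_ID from
# an os-release file, e.g. "ubuntu18.04". $1 is the file path, defaulting to
# /etc/os-release. The file is sourced in a subshell so its variables do not
# leak into the caller.
distribution_id() {
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

The list URL would then be https://nvidia.github.io/nvidia-docker/$(distribution_id)/nvidia-docker.list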
2. Install the nvidia-docker2 package and reload the docker daemon configuration:

$sudo apt-get install nvidia-docker2

$sudo pkill -SIGHUP dockerd

$docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

 

[Note]: If the previous command fails with:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 --pid=4745 /var/lib/docker/overlay2/71416932ef96acaad33ced3d8979438f31000da59e36b5e02952964915a0c964/merged]\\\\nnvidia-container-cli: requirement error: unsatisfied condition: brand = tesla\\\\n\\\"\"": unknow

then, provided the GPU driver is already installed (check with $ nvidia-smi), run the following instead, which pins the CUDA version of the image:

$sudo nvidia-docker run --rm nvidia/cuda:9.0-devel nvidia-smi

This prints:

Unable to find image 'nvidia/cuda:9.0-devel' locally
9.0-devel: Pulling from nvidia/cuda
18d680d61657: Pull complete
0addb6fece63: Pull complete
78e58219b215: Pull complete
eb6959a66df2: Pull complete
6ef1ff668c93: Pull complete
f5f8f0544aa2: Pull complete
3d28d96eb352: Pull complete
1b48d63763c4: Pull complete
70fb71aabe87: Pull complete
Digest: sha256:7d95426c00962ef3352151ccf425eb0a446108589585995c2277e2201390918c
Status: Downloaded newer image for nvidia/cuda:9.0-devel
Sat Nov 24 10:06:44 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M1200        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P8    N/A /  N/A |    398MiB /  4043MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

which indicates success.
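
The requirement error above lists cuda>=10.0 as one alternative and brand=tesla with a 384.x driver as the other. My reading is that the default nvidia/cuda tag was a CUDA 10.0 image, which the 390.48 driver on this machine cannot serve. As far as I know, CUDA 10.0 needs a 410-series driver or newer, so treat the 410 in the usage line as an assumption. The sketch below pulls the driver version out of nvidia-smi output so it can be compared before choosing an image tag. Both helper names are made up.

```shell
# driver_version (hypothetical helper): read `nvidia-smi` output on stdin
# and print the value after "Driver Version:", e.g. "390.48".
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# driver_major_at_least (hypothetical helper): succeed if the major part of
# the version in $1 is >= the integer in $2.
driver_major_at_least() {
    [ "${1%%.*}" -ge "$2" ]
}
```

For example: driver_major_at_least "$(nvidia-smi | driver_version)" 410 || echo "pick an older CUDA tag such as nvidia/cuda:9.0-devel"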

 

Installing deepo on Ubuntu 18.04

1. Install docker and nvidia-docker

2. Pull the all-in-one image from Docker Hub (it is fairly large):

$ docker pull ufoym/deepo # slow

$ docker pull registry.docker-cn.com/ufoym/deepo # faster, via a registry mirror

Output:

Using default tag: latest
latest: Pulling from ufoym/deepo
18d680d61657: Already exists
0addb6fece63: Already exists
78e58219b215: Already exists
eb6959a66df2: Already exists
6ef1ff668c93: Already exists
f5f8f0544aa2: Already exists
3d28d96eb352: Already exists
1b48d63763c4: Already exists
70fb71aabe87: Already exists
a547457bef7c: Pull complete
ee476afe1991: Pull complete
Digest: sha256:fe59328e10fd07db625470495b6e40887145181182c84e7b318073443a9510f0
Status: Downloaded newer image for ufoym/deepo:latest

3. Check that deepo can use the GPU from inside a docker container:
$ nvidia-docker run --rm ufoym/deepo nvidia-smi
Output:
Sat Nov 24 10:46:57 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M1200        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   41C    P0    N/A /  N/A |    399MiB /  4043MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
 
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
4. Start an interactive shell. This creates a container that still exists after you exit:
$nvidia-docker run -it ufoym/deepo bash
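
Because this command is run without --rm, stopped containers pile up over time. Here is a sketch for listing the ones that have exited. exited_container_ids is a hypothetical helper that expects "<id> <status>" lines on stdin.

```shell
# exited_container_ids (hypothetical helper): read lines of the form
# "<id> <status...>" on stdin and print the IDs whose status starts
# with "Exited".
exited_container_ids() {
    awk '$2 == "Exited" { print $1 }'
}
```

For example: docker ps -a --filter ancestor=ufoym/deepo --format '{{.ID}} {{.Status}}' | exited_container_ids
The printed IDs can then be passed to docker rm if the containers are no longer needed.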