In a case like this, docker, and nvidia-docker in particular, can be a good solution. For instance, in the example below, the host server itself has nvcc v7.5.17 installed.
root@minsky:/data# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:31:50_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
Yet inside a docker container running on that same host, you can see nvcc v8.0.44 in use.
root@minsky:/data# docker run --rm bsyu/nvcc:ppc64le-xenial nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sat_Sep__3_19:09:38_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
Let's go through the steps for using this handy nvidia-docker, one at a time.
First, the basic docker engine has to be installed. Register the APT repository for the ppc64le docker build under /etc/apt/sources.list.d as follows.
root@minsky:~# echo deb http://ftp.unicamp.br/pub/ppc64el/ubuntu/16_04/docker-1.12.6-ppc64el/ xenial main > /etc/apt/sources.list.d/xenial-docker.list
root@minsky:~# apt-get update
Install the package as follows and start the docker service.
root@minsky:~# apt-get install docker-engine
root@minsky:~# service docker restart
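Before going further, it does not hurt to sanity-check that the engine is actually up; both of these should respond without errors (output omitted here, as it varies by system):
root@minsky:~# docker version
root@minsky:~# docker info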
Next, build nvidia-docker from source.
root@minsky:~# cd /data
root@minsky:/data# git clone https://github.com/NVIDIA/nvidia-docker.git
root@minsky:/data# cd nvidia-docker
root@minsky:/data/nvidia-docker# git fetch --all
root@minsky:/data/nvidia-docker# git checkout ppc64le
root@minsky:/data/nvidia-docker# ls
centos centos-7 LICENSE mk samples ubuntu ubuntu-16.04
centos-6 CLA Makefile README.md tools ubuntu-14.04
root@minsky:/data/nvidia-docker# make deb
When this finishes, an installable nvidia-docker debian package has been created under tools/dist. Install it with the dpkg command.
root@minsky:/data/nvidia-docker# dpkg -i tools/dist/nvidia-docker_1.0.0~rc.3-1_ppc64el.deb
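To confirm the package landed, a quick check is enough (the deb also installs the nvidia-docker-plugin, which the wrapper talks to at run time):
root@minsky:/data/nvidia-docker# dpkg -l nvidia-docker
root@minsky:/data/nvidia-docker# which nvidia-docker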
Running the nvidia-docker images command shows a few base images.
root@minsky:/data/nvidia-docker# nvidia-docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
nvidia-docker deb 332eaa8c9f9d 3 minutes ago 430.1 MB
nvidia-docker build 8cbc22512d15 5 minutes ago 1.012 GB
ppc64le/ubuntu 14.04 c040fcd69c12 12 weeks ago 227.8 MB
ppc64le/golang 1.6.3 6a579d02d32f 5 months ago 704.7 MB
When you docker run ppc64le/ubuntu, docker runs the image tagged latest by default. Since none of the existing images is ppc64le/ubuntu:latest, it first downloads that image from docker hub and then runs it.
root@minsky:/data/nvidia-docker# docker run -it ppc64le/ubuntu bash
Unable to find image 'ppc64le/ubuntu:latest' locally
latest: Pulling from ppc64le/ubuntu
0847857e6401: Pull complete
f8c18c152457: Pull complete
8643975d001d: Pull complete
d5802da4b3a0: Pull complete
fe172ed92137: Pull complete
Digest: sha256:5349f00594c719455f2c8e6f011b32758dcd326d8e225c737a55c15cf3d6948c
Status: Downloaded newer image for ppc64le/ubuntu:latest
We are now inside a bash shell in the docker container.
root@ba07ff7529b3:/# df -h
Filesystem Size Used Avail Use% Mounted on
none 845G 743G 60G 93% /
tmpfs 256G 0 256G 0% /dev
tmpfs 256G 0 256G 0% /sys/fs/cgroup
/dev/sda2 845G 743G 60G 93% /etc/hosts
shm 64M 0 64M 0% /dev/shm
root@ba07ff7529b3:/# ls
bin dev home lib64 mnt proc run srv tmp var
boot etc lib media opt root sbin sys usr
root@ba07ff7529b3:/# uname -a
Linux ba07ff7529b3 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:05:18 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
This is a minimal Ubuntu image, so there is not even an ifconfig command.
root@ba07ff7529b3:/# ifconfig
bash: ifconfig: command not found
root@ba07ff7529b3:/# exit
You can check the state of the containers with docker ps -a.
root@minsky:/data/nvidia-docker# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ba07ff7529b3 ppc64le/ubuntu "bash" About a minute ago Exited (127) 9 seconds ago small_ride
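An exited container like this lingers until you remove it. If you no longer need it, docker rm with either the container ID or the name cleans it up:
root@minsky:/data/nvidia-docker# docker rm small_ride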
Now let's build a docker image ourselves. The basic procedure follows the URL below.
https://www.ibm.com/developerworks/library/d-docker-on-power-linux-platform/
We will build a docker image of Ubuntu 16.04 Xenial for ppc64le. To do that, first install the debootstrap package and download the debootstrap.sh script as shown below.
root@minsky:/data# apt-get install -y debootstrap
root@minsky:/data# curl -o debootstrap.sh https://raw.githubusercontent.com/docker/docker/master/contrib/mkimage/debootstrap
root@minsky:/data# chmod a+x ./debootstrap.sh
Run the script so that it builds against the four Xenial repositories, main, universe, multiverse and restricted.
root@minsky:/data# ./debootstrap.sh ubuntu --components=main,universe,multiverse,restricted xenial
When it finishes, the directory tree of the OS image appears under an ubuntu directory right below the current one.
root@minsky:/data# ls ubuntu
bin dev home lib64 mnt proc run srv tmp var
boot etc lib media opt root sbin sys usr
Now tar up this directory and pipe it straight into docker import.
root@minsky:/data# tar -C ubuntu -c . | docker import - ubuntu:16.04
sha256:09621ebd4cfd280af86ef61e2c5a41e8ef4e0081d6ec51203dba1fceaf69e625
Check the imported docker image.
root@minsky:/data# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 16.04 09621ebd4cfd 31 seconds ago 234.3 MB
nvidia-docker deb 332eaa8c9f9d About an hour ago 430.1 MB
nvidia-docker build 8cbc22512d15 About an hour ago 1.012 GB
ppc64le/ubuntu 14.04 c040fcd69c12 12 weeks ago 227.8 MB
ppc64le/ubuntu latest 1967d889e07f 12 weeks ago 168 MB
ppc64le/golang 1.6.3 6a579d02d32f 5 months ago 704.7 MB
Tag it in several ways, and be sure to add the latest tag. Otherwise, whenever you refer to the image without a tag, docker will try to download an image of the same name carrying the latest tag from docker hub over the internet.
root@minsky:/data# docker tag ubuntu:16.04 ubuntu:xenial
root@minsky:/data# docker tag ubuntu:16.04 ubuntu:latest
root@minsky:/data# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 16.04 09621ebd4cfd About a minute ago 234.3 MB
ubuntu latest 09621ebd4cfd About a minute ago 234.3 MB
ubuntu xenial 09621ebd4cfd About a minute ago 234.3 MB
nvidia-docker deb 332eaa8c9f9d About an hour ago 430.1 MB
nvidia-docker build 8cbc22512d15 About an hour ago 1.012 GB
ppc64le/ubuntu 14.04 c040fcd69c12 12 weeks ago 227.8 MB
ppc64le/ubuntu latest 1967d889e07f 12 weeks ago 168 MB
ppc64le/golang 1.6.3 6a579d02d32f 5 months ago 704.7 MB
You can check whether each docker image is built for ppc64le or for x86 as follows.
root@minsky:/data# docker inspect ubuntu | grep -i arch
"Architecture": "ppc64le",
Now we build the nvidia-docker images on top of this ubuntu:16.04 image. First, edit a few of the Dockerfiles as follows.
root@minsky:/data# cd nvidia-docker
root@minsky:/data/nvidia-docker# vi ./ubuntu-16.04/cuda/8.0/runtime/Dockerfile.ppc64le
#FROM ppc64le/ubuntu:16.04
FROM ubuntu:16.04
...
In the next step, we edit Dockerfile.ppc64le to avoid an error: the sha256sum value already embedded in it for the cudnn-8.0-linux-ppc64le-v5.1.tgz file to be downloaded does not match the actual file, so we replace it with the real checksum.
root@minsky:/data/nvidia-docker# sha256sum cudnn-8.0-linux-ppc64le-v5.1.tgz
663aac0328f821d90ae9c74ee43e90751706546c2ce769ea9c96f92864300af6 cudnn-8.0-linux-ppc64le-v5.1.tgz
root@minsky:/data/nvidia-docker# vi ./ubuntu-16.04/cuda/8.0/runtime/cudnn5/Dockerfile.ppc64le
...
RUN CUDNN_DOWNLOAD_SUM=663aac0328f821d90ae9c74ee43e90751706546c2ce769ea9c96f92864300af6 && \
#RUN CUDNN_DOWNLOAD_SUM=51f698d468401cef2e3e2ef9bb557bd57cbeb4dca895d1d1ae8a751d090bbe39 && \
root@minsky:/data/nvidia-docker# vi ./ubuntu-16.04/cuda/8.0/devel/cudnn5/Dockerfile.ppc64le
...
RUN CUDNN_DOWNLOAD_SUM=663aac0328f821d90ae9c74ee43e90751706546c2ce769ea9c96f92864300af6 && \
#RUN CUDNN_DOWNLOAD_SUM=51f698d468401cef2e3e2ef9bb557bd57cbeb4dca895d1d1ae8a751d090bbe39 && \
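For reference, the same two edits can be scripted instead of typed into vi. A minimal sketch, assuming it is run from the nvidia-docker directory with the cudnn .tgz file alongside:
# compute the real checksum, then patch it into both Dockerfiles
NEW_SUM=$(sha256sum cudnn-8.0-linux-ppc64le-v5.1.tgz | awk '{print $1}')
sed -i "s/CUDNN_DOWNLOAD_SUM=[0-9a-f]*/CUDNN_DOWNLOAD_SUM=${NEW_SUM}/" \
    ubuntu-16.04/cuda/8.0/runtime/cudnn5/Dockerfile.ppc64le \
    ubuntu-16.04/cuda/8.0/devel/cudnn5/Dockerfile.ppc64le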
Now build with make.
root@minsky:/data/nvidia-docker# make cuda OS=ubuntu-16.04
When it finishes, several images named cuda have been imported, as shown below. They are split into runtime and devel variants, with and without cudnn.
root@minsky:/data/nvidia-docker# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
cuda 8.0-cudnn5-devel d8d0da2fbdf2 7 minutes ago 1.895 GB
cuda cudnn d8d0da2fbdf2 7 minutes ago 1.895 GB
cuda cudnn-devel d8d0da2fbdf2 7 minutes ago 1.895 GB
cuda 8.0 dc3faec17c11 9 minutes ago 1.726 GB
cuda 8.0-devel dc3faec17c11 9 minutes ago 1.726 GB
cuda devel dc3faec17c11 9 minutes ago 1.726 GB
cuda latest dc3faec17c11 9 minutes ago 1.726 GB
cuda 8.0-cudnn5-runtime 8a3b0a60e741 15 minutes ago 942.2 MB
cuda cudnn-runtime 8a3b0a60e741 15 minutes ago 942.2 MB
cuda 8.0-runtime 8e9763b6296f 17 minutes ago 844.9 MB
cuda runtime 8e9763b6296f 17 minutes ago 844.9 MB
ubuntu 16.04 09621ebd4cfd 4 days ago 234.3 MB
ubuntu latest 09621ebd4cfd 4 days ago 234.3 MB
ubuntu xenial 09621ebd4cfd 4 days ago 234.3 MB
nvidia-docker deb 332eaa8c9f9d 4 days ago 430.1 MB
nvidia-docker build 8cbc22512d15 4 days ago 1.012 GB
ppc64le/ubuntu 14.04 c040fcd69c12 12 weeks ago 227.8 MB
ppc64le/ubuntu latest 1967d889e07f 3 months ago 168 MB
ppc64le/golang 1.6.3 6a579d02d32f 5 months ago 704.7 MB
Tag them as needed.
root@minsky:/data/nvidia-docker# docker tag cuda:8.0-cudnn5-devel cuda8-cudnn5-devel:cudnn5-devel
root@minsky:/data/nvidia-docker# docker tag cuda:8.0-cudnn5-devel cuda8-cudnn5-devel:latest
root@minsky:/data/nvidia-docker# docker tag cuda:8.0-runtime cuda8-runtime:latest
root@minsky:/data/nvidia-docker# docker tag cuda:cudnn-runtime cuda8-cudnn5-runtime:latest
root@minsky:/data/nvidia-docker# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
cuda8-cudnn5-devel cudnn5-devel d8d0da2fbdf2 12 minutes ago 1.895 GB
cuda8-cudnn5-devel latest d8d0da2fbdf2 12 minutes ago 1.895 GB
cuda 8.0-cudnn5-devel d8d0da2fbdf2 12 minutes ago 1.895 GB
cuda cudnn d8d0da2fbdf2 12 minutes ago 1.895 GB
cuda cudnn-devel d8d0da2fbdf2 12 minutes ago 1.895 GB
cuda 8.0 dc3faec17c11 13 minutes ago 1.726 GB
cuda 8.0-devel dc3faec17c11 13 minutes ago 1.726 GB
cuda devel dc3faec17c11 13 minutes ago 1.726 GB
cuda latest dc3faec17c11 13 minutes ago 1.726 GB
cuda8-cudnn5-runtime latest 8a3b0a60e741 20 minutes ago 942.2 MB
cuda 8.0-cudnn5-runtime 8a3b0a60e741 20 minutes ago 942.2 MB
cuda cudnn-runtime 8a3b0a60e741 20 minutes ago 942.2 MB
cuda8-runtime latest 8e9763b6296f 22 minutes ago 844.9 MB
cuda 8.0-runtime 8e9763b6296f 22 minutes ago 844.9 MB
cuda runtime 8e9763b6296f 22 minutes ago 844.9 MB
ubuntu 16.04 09621ebd4cfd 4 days ago 234.3 MB
ubuntu latest 09621ebd4cfd 4 days ago 234.3 MB
ubuntu xenial 09621ebd4cfd 4 days ago 234.3 MB
nvidia-docker deb 332eaa8c9f9d 4 days ago 430.1 MB
nvidia-docker build 8cbc22512d15 4 days ago 1.012 GB
ppc64le/ubuntu 14.04 c040fcd69c12 12 weeks ago 227.8 MB
ppc64le/ubuntu latest 1967d889e07f 3 months ago 168 MB
ppc64le/golang 1.6.3 6a579d02d32f 5 months ago 704.7 MB
You can check the differences between the nvidia-docker images as follows.
root@minsky:/data/nvidia-docker# docker run --rm cuda nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sat_Sep__3_19:09:38_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
root@minsky:/data/nvidia-docker# docker run -it cuda bash
root@a93070ccdc0d:/# which nvcc
/usr/local/cuda/bin/nvcc
root@a93070ccdc0d:/# ls -l /usr/local/cuda/bin
total 61648
-rwxr-xr-x 1 root root 175952 Sep 14 21:36 bin2c
lrwxrwxrwx 1 root root 4 Sep 14 21:40 computeprof -> nvvp
drwxr-xr-x 2 root root 4096 Jan 31 02:18 crt
-rwxr-xr-x 1 root root 9746984 Sep 14 21:36 cuda-gdb
-rwxr-xr-x 1 root root 500841 Sep 14 21:36 cuda-gdbserver
-rwxr-xr-x 1 root root 297576 Sep 14 21:36 cuda-memcheck
-rwxr-xr-x 1 root root 4581048 Sep 14 21:36 cudafe
-rwxr-xr-x 1 root root 4105352 Sep 14 21:36 cudafe++
-rwxr-xr-x 1 root root 699528 Sep 14 21:36 cuobjdump
-rwxr-xr-x 1 root root 245696 Sep 14 21:36 fatbinary
-rwxr-xr-x 1 root root 1108824 Sep 14 21:36 gpu-library-advisor
-rwxr-xr-x 1 root root 303928 Sep 14 21:36 nvcc
-rw-r--r-- 1 root root 411 Sep 14 21:36 nvcc.profile
-rwxr-xr-x 1 root root 16178272 Sep 14 21:36 nvdisasm
-rwxr-xr-x 1 root root 8126880 Sep 14 21:36 nvlink
-rwxr-xr-x 1 root root 8805704 Sep 14 21:36 nvprof
-rwxr-xr-x 1 root root 204712 Sep 14 21:36 nvprune
-rwxr-xr-x 1 root root 8015368 Sep 14 21:36 ptxas
root@edf46f371b00:/# find / -name libcudnn*
root@edf46f371b00:/#
As you can see, there are no cudnn libraries in this image. Now let's look at the devel image, which includes cudnn.
root@minsky:/data/nvidia-docker# docker run --rm cuda8-cudnn5-devel nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sat_Sep__3_19:09:38_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
root@minsky:/data/nvidia-docker# docker run -it cuda8-cudnn5-devel bash
root@54c686bbec15:/# find / -name libcudnn*
/usr/local/cuda-8.0/targets/ppc64le-linux/lib/libcudnn.so.5.1.10
/usr/local/cuda-8.0/targets/ppc64le-linux/lib/libcudnn.so.5
/usr/local/cuda-8.0/targets/ppc64le-linux/lib/libcudnn_static.a
/usr/local/cuda-8.0/targets/ppc64le-linux/lib/libcudnn.so
Now let's build a docker image for a simple CUDA application. First, create a dockerfile as follows. The base used below is the cuda8-cudnn5-devel image built above, tagged as bsyu/cuda8-cudnn5-devel:cudnn5-devel. On top of it we install CUDA and compile simpleP2P from the CUDA samples.
root@minsky:/data/mydocker# vi dockerfile.p2p
FROM bsyu/cuda8-cudnn5-devel:cudnn5-devel
# RUN executes a shell command
# You can chain multiple commands together with &&
# A \ is used to split long lines to help with readability
# This particular instruction installs the source files
# for simpleP2P by installing the CUDA samples via apt
RUN apt-get update && apt-get install -y cuda && \
rm -rf /var/lib/apt/lists/*
# set the working directory
WORKDIR /usr/local/cuda/samples/0_Simple/simpleP2P
RUN make
# CMD defines the default command to be run in the container
# CMD is overridden by supplying a command + arguments to
# `docker run`, e.g. `nvcc --version` or `bash`
CMD ./simpleP2P
Build with the dockerfile.p2p above.
root@minsky:/data/mydocker# docker build -t bsyu/p2p:ppc64le-xenial -f dockerfile.p2p .
When the build finishes, you can see that a fairly large docker image, about 2.77GB, has been created.
root@minsky:/data/mydocker# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
bsyu/p2p ppc64le-xenial c307ae42d1aa About a minute ago 2.77 GB
registry latest 781e109ba95f 26 hours ago 612.6 MB
ubuntu/xenial gevent 4ce0e6ba8a69 26 hours ago 282.5 MB
127.0.0.1/ubuntu-xenial gevent 4ce0e6ba8a69 26 hours ago 282.5 MB
localhost:5000/ubuntu-xenial gevent 4ce0e6ba8a69 26 hours ago 282.5 MB
bsyu/cuda8-cudnn5-devel cudnn5-devel d8d0da2fbdf2 28 hours ago 1.895 GB
bsyu/cuda 8.0-devel dc3faec17c11 28 hours ago 1.726 GB
bsyu/ppc64le cuda8.0-devel dc3faec17c11 28 hours ago 1.726 GB
cuda 8.0 dc3faec17c11 28 hours ago 1.726 GB
cuda 8.0-devel dc3faec17c11 28 hours ago 1.726 GB
cuda devel dc3faec17c11 28 hours ago 1.726 GB
cuda latest dc3faec17c11 28 hours ago 1.726 GB
cuda8-cudnn5-runtime latest 8a3b0a60e741 29 hours ago 942.2 MB
cuda 8.0-cudnn5-runtime 8a3b0a60e741 29 hours ago 942.2 MB
cuda cudnn-runtime 8a3b0a60e741 29 hours ago 942.2 MB
cuda8-runtime latest 8e9763b6296f 29 hours ago 844.9 MB
cuda 8.0-runtime 8e9763b6296f 29 hours ago 844.9 MB
cuda runtime 8e9763b6296f 29 hours ago 844.9 MB
ubuntu 16.04 09621ebd4cfd 5 days ago 234.3 MB
ubuntu latest 09621ebd4cfd 5 days ago 234.3 MB
ubuntu xenial 09621ebd4cfd 5 days ago 234.3 MB
nvidia-docker deb 332eaa8c9f9d 5 days ago 430.1 MB
nvidia-docker build 8cbc22512d15 5 days ago 1.012 GB
ppc64le/ubuntu 14.04 c040fcd69c12 3 months ago 227.8 MB
ppc64le/ubuntu latest 1967d889e07f 3 months ago 168 MB
ppc64le/golang 1.6.3 6a579d02d32f 5 months ago 704.7 MB
If you run this with the plain docker run command as before, you get a CUDA error like the following.
root@minsky:/data/mydocker# docker run --rm bsyu/p2p:ppc64le-xenial
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA error at simpleP2P.cu:63 code=30(cudaErrorUnknown) "cudaGetDeviceCount(&gpu_n)"
Now it is nvidia-docker's turn. Think of nvidia-docker as a kind of wrapper, or plugin, that lets docker containers use CUDA. The usage is identical, and what produced an error above now runs properly.
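Under the hood, nvidia-docker 1.0 rewrites the command into a plain docker run with the right --device entries plus a driver volume served by nvidia-docker-plugin. Roughly, it amounts to something like the sketch below; the exact device list and volume name depend on your GPUs and driver version (361.107 here matches the nvidia-smi output further down):
docker run --rm \
   --device=/dev/nvidiactl --device=/dev/nvidia-uvm \
   --device=/dev/nvidia0 --device=/dev/nvidia1 \
   --device=/dev/nvidia2 --device=/dev/nvidia3 \
   --volume-driver=nvidia-docker \
   --volume=nvidia_driver_361.107:/usr/local/nvidia:ro \
   bsyu/p2p:ppc64le-xenial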
root@minsky:/data/mydocker# nvidia-docker run --rm bsyu/p2p:ppc64le-xenial
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 4
> GPU0 = "Tesla P100-SXM2-16GB" IS capable of Peer-to-Peer (P2P)
> GPU1 = "Tesla P100-SXM2-16GB" IS capable of Peer-to-Peer (P2P)
> GPU2 = "Tesla P100-SXM2-16GB" IS capable of Peer-to-Peer (P2P)
> GPU3 = "Tesla P100-SXM2-16GB" IS capable of Peer-to-Peer (P2P)
Checking GPU(s) for support of peer to peer memory access...
> Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU1) : Yes
> Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU2) : No
> Peer access from Tesla P100-SXM2-16GB (GPU0) -> Tesla P100-SXM2-16GB (GPU3) : No
> Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU0) : Yes
> Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU2) : No
> Peer access from Tesla P100-SXM2-16GB (GPU1) -> Tesla P100-SXM2-16GB (GPU3) : No
> Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU0) : No
> Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU1) : No
> Peer access from Tesla P100-SXM2-16GB (GPU2) -> Tesla P100-SXM2-16GB (GPU3) : Yes
> Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU0) : No
> Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU1) : No
> Peer access from Tesla P100-SXM2-16GB (GPU3) -> Tesla P100-SXM2-16GB (GPU2) : Yes
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> Tesla P100-SXM2-16GB (GPU0) supports UVA: Yes
> Tesla P100-SXM2-16GB (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 32.91GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Disabling peer access...
Shutting down...
Test passed
Behold the 32.91GB/sec of P2P bandwidth. NVLink makes us happy; on PCIe Gen3 you would struggle to get even 8GB/sec.
Let's also get a bash shell inside that docker image and run nvidia-smi.
root@minsky:/data/mydocker# nvidia-docker run --rm -ti bsyu/p2p:ppc64le-xenial bash
root@d4770bd8ec53:/usr/local/cuda-8.0/samples/0_Simple/simpleP2P# nvidia-smi -l 3
Wed Feb 1 07:26:39 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.107 Driver Version: 361.107 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-SXM2... On | 0002:01:00.0 Off | 0 |
| N/A 26C P0 28W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-SXM2... On | 0003:01:00.0 Off | 0 |
| N/A 29C P0 31W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-SXM2... On | 0006:01:00.0 Off | 0 |
| N/A 25C P0 30W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-SXM2... On | 0007:01:00.0 Off | 0 |
| N/A 27C P0 29W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Seen from inside the docker container, the network environment looks like this. You can see that the default address 172.17.0.3 has been assigned.
root@2a663f3cd0f5:/usr/local/cuda-8.0/samples/0_Simple/simpleP2P# ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:ac:11:00:03
inet addr:172.17.0.3 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::42:acff:fe11:3/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:578 (578.0 B) TX bytes:508 (508.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Seen from the host, 172.17.0.1 is assigned to an interface named docker0.
root@minsky:~# ifconfig
docker0 Link encap:Ethernet HWaddr 02:42:16:b1:40:08
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::42:16ff:feb1:4008/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:242445 errors:0 dropped:0 overruns:0 frame:0
TX packets:666734 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:9924007 (9.9 MB) TX bytes:2676663029 (2.6 GB)
enP5p7s0f0 Link encap:Ethernet HWaddr 70:e2:84:14:19:25
inet addr:172.18.229.115 Bcast:172.18.229.255 Mask:255.255.255.0
inet6 addr: fe80::72e2:84ff:fe14:1925/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:10039400 errors:0 dropped:160 overruns:0 frame:0
TX packets:1471125 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4229498546 (4.2 GB) TX bytes:1221935620 (1.2 GB)
Interrupt:205
Now if we ssh from the docker container to another server outside, you can see below that the container does not show up with an IP of its own; to the outside world it carries the host's IP.
root@901ee2ecf38a:/usr/local/cuda-8.0/samples/0_Simple/simpleP2P# ssh test@172.18.229.117
test@k8002:~$ who
test hvc0 2017-01-09 17:05
test tty1 2017-01-11 14:49
test pts/0 2017-02-01 16:51 (172.18.229.115)
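This is docker's standard NAT behavior: the daemon adds a MASQUERADE rule for the 172.17.0.0/16 bridge network, so outbound traffic from containers leaves with the host's address. You can look the rule up on the host (a sketch; the exact table contents will vary):
root@minsky:~# iptables -t nat -L POSTROUTING -n | grep 172.17
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0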
Now let's log in to the public docker hub ( https://hub.docker.com ), and push and pull the images we made. You need to have registered an ID/password beforehand through a web browser.
root@minsky:/data/registry_volume# docker login --username=bsyu
Password:
Login Succeeded
Since the user name is bsyu, an existing image has to be tagged with a bsyu/ prefix before it can be uploaded to docker hub.
root@minsky:/data/registry_volume# docker tag cuda8-cudnn5-devel:latest bsyu/cuda8-cudnn5-devel:cudnn5-devel
Now just push.
root@minsky:/data/registry_volume# docker push bsyu/cuda8-cudnn5-devel:cudnn5-devel
The push refers to a repository [docker.io/bsyu/cuda8-cudnn5-devel]
c0fe73e43621: Pushed
4ce979019d1d: Pushed
724befd94678: Pushed
84f99f1bf79b: Pushed
7f7c1dccec82: Pushed
5b8880a35736: Pushed
41b97cb9a404: Pushed
08f34ce6b3fb: Pushed
cudnn5-devel: digest: sha256:c463051a8f78430d9de187386cca68294e2826830f085c677e5b20e70caeaf3f size: 2003
root@minsky:/data/registry_volume# docker tag cuda:8.0-devel bsyu/ppc64le:cuda8.0-devel
root@minsky:/data/registry_volume# docker push bsyu/ppc64le:cuda8.0-devel
The push refers to a repository [docker.io/bsyu/ppc64le]
724befd94678: Mounted from bsyu/cuda
84f99f1bf79b: Mounted from bsyu/cuda
7f7c1dccec82: Mounted from bsyu/cuda
5b8880a35736: Mounted from bsyu/cuda
41b97cb9a404: Mounted from bsyu/cuda
08f34ce6b3fb: Mounted from bsyu/cuda
cuda8.0-devel: digest: sha256:5943540e7f404d9c900c8acc188f4eab85e345a282e9ad37d6e2476093afc6c5 size: 1579
root@minsky:/data/registry_volume# docker tag cuda8-cudnn5-devel:cudnn5-devel bsyu/ppc64le:cuda8-cudnn5-devel
root@minsky:/data/registry_volume# docker push bsyu/ppc64le:cuda8-cudnn5-devel
The push refers to a repository [docker.io/bsyu/ppc64le]
c0fe73e43621: Mounted from bsyu/cuda8-cudnn5-devel
4ce979019d1d: Mounted from bsyu/cuda8-cudnn5-devel
724befd94678: Layer already exists
84f99f1bf79b: Layer already exists
7f7c1dccec82: Layer already exists
5b8880a35736: Layer already exists
41b97cb9a404: Layer already exists
08f34ce6b3fb: Layer already exists
cuda8-cudnn5-devel: digest: sha256:c463051a8f78430d9de187386cca68294e2826830f085c677e5b20e70caeaf3f size: 2003
Check the pushed images in a browser at https://hub.docker.com.
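You can also check from the command line; docker search queries the hub index (results depend on what is public):
root@minsky:/data/registry_volume# docker search bsyu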
To try a pull in the opposite direction, delete the images we just pushed, all at once. Running rmi against an image id removes every tag sharing that id.
root@minsky:/data# docker rmi -f d8d0da2fbdf2
Untagged: bsyu/cuda8-cudnn5-devel:cudnn5-devel
Untagged: bsyu/cuda8-cudnn5-devel@sha256:c463051a8f78430d9de187386cca68294e2826830f085c677e5b20e70caeaf3f
Untagged: bsyu/ppc64le:cuda8-cudnn5-devel
Untagged: bsyu/ppc64le@sha256:c463051a8f78430d9de187386cca68294e2826830f085c677e5b20e70caeaf3f
Untagged: cuda8-cudnn5-devel:cudnn5-devel
Untagged: cuda8-cudnn5-devel:latest
Untagged: cuda:8.0-cudnn5-devel
Untagged: cuda:cudnn
Untagged: cuda:cudnn-devel
Deleted: sha256:d8d0da2fbdf24a97787e6f1b4d8531d60e665b3d0f9cac5c14d1814a91b3b946
Deleted: sha256:2320a2aed314994ad77b5cc8e8b3faf295253bed8cf8a7be8a7806be6e9c50cf
Deleted: sha256:9d10e971aaf429133422b957bd1bfb583ebd03aaea9e796c2db8b6edca0d2836
Deleted: sha256:d8877708ee88e10086ce367b63e5da965c5e21ba2c8a199ab2c7b84c2c3ff699
Deleted: sha256:37e6b06a871334c369047ac9f9ae214cd63fe29700b3ad14901702a8044548e5
Deleted: sha256:0b0445d2e213d4eeed1760f8339a4b6433b134b75ba29336ce7759e67a397f5a
Deleted: sha256:5b4eda52a5b16a564381952434146055cb690de918359587d05273c23acade22
Confirm that they are gone.
root@minsky:/data# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
bsyu/cuda 8.0-devel dc3faec17c11 6 hours ago 1.726 GB
bsyu/ppc64le cuda8.0-devel dc3faec17c11 6 hours ago 1.726 GB
cuda 8.0 dc3faec17c11 6 hours ago 1.726 GB
cuda 8.0-devel dc3faec17c11 6 hours ago 1.726 GB
cuda devel dc3faec17c11 6 hours ago 1.726 GB
cuda latest dc3faec17c11 6 hours ago 1.726 GB
cuda 8.0-cudnn5-runtime 8a3b0a60e741 6 hours ago 942.2 MB
cuda cudnn-runtime 8a3b0a60e741 6 hours ago 942.2 MB
cuda8-cudnn5-runtime latest 8a3b0a60e741 6 hours ago 942.2 MB
cuda8-runtime latest 8e9763b6296f 6 hours ago 844.9 MB
cuda 8.0-runtime 8e9763b6296f 6 hours ago 844.9 MB
cuda runtime 8e9763b6296f 6 hours ago 844.9 MB
ubuntu 16.04 09621ebd4cfd 4 days ago 234.3 MB
ubuntu latest 09621ebd4cfd 4 days ago 234.3 MB
ubuntu xenial 09621ebd4cfd 4 days ago 234.3 MB
nvidia-docker deb 332eaa8c9f9d 5 days ago 430.1 MB
nvidia-docker build 8cbc22512d15 5 days ago 1.012 GB
ppc64le/ubuntu 14.04 c040fcd69c12 12 weeks ago 227.8 MB
ppc64le/ubuntu latest 1967d889e07f 3 months ago 168 MB
ppc64le/golang 1.6.3 6a579d02d32f 5 months ago 704.7 MB
Now bring the image back with the pull command.
root@minsky:/data# docker pull bsyu/cuda8-cudnn5-devel:cudnn5-devel
cudnn5-devel: Pulling from bsyu/cuda8-cudnn5-devel
ffa99da61f7b: Already exists
6b239e02a89e: Already exists
aecbc9abccdc: Already exists
8f458a3f0497: Already exists
4903f7ce6675: Already exists
0c588ac98d19: Already exists
12e624e884fc: Pull complete
18dd28bbb571: Pull complete
Digest: sha256:c463051a8f78430d9de187386cca68294e2826830f085c677e5b20e70caeaf3f
Status: Downloaded newer image for bsyu/cuda8-cudnn5-devel:cudnn5-devel
Check that it came back properly.
root@minsky:/data# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
bsyu/cuda8-cudnn5-devel cudnn5-devel d8d0da2fbdf2 6 hours ago 1.895 GB
bsyu/cuda 8.0-devel dc3faec17c11 6 hours ago 1.726 GB
bsyu/ppc64le cuda8.0-devel dc3faec17c11 6 hours ago 1.726 GB
cuda 8.0 dc3faec17c11 6 hours ago 1.726 GB
cuda 8.0-devel dc3faec17c11 6 hours ago 1.726 GB
cuda devel dc3faec17c11 6 hours ago 1.726 GB
cuda latest dc3faec17c11 6 hours ago 1.726 GB
cuda8-cudnn5-runtime latest 8a3b0a60e741 6 hours ago 942.2 MB
cuda 8.0-cudnn5-runtime 8a3b0a60e741 6 hours ago 942.2 MB
cuda cudnn-runtime 8a3b0a60e741 6 hours ago 942.2 MB
cuda8-runtime latest 8e9763b6296f 6 hours ago 844.9 MB
cuda 8.0-runtime 8e9763b6296f 6 hours ago 844.9 MB
cuda runtime 8e9763b6296f 6 hours ago 844.9 MB
ubuntu 16.04 09621ebd4cfd 4 days ago 234.3 MB
ubuntu latest 09621ebd4cfd 4 days ago 234.3 MB
ubuntu xenial 09621ebd4cfd 4 days ago 234.3 MB
nvidia-docker deb 332eaa8c9f9d 5 days ago 430.1 MB
nvidia-docker build 8cbc22512d15 5 days ago 1.012 GB
ppc64le/ubuntu 14.04 c040fcd69c12 12 weeks ago 227.8 MB
ppc64le/ubuntu latest 1967d889e07f 3 months ago 168 MB
ppc64le/golang 1.6.3 6a579d02d32f 5 months ago 704.7 MB
A note from following along, about the part where the nvidia-docker source is built into deb format:
After git checkout ppc64le, a Dockerfile.deb.ppc64le is generated. You need to open this file and fix the image tag in its first line to "FROM ppc64le/ubuntu:16.04" (the default is 14.04).
If you run make deb without changing it, the build fails because the trusty-security ppc64le links cannot be found on security.ubuntu.com.
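If you prefer to script that first-line fix, a sed one-liner like the following would do (hypothetical; adjust the path to wherever the file was generated):
sed -i '1s|ppc64le/ubuntu:14.04|ppc64le/ubuntu:16.04|' Dockerfile.deb.ppc64le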
As of 2017/09, when the nvidia-docker deb build succeeds, the .deb file is created in the nvidia-docker/dist folder.