First, install Docker by following the guide at this URL (https://developer.ibm.com/linuxonpower/docker-on-power).
u0017649@sys-89698:~$ sudo vi /etc/apt/sources.list.d/xenial-docker.list
deb http://ftp.unicamp.br/pub/ppc64el/ubuntu/16_04/docker-17.06.0-ce-ppc64el/ xenial main
u0017649@sys-89698:~$ sudo apt-get update
u0017649@sys-89698:~$ sudo apt-get install docker-ce
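You can also write the same repository line without opening vi. The sketch below is a minimal example. The function name `add_docker_repo` is hypothetical. The `LIST_FILE` and `SUDO` overrides are also hypothetical and exist only so the function can be tried on a scratch file. The unicamp.br mirror URL is copied as-is from the step above, so it may no longer be online.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the ppc64el Docker CE repo entry used above, no editor needed.
# LIST_FILE and SUDO are hypothetical overrides for testing; defaults are the real path and sudo.
add_docker_repo() {
  local list_file="${LIST_FILE:-/etc/apt/sources.list.d/xenial-docker.list}"
  local sudo="${SUDO-sudo}"
  local entry="deb http://ftp.unicamp.br/pub/ppc64el/ubuntu/16_04/docker-17.06.0-ce-ppc64el/ xenial main"
  # Append only if the exact line is not already present, so re-running is harmless.
  if ! grep -qxF "$entry" "$list_file" 2>/dev/null; then
    echo "$entry" | $sudo tee -a "$list_file" > /dev/null
  fi
}

# Usage:
#   add_docker_repo && sudo apt-get update && sudo apt-get install -y docker-ce
```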
After that, do the following:
u0017649@sys-89698:~$ git clone https://github.com/NVIDIA/nvidia-docker.git
u0017649@sys-89698:~$ cd nvidia-docker/
u0017649@sys-89698:~/nvidia-docker$ sudo make
...
Step 21/22 : ENV VERSION $PKG_VERS
---> Using cache
---> 39d92843179c
Step 22/22 : CMD go install -v -ldflags="-s -X main.Version=$VERSION" ./...
---> Using cache
---> a66fd805e845
Successfully built a66fd805e845
Successfully tagged nvidia-docker:build
github.com/NVIDIA/nvidia-docker/src/nvidia-docker-plugin
github.com/NVIDIA/nvidia-docker/src/nvidia-docker
u0017649@sys-89698:~/nvidia-docker$ sudo make install
...
Step 21/22 : ENV VERSION $PKG_VERS
---> Using cache
---> 39d92843179c
Step 22/22 : CMD go install -v -ldflags="-s -X main.Version=$VERSION" ./...
---> Using cache
---> a66fd805e845
Successfully built a66fd805e845
Successfully tagged nvidia-docker:build
github.com/NVIDIA/nvidia-docker/src/nvidia-docker
github.com/NVIDIA/nvidia-docker/src/nvidia-docker-plugin
install -D -m 755 -t /usr/local/bin /home/u0017649/nvidia-docker/bin/nvidia-docker
install -D -m 755 -t /usr/local/bin /home/u0017649/nvidia-docker/bin/nvidia-docker-plugin
u0017649@sys-89698:~$ which nvidia-docker
/usr/local/bin/nvidia-docker
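An nvidia-docker installed this way requires you to start nvidia-docker-plugin by hand. Doing that every time is a chore, so I recommend adding the following line to /etc/rc.local. If the file ends with an "exit 0" line, put it above that line. nvidia-docker-plugin will then run in the background after every boot.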
/usr/local/bin/nvidia-docker-plugin &
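On Ubuntu 16.04 the stock /etc/rc.local ends with `exit 0`, so the plugin line must go above it or it never runs. systemd's rc-local.service also only runs the file if it is executable. The sketch below makes that edit idempotently. `enable_plugin_at_boot` is a hypothetical name, and `RC_LOCAL` and `SUDO` are hypothetical overrides for trying it on a scratch file. It sticks with the rc.local approach used in this post rather than a systemd unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the nvidia-docker-plugin launch line to rc.local, above "exit 0".
# RC_LOCAL and SUDO are hypothetical overrides for testing.
enable_plugin_at_boot() {
  local rc="${RC_LOCAL:-/etc/rc.local}"
  local sudo="${SUDO-sudo}"
  local line='/usr/local/bin/nvidia-docker-plugin &'

  # Create a minimal rc.local if the system has none.
  if [ ! -e "$rc" ]; then
    printf '#!/bin/sh -e\nexit 0\n' | $sudo tee "$rc" > /dev/null
  fi

  # Skip if the line is already there, so re-running is harmless.
  if ! grep -qxF "$line" "$rc"; then
    local tmp
    tmp="$(mktemp)"
    # Print the plugin line just before the first bare "exit 0"; append it if there is none.
    awk -v l="$line" '
      !done && $0 == "exit 0" { print l; done = 1 }
      { print }
      END { if (!done) print l }
    ' "$rc" > "$tmp"
    $sudo cp "$tmp" "$rc"
    rm -f "$tmp"
  fi

  # rc-local.service only runs the script if it is executable.
  $sudo chmod +x "$rc"
}

# Usage:
#   enable_plugin_at_boot
```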
Once that is done, pull the ppc64le Docker image I built in advance. It has TensorFlow 1.3 and Anaconda for Python 3 installed:
root@firestone:/home# docker pull bsyu/tf1.3-ppc64le:v0.1
v0.3: Pulling from bsyu/tf1.3-ppc64le
0ad9ca03f1b2: Already exists
7d2491df9494: Already exists
9a961cbb08c0: Already exists
9f38c9e24bdd: Already exists
36070ab6f935: Already exists
dd84cc8c7847: Already exists
4317eeb0f4b7: Already exists
13f482d825bc: Already exists
d221c8d6a86c: Already exists
994d0d1ac151: Already exists
c736e0aba22f: Already exists
e7b4f7096c8f: Already exists
a0aaaa391ef4: Downloading 3.165MB/38.83MB
3c8d454a2390: Download complete
9bdf7a145ee5: Downloading 2.671MB/120.4MB
# docker pull bsyu/cudnn6-conda2-ppc64le:v0.1 (Anaconda image with Python 2.7)
# docker pull bsyu/cudnn6-conda3-ppc64le:v0.1 (Anaconda image with Python 3.6)
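Before starting one of these images, it can help to confirm that the plugin is actually up. This sketch assumes nvidia-docker 1.x, where the plugin serves a small REST API on localhost:3476 with a `/v1.0/docker/cli` endpoint. `wait_for_plugin` and the `PLUGIN_URL` override are hypothetical. The `python -c` check in the usage note also assumes that `python` inside the image is the Anaconda build that has TensorFlow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll nvidia-docker-plugin's REST endpoint (nvidia-docker 1.x, assumed default port 3476).
# Argument: number of one-second attempts (default 10). PLUGIN_URL is a hypothetical override.
wait_for_plugin() {
  local url="${PLUGIN_URL:-http://localhost:3476/v1.0/docker/cli}"
  local tries="${1:-10}"
  local i
  for ((i = 0; i < tries; i++)); do
    if curl -fsS --max-time 2 "$url" > /dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  echo "nvidia-docker-plugin not reachable at $url" >&2
  return 1
}

# Usage (assumes the image's python has TensorFlow):
#   wait_for_plugin && \
#     nvidia-docker run --rm bsyu/tf1.3-ppc64le:v0.1 \
#       python -c 'import tensorflow as tf; print(tf.__version__)'
```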
-------- Everything below is just an optional hands-on exercise.
Next, we will use ppc64le/ubuntu:latest to build a Docker image with libcudnn 6.0, Anaconda2/3 and a few other packages installed.
u0017649@sys-89830:~/docker$ ls
Anaconda2-4.4.0.1-Linux-ppc64le.sh
Anaconda3-4.4.0.1-Linux-ppc64le.sh
cuda-repo-ubuntu1604-8-0-local-cublas-performance-update_8.0.61-1_ppc64el-deb
cuda-repo-ubuntu1604-8-0-local-ga2v2_8.0.61-1_ppc64el-deb
dockerfile.cudnn6
libcudnn6_6.0.21-1+cuda8.0_ppc64el.deb
libcudnn6-dev_6.0.21-1+cuda8.0_ppc64el.deb
mldl-repo-local_4.0.0_ppc64el.deb
Gather the relevant files in the ~/docker directory as shown above, then write the Dockerfile as follows.
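A quick check that every file the Dockerfile expects is present can save a failed build. In this minimal sketch, `check_build_context` is a hypothetical helper. Its file list mirrors the `ls` output above, so adjust it if your versions differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the build directory holds every file the Dockerfile COPYs and installs.
check_build_context() {
  local dir="${1:-.}"
  local missing=0 f
  local required=(
    Anaconda2-4.4.0.1-Linux-ppc64le.sh
    Anaconda3-4.4.0.1-Linux-ppc64le.sh
    cuda-repo-ubuntu1604-8-0-local-cublas-performance-update_8.0.61-1_ppc64el-deb
    cuda-repo-ubuntu1604-8-0-local-ga2v2_8.0.61-1_ppc64el-deb
    libcudnn6_6.0.21-1+cuda8.0_ppc64el.deb
    libcudnn6-dev_6.0.21-1+cuda8.0_ppc64el.deb
    mldl-repo-local_4.0.0_ppc64el.deb
  )
  for f in "${required[@]}"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_build_context ~/docker && echo "build context OK"
```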
u0017649@sys-89830:~/docker$ vi dockerfile.cudnn6
# The CUDA 8.0 / cuDNN 6 packages below are built for Ubuntu 16.04 (xenial); pin ppc64le/ubuntu:16.04 if "latest" no longer points there.
FROM ppc64le/ubuntu:latest
ENV CUDNN_VERSION 6.0.21
LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"
RUN mkdir /tmp/temp
# "*deb" (no dot) matches both the .deb files and the cuda-repo files, whose names end in "-deb".
COPY *deb /tmp/temp/
COPY *.sh /tmp/temp/
# Install CUDA, cuDNN, both Anacondas and the MLDL-repo libraries in one layer, then clean up.
# apt-get update runs again after the MLDL repo is added so apt can see the packages it provides.
# The Anaconda installers are run via bash so they work even if the copied .sh files are not executable.
RUN dpkg -i /tmp/temp/cuda-repo-ubuntu1604-8-0-local-ga2v2_8.0.61-1_ppc64el-deb && \
dpkg -i /tmp/temp/cuda-repo-ubuntu1604-8-0-local-cublas-performance-update_8.0.61-1_ppc64el-deb && \
apt-get update && apt-get install -y --no-install-recommends initramfs-tools && \
apt-get install -y cuda && \
dpkg -i /tmp/temp/libcudnn* && \
dpkg -i /tmp/temp/mldl-repo-local_4.0.0_ppc64el.deb && \
bash /tmp/temp/Anaconda3-4.4.0.1-Linux-ppc64le.sh -b -p /opt/anaconda3 && \
bash /tmp/temp/Anaconda2-4.4.0.1-Linux-ppc64le.sh -b -p /opt/anaconda2 && \
apt-get update && \
apt-get install -y libnccl-dev libnccl1 python-ncclient bazel libopenblas-dev libopenblas libopenblas-base && \
apt-get remove -y mldl-repo-local && \
apt-get remove -y cuda-repo-ubuntu1604-8-0-local-ga2v2 cuda-repo-ubuntu1604-8-0-local-cublas-performance-update && \
ldconfig && \
rm -rf /tmp/temp && \
rm -rf /var/lib/apt/lists/*
ENV LD_LIBRARY_PATH="/opt/DL/nccl/lib:/opt/DL/openblas/lib:/usr/local/cuda-8.0/lib64:/usr/lib:/usr/local/lib:/opt/caffe2/lib:/usr/lib/powerpc64le-linux-gnu"
ENV PATH="/opt/anaconda2/bin:/opt/caffe2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Now build it as follows.
u0017649@sys-89830:~/docker$ sudo docker build -t bsyu/cudnn6_v0.1:ppc64le-xenial -f dockerfile.cudnn6 .
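Once the build completes, one quick sanity check is to read back the LABEL set in the Dockerfile. You can also confirm that cuDNN is visible to the dynamic linker inside the image. In this minimal sketch, `cudnn_label` is a hypothetical helper and the image tag comes from the build command above. It calls `docker` directly, so run it as root or as a member of the docker group.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the com.nvidia.cudnn.version label of a built image.
cudnn_label() {
  local image="${1:-bsyu/cudnn6_v0.1:ppc64le-xenial}"
  docker image inspect \
    --format '{{ index .Config.Labels "com.nvidia.cudnn.version" }}' "$image"
}

# Usage:
#   cudnn_label    # should match CUDNN_VERSION in the Dockerfile (6.0.21)
#   docker run --rm bsyu/cudnn6_v0.1:ppc64le-xenial sh -c 'ldconfig -p | grep libcudnn'
```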
Partway through, my build failed with an out-of-disk-space error, as shown below. The one drawback of the PDP (Power Development Platform) cloud is how little disk space it has...
Preparing to unpack .../libcudnn6_6.0.21-1+cuda8.0_ppc64el.deb ...
Unpacking libcudnn6 (6.0.21-1+cuda8.0) ...
dpkg: error processing archive /tmp/temp/libcudnn6_6.0.21-1+cuda8.0_ppc64el.deb (--install):
cannot copy extracted data for './usr/lib/powerpc64le-linux-gnu/libcudnn.so.6.0.21' to '/usr/lib/powerpc64le-linux-gnu/libcudnn.so.6.0.21.dpkg-new': failed to write (No space left on device)
dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
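This image installs the full `cuda` package plus two Anaconda distributions, so it needs a lot of room. The copied .deb and .sh files also stay in the image's COPY layers even though the RUN step deletes /tmp/temp afterwards. A preflight check of free space under Docker's data directory can fail the build early instead of midway. In this minimal sketch, `free_kb` and `require_free_gb` are hypothetical names. The 20 GB default is an arbitrary assumption, not a measured requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: fail early if the filesystem under a path has less than N GB free.
free_kb() {
  # POSIX df output: line 2, column 4 is available space in 1K blocks.
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

require_free_gb() {
  local path="$1" need_gb="${2:-20}"   # 20 GB is an arbitrary default
  local have_kb
  have_kb="$(free_kb "$path")"
  [ -n "$have_kb" ] || return 1        # df failed, e.g. the path does not exist
  if [ "$have_kb" -lt $((need_gb * 1024 * 1024)) ]; then
    echo "only $((have_kb / 1024 / 1024)) GB free on $path, want ${need_gb} GB" >&2
    return 1
  fi
}

# Usage:
#   root="$(sudo docker info --format '{{.DockerRootDir}}')"   # usually /var/lib/docker
#   require_free_gb "$root" 20 && \
#     sudo docker build -t bsyu/cudnn6_v0.1:ppc64le-xenial -f dockerfile.cudnn6 .
#   sudo docker system prune   # removes stopped containers, unused networks, dangling images
```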