Tuesday, February 4, 2020
Installing WML-CE (Watson Machine Learning Community Edition, formerly PowerAI) 1.6.2
As most of you already know, PowerAI, the free toolkit that provided deep learning frameworks such as tensorflow and caffe on the IBM POWER architecture (POWER8/9, i.e. ppc64le), has been renamed Watson Machine Learning Community Edition (WML-CE). It is of course still free, but it is no longer shipped as *.rpm or *.deb packages; instead IBM provides a dedicated conda channel and you install it through conda. Anaconda is therefore a prerequisite, and 1.6.2, the latest version as of early February 2020, requires Anaconda 2019.07. It is also provided as a docker image with everything pre-installed.
For the detailed original manual, refer to the links below.
https://www.ibm.com/support/knowledgecenter/SS5SF7_1.6.2/navigation/wmlce_planning.html
https://www.ibm.com/support/knowledgecenter/SS5SF7_1.6.2/navigation/wmlce_install.html
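If you prefer the pre-built docker image route instead, the images are published on Docker Hub under the ibmcom/powerai repository. The tag placeholder below is only for illustration; check the Docker Hub page or the WML-CE documentation for the exact tag matching your release and Python version.
$ docker pull ibmcom/powerai:<1.6.2-tag>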
Here we will install WML-CE 1.6.2 in a ppc64le Ubuntu 18.04, Python 3.7.5 environment.
cecuser@p1234-kvm1:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
First, download and install Anaconda 2019.07.
cecuser@p1234-kvm1:~$ wget https://repo.continuum.io/archive/Anaconda3-2019.07-Linux-ppc64le.sh
cecuser@p1234-kvm1:~$ chmod a+x Anaconda3-2019.07-Linux-ppc64le.sh
cecuser@p1234-kvm1:~$ ./Anaconda3-2019.07-Linux-ppc64le.sh
Once the installation finishes, source ~/.bashrc so that the conda init settings take effect.
cecuser@p1234-kvm1:~$ . ~/.bashrc
Now add the conda channel that IBM provides for WML-CE.
(base) cecuser@p1234-kvm1:~$ conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
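To confirm that the IBM channel was added ahead of the defaults, you can list the configured channels (this only reads the conda configuration, so it is safe to run at any time):
(base) cecuser@p1234-kvm1:~$ conda config --show channels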
Since it is recommended to install WML-CE into a dedicated conda virtual environment, first create a virtual env named wmlce_env with python 3.7.5.
(base) cecuser@p1234-kvm1:~$ conda create --name wmlce_env python=3.7.5
...
The following NEW packages will be INSTALLED:
_libgcc_mutex pkgs/main/linux-ppc64le::_libgcc_mutex-0.1-main
ca-certificates pkgs/main/linux-ppc64le::ca-certificates-2020.1.1-0
certifi pkgs/main/linux-ppc64le::certifi-2019.11.28-py37_0
libedit pkgs/main/linux-ppc64le::libedit-3.1.20181209-hc058e9b_0
libffi pkgs/main/linux-ppc64le::libffi-3.2.1-hf62a594_5
libgcc-ng pkgs/main/linux-ppc64le::libgcc-ng-8.2.0-h822a55f_1
libstdcxx-ng pkgs/main/linux-ppc64le::libstdcxx-ng-8.2.0-h822a55f_1
ncurses pkgs/main/linux-ppc64le::ncurses-6.1-he6710b0_1
openssl pkgs/main/linux-ppc64le::openssl-1.1.1d-h7b6447c_3
pip pkgs/main/linux-ppc64le::pip-20.0.2-py37_1
python pkgs/main/linux-ppc64le::python-3.7.5-h4134adf_0
readline pkgs/main/linux-ppc64le::readline-7.0-h7b6447c_5
setuptools pkgs/main/linux-ppc64le::setuptools-45.1.0-py37_0
sqlite pkgs/main/linux-ppc64le::sqlite-3.30.1-h7b6447c_0
tk pkgs/main/linux-ppc64le::tk-8.6.8-hbc83047_0
wheel pkgs/main/linux-ppc64le::wheel-0.34.1-py37_0
xz pkgs/main/linux-ppc64le::xz-5.2.4-h14c3975_4
zlib pkgs/main/linux-ppc64le::zlib-1.2.11-h7b6447c_3
Proceed ([y]/n)? y
...
Downloading and Extracting Packages
ca-certificates-2020 | 125 KB | ##################################### | 100%
setuptools-45.1.0 | 511 KB | ##################################### | 100%
pip-20.0.2 | 1.7 MB | ##################################### | 100%
sqlite-3.30.1 | 2.3 MB | ##################################### | 100%
wheel-0.34.1 | 50 KB | ##################################### | 100%
python-3.7.5 | 32.5 MB | ##################################### | 100%
openssl-1.1.1d | 3.8 MB | ##################################### | 100%
certifi-2019.11.28 | 156 KB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate wmlce_env
#
# To deactivate an active environment, use
#
# $ conda deactivate
Now activate the wmlce_env conda virtual environment.
(base) cecuser@p1234-kvm1:~$ conda activate wmlce_env
Now install WML-CE. Note that the conda package name is still powerai. The command below installs every deep learning framework supported by WML-CE at once, including tensorflow, caffe2 and pytorch. If you only want a single framework, say PyTorch, rather than the whole of WML-CE, just run conda install pytorch instead.
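If you want to pin the release explicitly, for example to stay on 1.6.2 even after newer builds appear in the channel, the standard conda package=version syntax should work (a minimal sketch):
(wmlce_env) cecuser@p1234-kvm1:~$ conda install powerai=1.6.2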
The long output below is pasted deliberately so you can see exactly which packages get installed by the command.
(wmlce_env) cecuser@p1234-kvm1:~$ conda install powerai
....
The following NEW packages will be INSTALLED:
_py-xgboost-mutex ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::_py-xgboost-mutex-1.0-gpu_590.g8a21f75
_pytorch_select ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::_pytorch_select-2.0-gpu_20238.g1faf942
_tflow_select ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::_tflow_select-2.1.0-gpu_840.g50de12c
absl-py pkgs/main/linux-ppc64le::absl-py-0.7.1-py37_0
apex ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::apex-0.1.0_1.6.2-py37_596.g1eb5c77
asn1crypto pkgs/main/linux-ppc64le::asn1crypto-1.3.0-py37_0
astor pkgs/main/linux-ppc64le::astor-0.7.1-py37_0
atomicwrites pkgs/main/linux-ppc64le::atomicwrites-1.3.0-py37_1
attrs pkgs/main/noarch::attrs-19.3.0-py_0
blas pkgs/main/linux-ppc64le::blas-1.0-openblas
bokeh pkgs/main/linux-ppc64le::bokeh-1.4.0-py37_0
boost pkgs/main/linux-ppc64le::boost-1.67.0-py37_4
bzip2 pkgs/main/linux-ppc64le::bzip2-1.0.8-h7b6447c_0
c-ares pkgs/main/linux-ppc64le::c-ares-1.15.0-h7b6447c_1001
caffe ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::caffe-1.0_1.6.2-5184.g7b10df4
caffe-base ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::caffe-base-1.0_1.6.2-gpu_py37_5184.g7b10df4
cairo pkgs/main/linux-ppc64le::cairo-1.14.12-h8948797_3
cffi pkgs/main/linux-ppc64le::cffi-1.12.3-py37h2e261b9_0
chardet pkgs/main/linux-ppc64le::chardet-3.0.4-py37_1003
click pkgs/main/linux-ppc64le::click-7.0-py37_0
cloudpickle pkgs/main/noarch::cloudpickle-1.2.2-py_0
coverage pkgs/main/linux-ppc64le::coverage-5.0-py37h7b6447c_0
cryptography pkgs/main/linux-ppc64le::cryptography-2.8-py37h1ba5d50_0
cudatoolkit ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::cudatoolkit-10.1.243-616.gc122b8b
cudnn ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::cudnn-7.6.3_10.1-590.g5627c5e
cycler pkgs/main/linux-ppc64le::cycler-0.10.0-py37_0
cytoolz pkgs/main/linux-ppc64le::cytoolz-0.10.1-py37h7b6447c_0
dask pkgs/main/noarch::dask-2.3.0-py_0
dask-core pkgs/main/noarch::dask-core-2.3.0-py_0
dask-cuda ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::dask-cuda-0.9.1-py37_573.g9af8baa
dask-xgboost ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::dask-xgboost-0.1.7-py37_579.g8a31cf5
ddl ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::ddl-1.5.0-py37_1287.gc90c6f2
ddl-tensorflow ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::ddl-tensorflow-1.5.0-py37_1007.g8dbb51d
decorator pkgs/main/noarch::decorator-4.4.1-py_0
distributed pkgs/main/noarch::distributed-2.3.2-py_1
ffmpeg pkgs/main/linux-ppc64le::ffmpeg-4.0-hcdf2ecd_0
fontconfig pkgs/main/linux-ppc64le::fontconfig-2.13.0-h9420a91_0
freeglut pkgs/main/linux-ppc64le::freeglut-3.0.0-hf484d3e_5
freetype pkgs/main/linux-ppc64le::freetype-2.9.1-h8a8886c_0
fsspec pkgs/main/noarch::fsspec-0.6.2-py_0
future pkgs/main/linux-ppc64le::future-0.17.1-py37_0
gast pkgs/main/linux-ppc64le::gast-0.2.2-py37_0
gflags ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::gflags-2.2.2-1624.g17209b3
glib pkgs/main/linux-ppc64le::glib-2.63.1-h5a9c865_0
glog ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::glog-0.3.5-1613.gd054598
google-pasta ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::google-pasta-0.1.6-py37_564.g04df2d9
graphite2 pkgs/main/linux-ppc64le::graphite2-1.3.13-h23475e2_0
graphsurgeon ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::graphsurgeon-0.4.1-py37_612.gb2bf6b9
grpcio pkgs/main/linux-ppc64le::grpcio-1.16.1-py37hf8bcb03_1
h5py pkgs/main/linux-ppc64le::h5py-2.8.0-py37h8d01980_0
harfbuzz pkgs/main/linux-ppc64le::harfbuzz-1.8.8-hffaf4a1_0
hdf5 pkgs/main/linux-ppc64le::hdf5-1.10.2-hba1933b_1
heapdict pkgs/main/noarch::heapdict-1.0.1-py_0
hypothesis pkgs/main/linux-ppc64le::hypothesis-3.59.1-py37h39e3cac_0
icu pkgs/main/linux-ppc64le::icu-58.2-h64fc554_1
idna pkgs/main/linux-ppc64le::idna-2.8-py37_0
imageio pkgs/main/linux-ppc64le::imageio-2.6.1-py37_0
importlib_metadata pkgs/main/linux-ppc64le::importlib_metadata-1.4.0-py37_0
jasper pkgs/main/linux-ppc64le::jasper-2.0.14-h07fcdf6_1
jinja2 pkgs/main/noarch::jinja2-2.10.3-py_0
joblib pkgs/main/linux-ppc64le::joblib-0.13.2-py37_0
jpeg pkgs/main/linux-ppc64le::jpeg-9b-hcb7ba68_2
keras-applications pkgs/main/noarch::keras-applications-1.0.8-py_0
keras-preprocessi~ pkgs/main/noarch::keras-preprocessing-1.1.0-py_1
kiwisolver pkgs/main/linux-ppc64le::kiwisolver-1.1.0-py37he6710b0_0
leveldb pkgs/main/linux-ppc64le::leveldb-1.20-hf484d3e_1
libboost pkgs/main/linux-ppc64le::libboost-1.67.0-h46d08c1_4
libgfortran-ng pkgs/main/linux-ppc64le::libgfortran-ng-7.3.0-h822a55f_1
libglu pkgs/main/linux-ppc64le::libglu-9.0.0-hf484d3e_1
libopenblas pkgs/main/linux-ppc64le::libopenblas-0.3.6-h5a2b251_1
libopencv ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::libopencv-3.4.7-725.g92aa195
libopus pkgs/main/linux-ppc64le::libopus-1.3-h7b6447c_0
libpng pkgs/main/linux-ppc64le::libpng-1.6.37-hbc83047_0
libprotobuf ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::libprotobuf-3.8.0-577.g45759bb
libtiff pkgs/main/linux-ppc64le::libtiff-4.1.0-h2733197_0
libuuid pkgs/main/linux-ppc64le::libuuid-1.0.3-h1bed415_2
libvpx pkgs/main/linux-ppc64le::libvpx-1.7.0-hf484d3e_0
libxcb pkgs/main/linux-ppc64le::libxcb-1.13-h1bed415_0
libxgboost-base ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::libxgboost-base-0.90-gpu_590.g8a21f75
libxml2 pkgs/main/linux-ppc64le::libxml2-2.9.9-hea5a465_1
llvmlite pkgs/main/linux-ppc64le::llvmlite-0.29.0-py37hd408876_0
lmdb pkgs/main/linux-ppc64le::lmdb-0.9.22-hf484d3e_1
locket pkgs/main/linux-ppc64le::locket-0.2.0-py37_1
markdown pkgs/main/linux-ppc64le::markdown-3.1.1-py37_0
markupsafe pkgs/main/linux-ppc64le::markupsafe-1.1.1-py37h7b6447c_0
matplotlib pkgs/main/linux-ppc64le::matplotlib-3.1.2-py37_1
matplotlib-base pkgs/main/linux-ppc64le::matplotlib-base-3.1.2-py37h4fdacc2_1
mock pkgs/main/linux-ppc64le::mock-2.0.0-py37_0
more-itertools pkgs/main/noarch::more-itertools-8.0.2-py_0
msgpack-python pkgs/main/linux-ppc64le::msgpack-python-0.6.1-py37hfd86e86_1
nccl ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::nccl-2.4.8-586.gdba67b7
networkx pkgs/main/linux-ppc64le::networkx-2.2-py37_1
ninja pkgs/main/linux-ppc64le::ninja-1.9.0-py37hfd86e86_0
nomkl pkgs/main/linux-ppc64le::nomkl-3.0-0
numactl ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::numactl-2.0.12-573.gdf5dc62
numba pkgs/main/linux-ppc64le::numba-0.45.1-py37h962f231_0
numpy pkgs/main/linux-ppc64le::numpy-1.16.6-py37h30dfecb_0
numpy-base pkgs/main/linux-ppc64le::numpy-base-1.16.6-py37h2f8d375_0
olefile pkgs/main/linux-ppc64le::olefile-0.46-py37_0
onnx ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::onnx-1.5.0-py37_614.gd049fd7
openblas pkgs/main/linux-ppc64le::openblas-0.3.6-1
openblas-devel pkgs/main/linux-ppc64le::openblas-devel-0.3.6-1
opencv ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::opencv-3.4.7-py37_725.g92aa195
packaging pkgs/main/noarch::packaging-20.1-py_0
pai4sk ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::pai4sk-1.5.0-py37_1071.g5abf42e
pandas pkgs/main/linux-ppc64le::pandas-1.0.0-py37h0573a6f_0
partd pkgs/main/noarch::partd-1.1.0-py_0
pbr pkgs/main/noarch::pbr-5.4.4-py_0
pciutils ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::pciutils-3.6.2-571.g2316d13
pcre pkgs/main/linux-ppc64le::pcre-8.43-he6710b0_0
pillow pkgs/main/linux-ppc64le::pillow-6.2.1-py37h0d2faf8_0
pixman pkgs/main/linux-ppc64le::pixman-0.34.0-h1f8d8dc_3
pluggy pkgs/main/linux-ppc64le::pluggy-0.13.1-py37_0
powerai ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::powerai-1.6.2-615.g1dade79
powerai-license ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::powerai-license-1.6.2-716.g7081e12
powerai-release ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::powerai-release-1.6.2-572.gb216c2c
powerai-tools ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::powerai-tools-1.6.2-565.g97f2c3f
protobuf ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::protobuf-3.8.0-py37_587.gab45ad3
psutil pkgs/main/linux-ppc64le::psutil-5.5.0-py37h7b6447c_0
py pkgs/main/noarch::py-1.8.1-py_0
py-boost pkgs/main/linux-ppc64le::py-boost-1.67.0-py37h04863e7_4
py-opencv ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::py-opencv-3.4.7-py37_725.g92aa195
py-xgboost-base ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::py-xgboost-base-0.90-gpu_py37_590.g8a21f75
py-xgboost-gpu ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::py-xgboost-gpu-0.90-590.g8a21f75
pycparser pkgs/main/linux-ppc64le::pycparser-2.19-py37_0
pyopenssl pkgs/main/linux-ppc64le::pyopenssl-19.1.0-py37_0
pyparsing pkgs/main/noarch::pyparsing-2.4.6-py_0
pysocks pkgs/main/linux-ppc64le::pysocks-1.7.1-py37_0
pytest pkgs/main/linux-ppc64le::pytest-4.4.2-py37_0
python-dateutil pkgs/main/noarch::python-dateutil-2.8.1-py_0
python-lmdb pkgs/main/linux-ppc64le::python-lmdb-0.94-py37h14c3975_0
pytorch ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::pytorch-1.2.0-20238.g1faf942
pytorch-base ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::pytorch-base-1.2.0-gpu_py37_20238.g1faf942
pytz pkgs/main/noarch::pytz-2019.3-py_0
pywavelets pkgs/main/linux-ppc64le::pywavelets-1.1.1-py37h7b6447c_0
pyyaml pkgs/main/linux-ppc64le::pyyaml-5.1.2-py37h7b6447c_0
requests pkgs/main/linux-ppc64le::requests-2.22.0-py37_1
scikit-image pkgs/main/linux-ppc64le::scikit-image-0.15.0-py37he6710b0_0
scikit-learn pkgs/main/linux-ppc64le::scikit-learn-0.21.3-py37h22eb022_0
scipy pkgs/main/linux-ppc64le::scipy-1.3.1-py37he2b7bc3_0
simsearch ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::simsearch-1.1.0-py37_764.g7c5f6cf
six pkgs/main/linux-ppc64le::six-1.12.0-py37_0
snapml-spark ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::snapml-spark-1.4.0-py37_942.gc873569
snappy pkgs/main/linux-ppc64le::snappy-1.1.7-h1532aa0_3
sortedcontainers pkgs/main/linux-ppc64le::sortedcontainers-2.1.0-py37_0
spectrum-mpi ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::spectrum-mpi-10.03-622.gfc88b70
tabulate pkgs/main/linux-ppc64le::tabulate-0.8.2-py37_0
tblib pkgs/main/noarch::tblib-1.6.0-py_0
tensorboard ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::tensorboard-1.15.0-py37_ab7f72a_3645.gf4f525e
tensorflow ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::tensorflow-1.15.0-gpu_py37_841.g50de12c
tensorflow-base ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::tensorflow-base-1.15.0-gpu_py37_590d6ee_64210.g4a039ec
tensorflow-estima~ ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::tensorflow-estimator-1.15.1-py37_a5f60ce_1351.g50de12c
tensorflow-gpu ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::tensorflow-gpu-1.15.0-841.g50de12c
tensorflow-large-~ ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::tensorflow-large-model-support-2.0.2-py37_970.gfa57a9e
tensorflow-probab~ ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::tensorflow-probability-0.8.0-py37_b959b26_2686.g50de12c
tensorflow-servin~ ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::tensorflow-serving-api-1.15.0-py37_748217e_5094.g89559ef
tensorrt ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::tensorrt-6.0.1.5-py37_612.gb2bf6b9
termcolor pkgs/main/linux-ppc64le::termcolor-1.1.0-py37_1
tf_cnn_benchmarks ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::tf_cnn_benchmarks-1.15-gpu_py37_1374.g5e94b18
toolz pkgs/main/noarch::toolz-0.10.0-py_0
torchtext ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::torchtext-0.4.0-py37_578.g5bf3960
torchvision-base ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::torchvision-base-0.4.0-gpu_py37_593.g80f339d
tornado pkgs/main/linux-ppc64le::tornado-6.0.3-py37h7b6447c_0
tqdm pkgs/main/noarch::tqdm-4.32.1-py_0
typing pkgs/main/linux-ppc64le::typing-3.6.4-py37_0
uff ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::uff-0.6.5-py37_612.gb2bf6b9
urllib3 pkgs/main/linux-ppc64le::urllib3-1.25.8-py37_0
werkzeug pkgs/main/noarch::werkzeug-0.15.4-py_0
wrapt pkgs/main/linux-ppc64le::wrapt-1.11.2-py37h7b6447c_0
yaml pkgs/main/linux-ppc64le::yaml-0.1.7-h1bed415_2
zict pkgs/main/noarch::zict-1.0.0-py_0
zipp pkgs/main/noarch::zipp-0.6.0-py_0
zstd pkgs/main/linux-ppc64le::zstd-1.3.7-h0b5b093_0
Proceed ([y]/n)? y
...
You can check the installed packages individually as shown below.
(wmlce_env) cecuser@p1234-kvm1:~$ conda list | grep tensorflow
ddl-tensorflow 1.5.0 py37_1007.g8dbb51d https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
tensorflow 1.15.0 gpu_py37_841.g50de12c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
tensorflow-base 1.15.0 gpu_py37_590d6ee_64210.g4a039ec https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
tensorflow-estimator 1.15.1 py37_a5f60ce_1351.g50de12c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
tensorflow-gpu 1.15.0 841.g50de12c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
tensorflow-large-model-support 2.0.2 py37_970.gfa57a9e https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
tensorflow-probability 0.8.0 py37_b959b26_2686.g50de12c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
tensorflow-serving-api 1.15.0 py37_748217e_5094.g89559ef https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
(wmlce_env) cecuser@p1234-kvm1:~$ conda list | grep caffe
caffe 1.0_1.6.2 5184.g7b10df4 https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
caffe-base 1.0_1.6.2 gpu_py37_5184.g7b10df4 https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
(wmlce_env) cecuser@p1234-kvm1:~$ conda list | grep pytorch
_pytorch_select 2.0 gpu_20238.g1faf942 https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
pytorch 1.2.0 20238.g1faf942 https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
pytorch-base 1.2.0 gpu_py37_20238.g1faf942 https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
WML-CE also ships and installs base facilities such as NCCL and cuDNN.
(wmlce_env) cecuser@p1234-kvm1:~$ conda list | grep -i nccl
nccl 2.4.8 586.gdba67b7 https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
(wmlce_env) cecuser@p1234-kvm1:~$ conda list | grep -i dnn
cudnn 7.6.3_10.1 590.g5627c5e https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
You can also try importing the frameworks in python as shown below.
(wmlce_env) cecuser@p1234-kvm1:~$ python
Python 3.7.5 (default, Oct 25 2019, 16:29:01)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe
>>> import tensorflow as tf
2020-02-03 21:31:31.095570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
>>>
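While you are in the same python session, it is also worth checking that each framework actually sees the GPUs. A minimal sketch using calls that exist in tensorflow 1.15 and pytorch 1.2 (the result depends on whether GPUs are visible on your machine):
>>> tf.test.is_gpu_available()
>>> import torch
>>> torch.cuda.is_available()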
However, "conda install powerai" alone does not install RAPIDS, so it has to be installed separately as shown below.
(wmlce_env) cecuser@p1234-kvm1:~$ conda install powerai-rapids
...
The following NEW packages will be INSTALLED:
arrow-cpp ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::arrow-cpp-0.15.1-py37_603.g702c836
boost-cpp pkgs/main/linux-ppc64le::boost-cpp-1.67.0-h14c3975_4
brotli pkgs/main/linux-ppc64le::brotli-1.0.6-he6710b0_0
cudf ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::cudf-0.9.0-cuda10.1_py37_626.gddcad2d
cuml ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::cuml-0.9.1-cuda10.1_py37_605.gfe9e07b
cupy ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::cupy-6.2.0-py37_567.g0f1e2ef
cython pkgs/main/linux-ppc64le::cython-0.29.14-py37he6710b0_0
dask-cudf ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::dask-cudf-0.9.0-py37_575.g0416adf
dlpack ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::dlpack-0.2-562.g28dffd9
double-conversion ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::double-conversion-3.1.5-564.g4b43169
fastavro ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::fastavro-0.22.4-py37_562.g9525976
fastrlock pkgs/main/linux-ppc64le::fastrlock-0.4-py37he6710b0_0
grpc-cpp ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::grpc-cpp-1.23.0-568.g4f71a06
libcudf ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::libcudf-0.9.0-cuda10.1_609.g113236a
libcuml ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::libcuml-0.9.1-cuda10.1_576.ga304a0a
libevent ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::libevent-2.1.8-561.ge1d98f7
libnvstrings ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::libnvstrings-0.9.0-cuda10.1_570.ga04797c
librmm ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::librmm-0.9.0-cuda10.1_567.gff1b1a1
lz4-c pkgs/main/linux-ppc64le::lz4-c-1.8.1.2-h14c3975_0
nvstrings ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::nvstrings-0.9.0-cuda10.1_py37_580.gdbb6546
parquet-cpp ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::parquet-cpp-1.5.1-579.g6eecc60
powerai-rapids ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::powerai-rapids-1.6.2-560.ga7c5a47
pyarrow ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::pyarrow-0.15.1-py37_609.g3a6717a
re2 ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::re2-2019.08.01-561.gef92448
rmm ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::rmm-0.9.0-cuda10.1_py37_569.g04c75fb
thrift-cpp ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::thrift-cpp-0.12.0-580.gf96fa62
uriparser ibmdl/export/pub/software/server/ibm-ai/conda/linux-ppc64le::uriparser-0.9.3-561.g7465fef
The following packages will be DOWNGRADED:
pandas 1.0.0-py37h0573a6f_0 --> 0.24.2-py37he6710b0_0
Proceed ([y]/n)? y
...
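Once the RAPIDS packages are in place, a quick sanity check with cudf can be done from the same environment (a minimal sketch; it assumes the GPUs are visible to the process):
(wmlce_env) cecuser@p1234-kvm1:~$ python -c "import cudf; s = cudf.Series([1, 2, 3]); print(cudf.__version__, s.sum())"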
If you want to use tensorflow, pytorch and so on on a system without GPUs, install the CPU-only version of WML-CE as shown below.
(wmlce_env) cecuser@p1234-kvm1:~$ conda install powerai-cpu
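Since the CPU-only packages replace the GPU builds of the same frameworks, it is probably cleaner to put powerai-cpu into its own environment rather than on top of the GPU install. A sketch of that approach (the environment name wmlce_cpu_env is just an example):
(base) cecuser@p1234-kvm1:~$ conda create --name wmlce_cpu_env python=3.7.5
(base) cecuser@p1234-kvm1:~$ conda activate wmlce_cpu_env
(wmlce_cpu_env) cecuser@p1234-kvm1:~$ conda install powerai-cpu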
Friday, November 10, 2017
Testing tensorflow 1.3, caffe2 and pytorch with nvidia-docker
Here is how to test tensorflow 1.3, caffe2 and pytorch using nvidia-docker.
1) tensorflow v1.3
Start the tensorflow 1.3 docker image as follows.
root@minsky:~# nvidia-docker run -ti --rm -v /data:/data bsyu/tf1.3-ppc64le:v0.1 bash
First, check the various PATH environment variables.
root@67c0e6901bb2:/# env | grep PATH
LIBRARY_PATH=/usr/local/cuda/lib64/stubs:
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
PATH=/opt/anaconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages
Move to the directory containing the cifar10 example code.
root@67c0e6901bb2:/# cd /data/imsi/tensorflow/models/tutorials/image/cifar10
Slightly modify the cifar10_multi_gpu_train.py code before running it. (In principle these settings should be adjustable through command-line parameters such as --train_dir, but in practice it seems the source has to be edited directly for the script to run properly.)
root@67c0e6901bb2:/data/imsi/tensorflow/models/tutorials/image/cifar10# time python cifar10_multi_gpu_train.py --batch_size 512 --num_gpus 2
usage: cifar10_multi_gpu_train.py [-h] [--batch_size BATCH_SIZE]
[--data_dir DATA_DIR] [--use_fp16 USE_FP16]
cifar10_multi_gpu_train.py: error: unrecognized arguments: --num_gpus 2
To avoid the error above, edit the code directly as shown below.
root@67c0e6901bb2:/data/imsi/tensorflow/models/tutorials/image/cifar10# vi cifar10_multi_gpu_train.py
...
#parser.add_argument('--train_dir', type=str, default='/tmp/cifar10_train',
parser.add_argument('--train_dir', type=str, default='/data/imsi/test/tf1.3',
help='Directory where to write event logs and checkpoint.')
#parser.add_argument('--max_steps', type=int, default=1000000,
parser.add_argument('--max_steps', type=int, default=10000,
help='Number of batches to run.')
#parser.add_argument('--num_gpus', type=int, default=1,
parser.add_argument('--num_gpus', type=int, default=4,
help='How many GPUs to use.')
Now run it as follows. Here batch_size is set to 512, but a larger value would probably be fine as well.
root@67c0e6901bb2:/data/imsi/tensorflow/models/tutorials/image/cifar10# time python cifar10_multi_gpu_train.py --batch_size 512
>> Downloading cifar-10-binary.tar.gz 6.1%
...
2017-11-10 01:20:23.628755: step 9440, loss = 0.63 (15074.6 examples/sec; 0.034 sec/batch)
2017-11-10 01:20:25.052011: step 9450, loss = 0.64 (14615.4 examples/sec; 0.035 sec/batch)
2017-11-10 01:20:26.489564: step 9460, loss = 0.55 (14872.0 examples/sec; 0.034 sec/batch)
2017-11-10 01:20:27.860303: step 9470, loss = 0.61 (14515.9 examples/sec; 0.035 sec/batch)
2017-11-10 01:20:29.289386: step 9480, loss = 0.54 (13690.6 examples/sec; 0.037 sec/batch)
2017-11-10 01:20:30.799570: step 9490, loss = 0.69 (15940.8 examples/sec; 0.032 sec/batch)
2017-11-10 01:20:32.239056: step 9500, loss = 0.54 (12581.4 examples/sec; 0.041 sec/batch)
2017-11-10 01:20:34.219832: step 9510, loss = 0.60 (14077.9 examples/sec; 0.036 sec/batch)
...
Next, we start a docker container with only one chip's 8 cores, i.e. half of the full 2-chip 16-core CPU, and only 2 of the 4 GPUs. When controlling CPU resources with --cpuset-cpus, the CPU numbers are given in pairs like this because IBM POWER8 supports SMT (hyperthreading) of up to 8 threads per core, so each core is assigned 8 logical CPU numbers. Here SMT is set to 2 rather than 8 to optimize deep learning performance.
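For reference, the SMT mode on the host can be checked and changed with the ppc64_cpu utility from powerpc-utils; a minimal sketch (run on the host as root):
root@minsky:~# ppc64_cpu --smt        # show the current SMT mode
root@minsky:~# ppc64_cpu --smt=2      # set SMT to 2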
root@minsky:~# NV_GPU=0,1 nvidia-docker run -ti --rm --cpuset-cpus="0,1,8,9,16,17,24,25,32,33,40,41,48,49" -v /data:/data bsyu/tf1.3-ppc64le:v0.1 bash
root@3b2c2614811d:~# nvidia-smi
Fri Nov 10 02:24:14 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.119 Driver Version: 361.119 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-SXM2... On | 0002:01:00.0 Off | 0 |
| N/A 38C P0 30W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-SXM2... On | 0003:01:00.0 Off | 0 |
| N/A 40C P0 33W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
root@3b2c2614811d:/# cd /data/imsi/tensorflow/models/tutorials/image/cifar10
Since there are now 2 GPUs instead of 4, modify cifar10_multi_gpu_train.py accordingly as shown below.
root@3b2c2614811d:/data/imsi/tensorflow/models/tutorials/image/cifar10# vi cifar10_multi_gpu_train.py
...
#parser.add_argument('--num_gpus', type=int, default=1,
parser.add_argument('--num_gpus', type=int, default=2,
help='How many GPUs to use.')
Run it, and it works fine.
root@3b2c2614811d:/data/imsi/tensorflow/models/tutorials/image/cifar10# time python cifar10_multi_gpu_train.py --batch_size 512
>> Downloading cifar-10-binary.tar.gz 1.7%
...
2017-11-10 02:35:50.040462: step 120, loss = 4.07 (15941.4 examples/sec; 0.032 sec/batch)
2017-11-10 02:35:50.587970: step 130, loss = 4.14 (19490.7 examples/sec; 0.026 sec/batch)
2017-11-10 02:35:51.119347: step 140, loss = 3.91 (18319.8 examples/sec; 0.028 sec/batch)
2017-11-10 02:35:51.655916: step 150, loss = 3.87 (20087.1 examples/sec; 0.025 sec/batch)
2017-11-10 02:35:52.181703: step 160, loss = 3.90 (19215.5 examples/sec; 0.027 sec/batch)
2017-11-10 02:35:52.721608: step 170, loss = 3.82 (17780.1 examples/sec; 0.029 sec/batch)
2017-11-10 02:35:53.245088: step 180, loss = 3.92 (18888.4 examples/sec; 0.027 sec/batch)
2017-11-10 02:35:53.777146: step 190, loss = 3.80 (19103.7 examples/sec; 0.027 sec/batch)
2017-11-10 02:35:54.308063: step 200, loss = 3.76 (18554.2 examples/sec; 0.028 sec/batch)
...
2) caffe2
This time, we start the docker container from the beginning with only 2 GPUs and 8 CPU cores.
root@minsky:~# NV_GPU=0,1 nvidia-docker run -ti --rm --cpuset-cpus="0,1,8,9,16,17,24,25,32,33,40,41,48,49" -v /data:/data bsyu/caffe2-ppc64le:v0.3 bash
As you can see, only 2 GPUs show up.
root@dc853a5495a0:/# nvidia-smi
Fri Nov 10 07:22:21 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.119 Driver Version: 361.119 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-SXM2... On | 0002:01:00.0 Off | 0 |
| N/A 32C P0 29W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-SXM2... On | 0003:01:00.0 Off | 0 |
| N/A 35C P0 32W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Check the environment variables. Since caffe2 is installed in /opt/caffe2 here, LD_LIBRARY_PATH and PYTHONPATH are set accordingly.
root@dc853a5495a0:/# env | grep PATH
LIBRARY_PATH=/usr/local/cuda/lib64/stubs:
LD_LIBRARY_PATH=/opt/caffe2/lib:/opt/DL/nccl/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
PATH=/opt/caffe2/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH=/opt/caffe2
caffe2 is tested with resnet50_trainer.py below. Before that, to work around the lmdb creation problem reported in https://github.com/caffe2/caffe2/issues/517 , modify part of the code as suggested in that URL.
root@dc853a5495a0:/# cd /data/imsi/caffe2/caffe2/python/examples
root@dc853a5495a0:/data/imsi/caffe2/caffe2/python/examples# vi lmdb_create_example.py
...
flatten_img = img_data.reshape(np.prod(img_data.shape))
# img_tensor.float_data.extend(flatten_img)
img_tensor.float_data.extend(flatten_img.flat)
Next, create the lmdb as follows. Since this has already been done once, it will finish very quickly if run again.
root@dc853a5495a0:/data/imsi/caffe2/caffe2/python/examples# python lmdb_create_example.py --output_file /data/imsi/test/caffe2/lmdb
>>> Write database...
Inserted 0 rows
Inserted 16 rows
Inserted 32 rows
Inserted 48 rows
Inserted 64 rows
Inserted 80 rows
Inserted 96 rows
Inserted 112 rows
Checksum/write: 1744827
>>> Read database...
Checksum/read: 1744827
Then run the training as follows. Since only 2 GPUs are visible in this environment, you must pass 0,1 to --gpus instead of 0,1,2,3.
root@dc853a5495a0:/data/imsi/caffe2/caffe2/python/examples# time python resnet50_trainer.py --train_data /data/imsi/test/caffe2/lmdb --gpus 0,1 --batch_size 128 --num_epochs 1
When you run it, 'not a valid file' warning messages like the ones below appear, but according to github and other sources they can safely be ignored.
Ignoring @/caffe2/caffe2/contrib/nccl:nccl_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/contrib/gloo:gloo_ops_gpu as it is not a valid file.
Ignoring @/caffe2/caffe2/distributed:file_store_handler_ops as it is not a valid file.
Ignoring @/caffe2/caffe2/distributed:redis_store_handler_ops as it is not a valid file.
INFO:resnet50_trainer:Running on GPUs: [0, 1]
INFO:resnet50_trainer:Using epoch size: 1499904
INFO:data_parallel_model:Parallelizing model for devices: [0, 1]
INFO:data_parallel_model:Create input and model training operators
INFO:data_parallel_model:Model for GPU : 0
INFO:data_parallel_model:Model for GPU : 1
INFO:data_parallel_model:Adding gradient operators
INFO:data_parallel_model:Add gradient all-reduces for SyncSGD
INFO:data_parallel_model:Post-iteration operators for updating params
INFO:data_parallel_model:Calling optimizer builder function
INFO:data_parallel_model:Add initial parameter sync
WARNING:data_parallel_model:------- DEPRECATED API, please use data_parallel_model.OptimizeGradientMemory() -----
WARNING:memonger:NOTE: Executing memonger to optimize gradient memory
INFO:memonger:Memonger memory optimization took 0.252535104752 secs
WARNING:memonger:NOTE: Executing memonger to optimize gradient memory
INFO:memonger:Memonger memory optimization took 0.253523111343 secs
INFO:resnet50_trainer:Starting epoch 0/1
INFO:resnet50_trainer:Finished iteration 1/11718 of epoch 0 (27.70 images/sec)
INFO:resnet50_trainer:Training loss: 7.39205980301, accuracy: 0.0
INFO:resnet50_trainer:Finished iteration 2/11718 of epoch 0 (378.51 images/sec)
INFO:resnet50_trainer:Training loss: 0.0, accuracy: 1.0
INFO:resnet50_trainer:Finished iteration 3/11718 of epoch 0 (387.87 images/sec)
INFO:resnet50_trainer:Training loss: 0.0, accuracy: 1.0
INFO:resnet50_trainer:Finished iteration 4/11718 of epoch 0 (383.28 images/sec)
INFO:resnet50_trainer:Training loss: 0.0, accuracy: 1.0
INFO:resnet50_trainer:Finished iteration 5/11718 of epoch 0 (381.71 images/sec)
...
However, there is an issue where accuracy comes out as 1.0 right from the start, as shown above. There have been discussions about this resnet50_trainer.py problem on the caffe2 github (link below), but no clear solution yet. Still, it does not really matter for relative system performance measurements.
https://github.com/caffe2/caffe2/issues/810
3) pytorch
This time, let's test with the pytorch image.
root@8ccd72116fee:~# env | grep PATH
LIBRARY_PATH=/usr/local/cuda/lib64/stubs:
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
PATH=/opt/anaconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
First, start the docker image as follows. Note that the --ipc=host option is used here, to avoid the hang described at https://discuss.pytorch.org/t/imagenet-example-is-crashing/1363/2 .
root@minsky:~# nvidia-docker run -ti --rm --ipc=host -v /data:/data bsyu/pytorch-ppc64le:v0.1 bash
Run the simplest example, mnist, as follows. Running 10 epochs takes roughly 1 minute 30 seconds.
root@8ccd72116fee:/data/imsi/examples/mnist# time python main.py --batch-size 512 --epochs 10
...
Train Epoch: 9 [25600/60000 (42%)] Loss: 0.434816
Train Epoch: 9 [30720/60000 (51%)] Loss: 0.417652
Train Epoch: 9 [35840/60000 (59%)] Loss: 0.503125
Train Epoch: 9 [40960/60000 (68%)] Loss: 0.477776
Train Epoch: 9 [46080/60000 (76%)] Loss: 0.346416
Train Epoch: 9 [51200/60000 (85%)] Loss: 0.361492
Train Epoch: 9 [56320/60000 (93%)] Loss: 0.383941
Test set: Average loss: 0.1722, Accuracy: 9470/10000 (95%)
Train Epoch: 10 [0/60000 (0%)] Loss: 0.369119
Train Epoch: 10 [5120/60000 (8%)] Loss: 0.377726
Train Epoch: 10 [10240/60000 (17%)] Loss: 0.402854
Train Epoch: 10 [15360/60000 (25%)] Loss: 0.349409
Train Epoch: 10 [20480/60000 (34%)] Loss: 0.295271
...
However, this is a single-GPU example. To use multiple GPUs you have to run the imagenet example below, which requires downloading and extracting the ilsvrc2012 dataset. That data was extracted as JPEG files into /data/imagenet_dir/train and /data/imagenet_dir/val, as shown below.
root@minsky:/data/imagenet_dir/train# while read SYNSET; do
> mkdir -p ${SYNSET}
> tar xf ../../ILSVRC2012_img_train.tar "${SYNSET}.tar"
> tar xf "${SYNSET}.tar" -C "${SYNSET}"
> rm -f "${SYNSET}.tar"
> done < /opt/DL/caffe-nv/data/ilsvrc12/synsets.txt
root@minsky:/data/imagenet_dir/train# ls -1 | wc -l
1000
root@minsky:/data/imagenet_dir/train# du -sm .
142657 .
root@minsky:/data/imagenet_dir/train# find . | wc -l
1282168
root@minsky:/data/imagenet_dir/val# ls -1 | wc -l
50000
If you run main.py as-is at this point, you will hit the following error. The reason is that main.py expects the JPEG files under the val directory to be organized into per-label subdirectories as well.
RuntimeError: Found 0 images in subfolders of: /data/imagenet_dir/val
Supported image extensions are: .jpg,.JPG,.jpeg,.JPEG,.png,.PNG,.ppm,.PPM,.bmp,.BMP
Therefore, redistribute the JPEG files into per-label directories using preprocess_imagenet_validation_data.py from the inception directory, as shown below.
root@minsky:/data/models/research/inception/inception/data# python preprocess_imagenet_validation_data.py /data/imagenet_dir/val imagenet_2012_validation_synset_labels.txt
Looking again, you can see the files have been redistributed by label.
root@minsky:/data/imagenet_dir/val# ls | head -n 3
n01440764
n01443537
n01484850
root@minsky:/data/imagenet_dir/val# ls | wc -l
1000
root@minsky:/data/imagenet_dir/val# find . | wc -l
51001
Now run main.py as follows.
root@8ccd72116fee:~# cd /data/imsi/examples/imagenet
root@8ccd72116fee:/data/imsi/examples/imagenet# time python main.py -a resnet18 --epochs 1 /data/imagenet_dir
=> creating model 'resnet18'
Epoch: [0][0/5005] Time 11.237 (11.237) Data 2.330 (2.330) Loss 7.0071 (7.0071) Prec@1 0.391 (0.391) Prec@5 0.391 (0.391)
Epoch: [0][10/5005] Time 0.139 (1.239) Data 0.069 (0.340) Loss 7.1214 (7.0515) Prec@1 0.000 (0.284) Prec@5 0.000 (1.065)
Epoch: [0][20/5005] Time 0.119 (0.854) Data 0.056 (0.342) Loss 7.1925 (7.0798) Prec@1 0.000 (0.260) Prec@5 0.781 (0.930)
...
* The docker images used above were backed up as follows.
root@minsky:/data/docker_save# docker save --output caffe2-ppc64le.v0.3.tar bsyu/caffe2-ppc64le:v0.3
root@minsky:/data/docker_save# docker save --output pytorch-ppc64le.v0.1.tar bsyu/pytorch-ppc64le:v0.1
root@minsky:/data/docker_save# docker save --output tf1.3-ppc64le.v0.1.tar bsyu/tf1.3-ppc64le:v0.1
root@minsky:/data/docker_save# docker save --output cudnn6-conda2-ppc64le.v0.1.tar bsyu/cudnn6-conda2-ppc64le:v0.1
root@minsky:/data/docker_save# docker save --output cudnn6-conda3-ppc64le.v0.1.tar bsyu/cudnn6-conda3-ppc64le:v0.1
root@minsky:/data/docker_save# ls -l
total 28023280
-rw------- 1 root root 4713168896 Nov 10 16:48 caffe2-ppc64le.v0.3.tar
-rw------- 1 root root 4218520064 Nov 10 17:10 cudnn6-conda2-ppc64le.v0.1.tar
-rw------- 1 root root 5272141312 Nov 10 17:11 cudnn6-conda3-ppc64le.v0.1.tar
-rw------- 1 root root 6921727488 Nov 10 16:51 pytorch-ppc64le.v0.1.tar
-rw------- 1 root root 7570257920 Nov 10 16:55 tf1.3-ppc64le.v0.1.tar
In an emergency, these images can be restored with the docker load command.
(e.g.) docker load --input caffe2-ppc64le.v0.3.tar
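After a docker load, the restored image should show up again in the local image list; for example:
root@minsky:~# docker images | grep caffe2-ppc64le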
Wednesday, November 1, 2017
Building a docker image with caffe2, tensorflow 1.3 and pytorch installed
Here we will take the nvidia/cuda-ppc64le image provided on docker hub, install the various packages we want on top of it, and create a new image with the docker commit command.
First, on the parent ubuntu OS, gather the required files into a directory named docker, as shown below.
root@firestone:~/docker# ls -l
total 2369580
-rwxr-xr-x 1 root root 284629257 Oct 31 22:27 Anaconda2-4.4.0.1-Linux-ppc64le.sh
-rwxr-xr-x 1 root root 299425582 Oct 31 22:28 Anaconda3-4.4.0.1-Linux-ppc64le.sh
-rw-r--r-- 1 root root 1321330418 Oct 31 22:35 cuda-repo-ubuntu1604-8-0-local-ga2v2_8.0.61-1_ppc64el.deb
-rwxr-xr-x 1 root root 8788 Oct 31 21:40 debootstrap.sh
-rw-r--r-- 1 root root 68444212 Oct 31 22:35 libcudnn6_6.0.21-1+cuda8.0_ppc64el.deb
-rw-r--r-- 1 root root 59820704 Oct 31 22:35 libcudnn6-dev_6.0.21-1+cuda8.0_ppc64el.deb
-rw-r--r-- 1 root root 6575300 Oct 31 22:35 libcudnn6-doc_6.0.21-1+cuda8.0_ppc64el.deb
-rw-r--r-- 1 root root 386170568 Oct 31 22:36 mldl-repo-local_4.0.0_ppc64el.deb
drwxr-xr-x 21 root root 4096 Oct 31 21:55 ubuntu
The nvidia/cuda-ppc64le image has already been pulled with the docker pull command.
root@firestone:~/docker# docker images | grep nvidia
nvidia-docker build 405ee913a07e About an hour ago 1.02GB
nvidia/cuda-ppc64le 8.0-cudnn6-runtime-ubuntu16.04 bf28cd22ff84 6 weeks ago 974MB
nvidia/cuda-ppc64le latest 9b0a21e35c66 6 weeks ago 1.72GB
Now run nvidia/cuda-ppc64le:latest in interactive mode, mounting the docker directory as /docker.
root@firestone:~/docker# docker run -ti -v ~/docker:/docker nvidia/cuda-ppc64le:latest bash
We are now inside nvidia/cuda-ppc64le:latest. Go to /docker and confirm the same files are visible.
root@deeed8ce922f:/# cd /docker
root@deeed8ce922f:/docker# ls
Anaconda2-4.4.0.1-Linux-ppc64le.sh libcudnn6-doc_6.0.21-1+cuda8.0_ppc64el.deb
Anaconda3-4.4.0.1-Linux-ppc64le.sh libcudnn6_6.0.21-1+cuda8.0_ppc64el.deb
cuda-repo-ubuntu1604-8-0-local-ga2v2_8.0.61-1_ppc64el.deb mldl-repo-local_4.0.0_ppc64el.deb
debootstrap.sh ubuntu
libcudnn6-dev_6.0.21-1+cuda8.0_ppc64el.deb
Now install libcudnn6 first. Since we may also want NCCL, bazel and so on, install the local repo of PowerAI 4.0 (mldl-repo-local_4.0.0_ppc64el.deb) as well.
root@deeed8ce922f:/docker# dpkg -i libcudnn6_6.0.21-1+cuda8.0_ppc64el.deb libcudnn6-dev_6.0.21-1+cuda8.0_ppc64el.deb mldl-repo-local_4.0.0_ppc64el.deb
root@deeed8ce922f:/docker# apt-get update
Now install cuda, nccl, openblas and so on.
root@deeed8ce922f:/docker# apt-get install cuda
root@deeed8ce922f:/docker# apt-get install -y libnccl-dev libnccl1 python-ncclient bazel libopenblas-dev libopenblas libopenblas-base
Now, from another ssh session on the parent OS, check the ID of the container we are currently using with the docker ps command.
root@firestone:~# docker ps | grep -v k8s
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
deeed8ce922f nvidia/cuda-ppc64le:latest "bash" About an hour ago Up About an hour gracious_bhaskara
Run the docker commit command against that container ID as follows.
root@firestone:~# docker commit deeed8ce922f bsyu/libcudnn6-ppc64le:xenial
You can now see that a new docker image has been created.
root@firestone:~# docker images | grep -v ibm
REPOSITORY TAG IMAGE ID CREATED SIZE
bsyu/libcudnn6-ppc64le xenial 6d621d9d446b 48 seconds ago 7.52GB
nvidia-docker build 405ee913a07e 2 hours ago 1.02GB
nvidia/cuda-ppc64le 8.0-cudnn6-runtime-ubuntu16.04 bf28cd22ff84 6 weeks ago 974MB
nvidia/cuda-ppc64le latest 9b0a21e35c66 6 weeks ago 1.72GB
ppc64le/golang 1.6.3 6a579d02d32f 14 months ago 705MB
After tagging it appropriately, log in to docker and push it to docker hub.
root@firestone:~# docker tag bsyu/libcudnn6-ppc64le:xenial bsyu/libcudnn6-ppc64le:latest
root@firestone:~# docker login -u bsyu
Password:
Login Succeeded
root@firestone:~# docker push bsyu/libcudnn6-ppc64le:xenial
The push refers to a repository [docker.io/bsyu/libcudnn6-ppc64le]
de3b55a17936: Pushed
9eb05620c635: Mounted from nvidia/cuda-ppc64le
688827f0a03b: Mounted from nvidia/cuda-ppc64le
a36322f4fa68: Mounted from nvidia/cuda-ppc64le
6665818dfb83: Mounted from nvidia/cuda-ppc64le
4cad4acd0601: Mounted from nvidia/cuda-ppc64le
f12b406a6a23: Mounted from nvidia/cuda-ppc64le
bb179c8bb840: Mounted from nvidia/cuda-ppc64le
cd51df595e0c: Mounted from nvidia/cuda-ppc64le
4a7a95d650cf: Mounted from nvidia/cuda-ppc64le
22c3301fbf0b: Mounted from nvidia/cuda-ppc64le
xenial: digest: sha256:3993ac50b857979694cdc41cf12d672cc078583f1babb79f6c25e0688ed603ed size: 2621
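Once the push completes, the image can be pulled onto any other ppc64le docker host in the usual way:
(e.g.) docker pull bsyu/libcudnn6-ppc64le:xenial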
Now install caffe2 on top of this. We will extract the tarball of the entire /opt/caffe2 directory that was built in a previous post (http://hwengineer.blogspot.kr/2017/10/minsky-caffe2-jupyter-notebook-mnist.html).
root@deeed8ce922f:/docker# ls
Anaconda2-4.4.0.1-Linux-ppc64le.sh libcudnn6-doc_6.0.21-1+cuda8.0_ppc64el.deb
Anaconda3-4.4.0.1-Linux-ppc64le.sh libcudnn6_6.0.21-1+cuda8.0_ppc64el.deb
caffe2.tgz mldl-repo-local_4.0.0_ppc64el.deb
cuda-repo-ubuntu1604-8-0-local-ga2v2_8.0.61-1_ppc64el.deb site-packages.tgz
debootstrap.sh ubuntu
libcudnn6-dev_6.0.21-1+cuda8.0_ppc64el.deb
root@deeed8ce922f:/docker# cd /opt
root@deeed8ce922f:/opt# tar -zxf /docker/caffe2.tgz
root@deeed8ce922f:/opt# vi ~/.bashrc
...
export LD_LIBRARY_PATH=/opt/DL/nccl/lib:/opt/DL/openblas/lib:/usr/local/cuda-8.0/lib64:/usr/lib:/usr/local/lib:/opt/caffe2/lib:/usr/lib/powerpc64le-linux-gnu
export PATH=/opt/anaconda2/bin:/opt/caffe2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PYTHONPATH=/opt/caffe2
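Note that after editing ~/.bashrc inside the container, the current shell does not pick up the changes by itself; re-read it once (or start a new login shell):
(e.g.) . ~/.bashrc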
Install the additional packages caffe2 needs to work properly.
root@deeed8ce922f:/opt# conda install protobuf future
root@deeed8ce922f:/opt# apt-get install libprotobuf-dev python-protobuf libgoogle-glog-dev libopenmpi-dev liblmdb-dev python-lmdb libleveldb-dev python-leveldb libopencv-core-dev libopencv-gpu-dev python-opencv libopencv-highgui-dev libopencv-dev
Now, back on the parent OS, docker commit again under a different name.
root@firestone:~# docker commit deeed8ce922f bsyu/caffe2-ppc64le-xenial:v0.1
Now run it with nvidia-docker so the GPUs can be used. For that, you first need to run nvidia-docker-plugin in the background (if you haven't already).
root@firestone:~# nohup nvidia-docker-plugin &
root@firestone:~# nvidia-docker run -ti --rm -v ~/docker:/docker bsyu/caffe2-ppc64le-xenial:v0.1 bash
Confirm that caffe2 imports successfully inside the bsyu/caffe2-ppc64le-xenial:v0.1 container.
root@0e58f6f69c44:/# python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
Success
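If you also want to confirm that the GPUs are visible to caffe2 under nvidia-docker, a check along these lines should work (NumCudaDevices is caffe2's device-count helper; shown here as a hedged example):
(e.g.) python -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'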
Using this image, we also build a docker image containing tensorflow 1.3 and pytorch 0.2.0.
root@firestone:~# docker run -ti --rm -v ~/docker:/docker bsyu/caffe2-ppc64le-xenial:v0.1 bash
root@8cfeaf93f28b:/# cd /opt
root@8cfeaf93f28b:/opt# ls
DL anaconda2 anaconda3 caffe2
root@8cfeaf93f28b:/opt# rm -rf caffe2
root@8cfeaf93f28b:/opt# vi ~/.bashrc
...
export LD_LIBRARY_PATH=/opt/DL/nccl/lib:/opt/DL/openblas/lib:/usr/local/cuda-8.0/lib64:/usr/lib:/usr/local/lib:/usr/lib/powerpc64le-linux-gnu
export PATH=/opt/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages
root@8cfeaf93f28b:~# apt-get install libcupti-dev openjdk-8-jdk openjdk-8-jdk-headless git
root@8cfeaf93f28b:~# conda install bazel numpy
root@8cfeaf93f28b:~# git clone --recursive https://github.com/tensorflow/tensorflow.git
root@8cfeaf93f28b:~# cd tensorflow/
root@8cfeaf93f28b:~/tensorflow# git checkout r1.3
root@8cfeaf93f28b:~/tensorflow# ./configure
root@8cfeaf93f28b:~/tensorflow# bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
root@8cfeaf93f28b:~/tensorflow# bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
root@8cfeaf93f28b:~/tensorflow# pip install /tmp/tensorflow_pkg/tensorflow-1.3.1-cp36-cp36m-linux_ppc64le.whl
root@8cfeaf93f28b:~/tensorflow# conda list | grep tensor
tensorflow 1.3.1 <pip>
tensorflow-tensorboard 0.1.8 <pip>
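Before committing, it does not hurt to confirm that the freshly built wheel actually imports and reports the expected version (a simple hedged check):
(e.g.) python -c 'import tensorflow as tf; print(tf.__version__)'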
Now that tensorflow 1.3 is installed, save it with docker commit.
root@firestone:~# docker ps | grep -v k8s
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8cfeaf93f28b bsyu/caffe2-ppc64le-xenial:v0.1 "bash" 2 hours ago Up 2 hours vigilant_ptolemy
root@firestone:~# docker commit 8cfeaf93f28b bsyu/tf1.3-caffe2-ppc64le-xenial:v0.1
Next, install pytorch in this image.
root@8cfeaf93f28b:~# git clone --recursive https://github.com/pytorch/pytorch.git
root@8cfeaf93f28b:~# cd pytorch
root@8cfeaf93f28b:~/pytorch# export CMAKE_PREFIX_PATH=/opt/pytorch
root@8cfeaf93f28b:~/pytorch# conda install numpy pyyaml setuptools cmake cffi openblas
root@8cfeaf93f28b:~/pytorch# python setup.py install
root@8cfeaf93f28b:~# python
Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 15:31:35)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import print_function
>>> import torch
>>>
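Since this container was started with plain docker run, no CUDA devices are visible here; when the image is later run under nvidia-docker, the GPU path can be spot-checked like this (hedged example):
(e.g.) python -c 'import torch; print(torch.cuda.is_available())'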
Finally, commit the image as bsyu/pytorch-tf1.3-caffe2-ppc64le-xenial:v0.1 and push it.
root@firestone:~# docker commit 8cfeaf93f28b bsyu/pytorch-tf1.3-caffe2-ppc64le-xenial:v0.1
root@firestone:~# docker push bsyu/pytorch-tf1.3-caffe2-ppc64le-xenial:v0.1
Wednesday, October 25, 2017
Installing Caffe2 on Minsky and running MNIST in a jupyter notebook
More and more customers are asking for Caffe2 along with PyTorch. Like PyTorch, Caffe2 is not yet included in PowerAI, but it builds fine from source on the ppc64le architecture. In the current Caffe2 source code there is exactly one spot, in a benchmark-related module, that throws an error, and IBM provides a patch for it. Just follow the steps below.
First, the basic caffe2 build simply follows the instructions on the caffe2 home page (link below).
https://caffe2.ai/docs/getting-started.html?platform=ubuntu&configuration=compile
I built it in the environment below: Ubuntu 16.04 ppc64le with CUDA 8.0 and libcuDNN installed. One thing worth noting: caffe2 does not get along well with Anaconda. If Anaconda happens to be installed, unset all related PATH entries before building.
u0017649@sys-89697:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
u0017649@sys-89697:~$ dpkg -l | grep cuda | head -n 4
ii cuda 8.0.61-1 ppc64el CUDA meta-package
ii cuda-8-0 8.0.61-1 ppc64el CUDA 8.0 meta-package
ii cuda-command-line-tools-8-0 8.0.61-1 ppc64el CUDA command-line tools
ii cuda-core-8-0 8.0.61-1 ppc64el CUDA core tools
u0017649@sys-89697:~$ dpkg -l | grep libcudnn
ii libcudnn6 6.0.21-1+cuda8.0 ppc64el cuDNN runtime libraries
ii libcudnn6-dev 6.0.21-1+cuda8.0 ppc64el cuDNN development libraries and headers
As mentioned above, use the OS-provided python and pip, not Anaconda's. Python 2.7 was used here.
u0017649@sys-89697:~$ which python
/usr/bin/python
First, install the following Ubuntu OS packages.
u0017649@sys-89697:~$ sudo apt-get install -y --no-install-recommends build-essential cmake git libgoogle-glog-dev libprotobuf-dev protobuf-compiler
u0017649@sys-89697:~$ sudo apt-get install -y --no-install-recommends libgtest-dev libiomp-dev libleveldb-dev liblmdb-dev libopencv-dev libopenmpi-dev libsnappy-dev openmpi-bin openmpi-doc python-pydot libgflags-dev
Also install the openblas and nccl provided by PowerAI.
u0017649@sys-89697:~$ sudo dpkg -i mldl-repo-local_4.0.0_ppc64el.deb
u0017649@sys-89697:~$ sudo apt-get update
u0017649@sys-89697:~$ sudo apt-get install libnccl-dev
u0017649@sys-89697:~$ sudo apt-get install libopenblas-dev libopenblas
Then install the following pip packages.
u0017649@sys-89697:~$ pip install flask future graphviz hypothesis jupyter matplotlib pydot python-nvd3 pyyaml requests scikit-image scipy setuptools six tornado protobuf
Now download the caffe2 source.
u0017649@sys-89697:~$ git clone --recursive https://github.com/caffe2/caffe2.git
Cloning into 'caffe2'...
remote: Counting objects: 36833, done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 36833 (delta 37), reused 34 (delta 12), pack-reused 36757
Receiving objects: 100% (36833/36833), 149.17 MiB | 11.42 MiB/s, done.
Resolving deltas: 100% (26960/26960), done.
Checking connectivity... done.
Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/NNPACK_deps/FP16'
...
remote: Counting objects: 353, done.
remote: Total 353 (delta 0), reused 0 (delta 0), pack-reused 353
Receiving objects: 100% (353/353), 119.74 KiB | 0 bytes/s, done.
Resolving deltas: 100% (149/149), done.
Checking connectivity... done.
Submodule path 'third_party/pybind11/tools/clang': checked out '254c7a91e3c6aa254e113197604dafb443f4d429'
Now make one source modification for ppc64le. The part that needs patching is not a core component; it is one of the third-party modules used for benchmarking.
u0017649@sys-89697:~$ cd caffe2/third_party/benchmark/src
u0017649@sys-89697:~/caffe2/third_party/benchmark/src$ vi cycleclock.h
...
#elif defined(__powerpc__) || defined(__ppc__)
// This returns a time-base, which is not always precisely a cycle-count.
int64_t tbl, tbu0, tbu1;
asm("mftbu %0" : "=r"(tbu0));
asm("mftb %0" : "=r"(tbl));
asm("mftbu %0" : "=r"(tbu1));
tbl &= -static_cast<long long>(tbu0 == tbu1);
// tbl &= -static_cast<int64_t>(tbu0 == tbu1);
The key change is to use long long instead of int64_t in that static_cast.
** One more change has been added since: add ppc64le to line 30 of third_party/NNPACK/CMakeLists.txt, as shown below.
u0017649@sys-89697:~/caffe2/build# vi ../third_party/NNPACK/CMakeLists.txt
...
ELSEIF(NOT CMAKE_SYSTEM_PROCESSOR MATCHES "^(i686|x86_64|armv5te|armv7-a|armv7l|aarch64|ppc64le)$")
MESSAGE(FATAL_ERROR "Unrecognized CMAKE_SYSTEM_PROCESSOR = ${CMAKE_SYSTEM_PROCESSOR}")
...
Next, to avoid a cblas-related error later on, comment out the one cblas-related line in the Dependencies.cmake file as shown below.
u0017649@sys-89697:~/caffe2/cmake$ vi Dependencies.cmake
...
elseif(BLAS STREQUAL "OpenBLAS")
find_package(OpenBLAS REQUIRED)
caffe2_include_directories(${OpenBLAS_INCLUDE_DIR})
list(APPEND Caffe2_DEPENDENCY_LIBS ${OpenBLAS_LIB})
# list(APPEND Caffe2_DEPENDENCY_LIBS cblas)
...
Then set OpenBLAS_HOME to the openblas provided by PowerAI, as follows.
u0017649@sys-89697:~/caffe2/cmake$ export OpenBLAS_HOME=/opt/DL/openblas/
One more fix: in FindNCCL.cmake, point the section below explicitly at PowerAI's NCCL directory.
root@26aa285b6c46:/data/imsi/caffe2/cmake# vi Modules/FindNCCL.cmake
...
#set(NCCL_ROOT_DIR "" CACHE PATH "Folder contains NVIDIA NCCL")
set(NCCL_ROOT_DIR "/opt/DL/nccl" CACHE PATH "Folder contains NVIDIA NCCL")
...
Now run cmake and then make. Note that we pass the CMAKE_INSTALL_PREFIX option so the caffe2 binaries are installed under /opt/caffe2 instead of the default /usr/local.
u0017649@sys-89697:~$ cd caffe2
u0017649@sys-89697:~/caffe2$ mkdir build
u0017649@sys-89697:~/caffe2$ cd build
u0017649@sys-89697:~/caffe2/build$ cmake -DCMAKE_INSTALL_PREFIX=/opt/caffe2 ..
u0017649@sys-89697:~/caffe2/build$ make
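A hedged tip not in the original run: if the build machine has enough memory, make can be run in parallel to cut the build time considerably.
(e.g.) make -j$(nproc)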
After a fairly long time, well over an hour, make completes. Running make install then installs the binaries under /opt/caffe2.
u0017649@sys-89697:~/caffe2/build$ sudo make install
u0017649@sys-89697:~/caffe2/build$ ls -l /opt/caffe2
total 28
drwxr-xr-x 2 root root 4096 Oct 25 03:16 bin
drwxr-xr-x 3 root root 4096 Oct 25 03:16 caffe
drwxr-xr-x 24 root root 4096 Oct 25 03:16 caffe2
drwxr-xr-x 4 root root 4096 Oct 25 03:16 include
drwxr-xr-x 2 root root 4096 Oct 25 03:16 lib
drwxr-xr-x 3 root root 4096 Oct 25 03:16 share
drwxr-xr-x 2 root root 4096 Oct 25 03:16 test
If needed, change the ownership of /opt/caffe2 so that other regular users can use it as well. (optional)
u0017649@sys-89697:~/caffe2/build$ sudo chown -R u0017649:u0017649 /opt/caffe2
Now let's test it. First, set PYTHONPATH.
u0017649@sys-89697:~/caffe2/build$ export PYTHONPATH=/opt/caffe2
Run the following two basic tests. First, try importing core.
u0017649@sys-89697:~/caffe2/build$ python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
Success
Next, run operator_test.relu_op_test. My environment has no GPU attached, so you will see warnings like the ones below; ignore them here.
u0017649@sys-89697:~/caffe2/build$ pip install hypothesis
u0017649@sys-89697:~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/caffe2/lib
u0017649@sys-89697:~$ python -m caffe2.python.operator_test.relu_op_test
NVIDIA: no NVIDIA devices found
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1025 05:09:33.528787 12222 common_gpu.cc:70] Found an unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. I will set the available devices to be zero.
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([ 0.], dtype=float32), gc=, dc=[], engine=u'')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([[[-0.03770517, -0.03770517, -0.03770517, -0.03770517],
[-0.03770517, -0.03770517, -0.03770517, -0.03770517],
[-0.03770517, -0.03770517, -0.03770517, -0.03770517]],
...
[ 0.96481699, 0.96481699, 0.96481699, 0.96481699],
[ 0.96481699, 0.96481699, 0.96481699, -0.74859387]]], dtype=float32), gc=, dc=[], engine=u'')
Trying example: test_relu(self=<__main__.TestRelu testMethod=test_relu>, X=array([ 0.61015409], dtype=float32), gc=, dc=[], engine=u'CUDNN')
.
----------------------------------------------------------------------
Ran 1 test in 0.957s
OK
The test completed successfully as well. Now let's run MNIST in a jupyter notebook.
First, modify the config so the jupyter notebook can be accessed from an external browser.
u0017649@sys-89697:~$ jupyter notebook --generate-config
Writing default config to: /home/u0017649/.jupyter/jupyter_notebook_config.py
u0017649@sys-89697:~$ vi /home/u0017649/.jupyter/jupyter_notebook_config.py
...
#c.NotebookApp.ip = 'localhost'
c.NotebookApp.ip = '*'
Kill the jupyter notebook and start it again. A token is now displayed, as shown below.
u0017649@sys-89697:~$ jupyter notebook &
[I 05:13:29.492 NotebookApp] Writing notebook server cookie secret to /home/u0017649/.local/share/jupyter/runtime/notebook_cookie_secret
[W 05:13:29.578 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 05:13:29.665 NotebookApp] Serving notebooks from local directory: /home/u0017649
[I 05:13:29.666 NotebookApp] 0 active kernels
[I 05:13:29.666 NotebookApp] The Jupyter Notebook is running at: http://[all ip addresses on your system]:8888/?token=ca44acc48f0f2daa6dc9d935904f1de9a1496546efc95768
[I 05:13:29.666 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 05:13:29.667 NotebookApp] No web browser found: could not locate runnable browser.
[C 05:13:29.667 NotebookApp]
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=ca44acc48f0f2daa6dc9d935904f1de9a1496546efc95768
Copy this token and enter it in your external browser. Since the server IP used here is 172.29.160.94, first point your browser at http://172.29.160.94:8888.
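If you would rather not expose port 8888 directly, an SSH tunnel from your workstation to the server would also work (a hedged alternative, using the user and IP from above); you would then browse to http://localhost:8888 instead.
(e.g.) ssh -L 8888:localhost:8888 u0017649@172.29.160.94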
Once connected, go back to the command line and download the MNIST python notebook below.
u0017649@sys-89697:~$ wget https://raw.githubusercontent.com/caffe2/caffe2/master/caffe2/python/tutorials/MNIST.ipynb
When MNIST.ipynb shows up in the jupyter screen, click on it.
You can now work through it cell by cell, pressing the play button for each one.