Monday, May 15, 2017

Installing Continuum Anaconda on a Minsky server + training inception v3 with Tensorflow

Continuum's Anaconda has so far been available only for x86, but Continuum recently released Miniconda for ARM processors and for the POWER8 processor.  The page below has the download links for these packages and the installation steps.

https://www.continuum.io/content/conda-support-raspberry-pi-2-and-power8-le

Unfortunately, the ppc64le link on that page is wrong and returns '404 Not Found'.  Only the link is broken, though; the files themselves exist at the URLs below.  The first is for Python 2 and the second is for Python 3.

https://repo.continuum.io/miniconda/Miniconda2-4.3.14-Linux-ppc64le.sh
https://repo.continuum.io/miniconda/Miniconda3-4.3.14-Linux-ppc64le.sh
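
The two file names differ only in the Python major version, so the URL can be assembled from its parts.  Here is a minimal sketch, assuming the naming pattern visible in the two links above; `miniconda_url` is a hypothetical helper for illustration, not something conda provides, and it does not check that the file actually exists.

```python
# Hypothetical helper: builds an installer URL that follows the naming
# pattern of the two links above. It does not verify the file exists.
BASE_URL = "https://repo.continuum.io/miniconda"


def miniconda_url(python_major, version="4.3.14", arch="ppc64le"):
    """Return the Miniconda installer URL for Python 2 or Python 3."""
    if python_major not in (2, 3):
        raise ValueError("python_major must be 2 or 3")
    return f"{BASE_URL}/Miniconda{python_major}-{version}-Linux-{arch}.sh"


if __name__ == "__main__":
    print(miniconda_url(2))
    print(miniconda_url(3))
```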


Here we will install the Python 3 version.

u0017496@sys-87250:~$ wget https://repo.continuum.io/miniconda/Miniconda3-4.3.14-Linux-ppc64le.sh
--2017-05-14 21:54:26--  https://repo.continuum.io/miniconda/Miniconda3-4.3.14-Linux-ppc64le.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.16.19.10, 104.16.18.10, 2400:cb00:2048:1::6810:120a, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.16.19.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 34765794 (33M) [application/x-sh]
Saving to: ‘Miniconda3-4.3.14-Linux-ppc64le.sh’

Miniconda3-4.3.14-Linux-pp 100%[=====================================>]  33.15M  10.2MB/s    in 3.3s

2017-05-17 22:05:18 (10.0 MB/s) - ‘Miniconda3-4.3.14-Linux-ppc64le.sh’ saved [34765794/34765794]


u0017496@sys-87250:~$ chmod a+x Miniconda3-4.3.14-Linux-ppc64le.sh

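Running this shell script asks you to accept the license and to answer several other prompts.  For most of them you can just press Enter.  The exception in this walkthrough is the final question about prepending to PATH, where we answer yes.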

u0017496@sys-87250:~$ ./Miniconda3-4.3.14-Linux-ppc64le.sh

Welcome to Miniconda3 4.3.14 (by Continuum Analytics, Inc.)

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>>

(output omitted)

[/home/u0017496/miniconda3] >>>
PREFIX=/home/u0017496/miniconda3
installing: python-3.6.0-0 ...
installing: cffi-1.9.1-py36_0 ...
installing: conda-env-2.6.0-0 ...
installing: cryptography-1.7.1-py36_0 ...
installing: idna-2.2-py36_0 ...
installing: libffi-3.2.1-1 ...
installing: openssl-1.0.2k-1 ...
installing: pyasn1-0.2.3-py36_0 ...
installing: pycosat-0.6.2-py36_0 ...
installing: pycparser-2.17-py36_0 ...
installing: pyopenssl-16.2.0-py36_0 ...
installing: requests-2.13.0-py36_0 ...
installing: ruamel_yaml-0.11.14-py36_1 ...
installing: setuptools-27.2.0-py36_0 ...
installing: six-1.10.0-py36_0 ...
installing: sqlite-3.13.0-0 ...
installing: xz-5.2.2-1 ...
installing: yaml-0.1.6-0 ...
installing: zlib-1.2.8-3 ...
installing: conda-4.3.14-py36_0 ...
installing: pip-9.0.1-py36_1 ...
installing: wheel-0.29.0-py36_0 ...
Python 3.6.0 :: Continuum Analytics, Inc.
creating default environment...
installation finished.
Do you wish the installer to prepend the Miniconda3 install location
to PATH in your /home/u0017496/.bashrc ? [yes|no]

[no] >>> yes

Prepending PATH=/home/u0017496/miniconda3/bin to PATH in /home/u0017496/.bashrc
A backup will be made to: /home/u0017496/.bashrc-miniconda3.bak


For this change to become active, you have to open a new terminal.

Thank you for installing Miniconda2!

Share your notebooks and packages on Anaconda Cloud!
Sign up for free: https://anaconda.org


Now that Miniconda is installed, the conda command should be available.  As you can see, however, conda is not found.

u0017496@sys-87250:~$ which conda

The Miniconda installer did add conda's PATH entry to ~/.bashrc, but .bashrc has not been run again in the current shell.  Source it, and conda shows up on the PATH.

u0017496@sys-87250:~$ . ~/.bashrc
u0017496@sys-87250:~$ which conda
/home/u0017496/miniconda3/bin/conda
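
The same lookup can be reproduced from Python to see why the first `which conda` printed nothing.  This is a small sketch using the standard `shutil.which`; `find_command` is a hypothetical helper, and the `~/miniconda3/bin` location is an assumption based on the default prefix used above.

```python
import os
import shutil


def find_command(name, extra_dirs=()):
    """Look up `name` on PATH, searching `extra_dirs` first."""
    dirs = list(extra_dirs) + os.environ.get("PATH", "").split(os.pathsep)
    search_path = os.pathsep.join(d for d in dirs if d)
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    # Assumption: Miniconda was installed to the default prefix ~/miniconda3.
    conda_bin = os.path.expanduser("~/miniconda3/bin")
    print("current PATH only: ", find_command("conda"))
    print("with miniconda3/bin:", find_command("conda", extra_dirs=[conda_bin]))
```

Prepending the directory, which is what the line added to .bashrc does, is what makes the second lookup succeed.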

You can now list the Python libraries that come with this Miniconda installation:

u0017496@sys-87250:~$ conda list
# packages in environment at /home/u0017496/miniconda3:
#
cffi                      1.9.1                    py36_0
conda                     4.3.14                   py36_0
conda-env                 2.6.0                         0
cryptography              1.7.1                    py36_0
idna                      2.2                      py36_0
libffi                    3.2.1                         1
openssl                   1.0.2k                        1
pip                       9.0.1                    py36_1
pyasn1                    0.2.3                    py36_0
pycosat                   0.6.2                    py36_0
pycparser                 2.17                     py36_0
pyopenssl                 16.2.0                   py36_0
python                    3.6.0                         0
requests                  2.13.0                   py36_0
ruamel_yaml               0.11.14                  py36_1
setuptools                27.2.0                   py36_0
six                       1.10.0                   py36_0
sqlite                    3.13.0                        0
wheel                     0.29.0                   py36_0
xz                        5.2.2                         1
yaml                      0.1.6                         0
zlib                      1.2.8                         3


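The loop below is meant to install everything Continuum has built so far in one go.  Note that, as written, it takes the package names from `conda list`, which only names packages that are already installed, so it re-runs conda install on those.  To cover every package available in the channel, the names would have to come from `conda search` instead.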

u0017496@sys-87250:~$ for i in `conda list | awk '{print $1}' | grep -v \#`
> do
> conda install $i
> done

(output omitted)
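
To see exactly which names that loop iterates over, the awk/grep pipeline can be mirrored in a few lines of Python.  This is only a sketch: `installed_package_names` is a hypothetical helper, and the sample text is a shortened copy of the `conda list` output shown earlier.

```python
def installed_package_names(conda_list_output):
    """Return the first column of `conda list` text, skipping '#' header lines."""
    names = []
    for line in conda_list_output.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        names.append(fields[0])
    return names


# Shortened sample taken from the `conda list` output above.
SAMPLE = """\
# packages in environment at /home/u0017496/miniconda3:
#
cffi                      1.9.1                    py36_0
conda                     4.3.14                   py36_0
zlib                      1.2.8                         3
"""

if __name__ == "__main__":
    print(installed_package_names(SAMPLE))
```

Because the names come from `conda list`, they are the packages already present in the environment.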

Next, let's use conda search to check that pip, one of the packages installed above, is installed correctly.  The entry marked with * is the installed one.

u0017496@sys-87250:~$ conda search pip
Fetching package metadata .........
pip                          7.1.0                    py27_0  defaults
                             7.1.0                    py34_0  defaults
                             7.1.0                    py27_1  defaults
                             7.1.0                    py34_1  defaults
                             7.1.2                    py27_0  defaults
                             7.1.2                    py34_0  defaults
                             8.1.0                    py27_0  defaults
                             8.1.0                    py34_0  defaults
                             8.1.0                    py35_0  defaults
                             8.1.2                    py27_0  defaults
                             8.1.2                    py34_0  defaults
                             8.1.2                    py35_0  defaults
                             9.0.0                    py27_0  defaults
                             9.0.0                    py34_0  defaults
                             9.0.0                    py35_0  defaults
                             9.0.1                    py27_1  defaults
                             9.0.1                    py35_1  defaults
                          *  9.0.1                    py36_1  defaults

u0017496@sys-87250:~$ which pip
/home/u0017496/miniconda3/bin/pip

u0017496@sys-87250:~$ pip --version
pip 9.0.1 from /home/u0017496/miniconda3/lib/python3.6/site-packages (python 3.6)
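
If you want to pick out the installed build programmatically, the `*` marker is easy to find.  Here is a sketch that assumes the column layout shown above (marker, version, build, channel); `installed_entry` is a hypothetical helper and other conda versions may print a different layout.

```python
def installed_entry(conda_search_output):
    """Return (version, build) of the row marked '*', or None if none is marked."""
    for line in conda_search_output.splitlines():
        fields = line.split()
        if "*" in fields:
            marker = fields.index("*")
            if len(fields) >= marker + 3:
                return fields[marker + 1], fields[marker + 2]
    return None


# Shortened sample taken from the `conda search pip` output above.
SAMPLE = """\
pip                          7.1.0                    py27_0  defaults
                             9.0.1                    py27_1  defaults
                             9.0.1                    py35_1  defaults
                          *  9.0.1                    py36_1  defaults
"""

if __name__ == "__main__":
    print(installed_entry(SAMPLE))
```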

Now let's use the pip provided by conda to install keras 2.0.4.

u0017496@sys-87250:~/miniconda3/lib$ pip install keras==2.0.4
Collecting keras==2.0.4
  Downloading Keras-2.0.4.tar.gz (199kB)
    100% |████████████████████████████████| 204kB 3.1MB/s
Collecting theano (from keras==2.0.4)
  Downloading Theano-0.9.0.tar.gz (3.1MB)
    100% |████████████████████████████████| 3.1MB 310kB/s
Collecting pyyaml (from keras==2.0.4)
  Downloading PyYAML-3.12.tar.gz (253kB)
    100% |████████████████████████████████| 256kB 3.6MB/s
Requirement already satisfied: six in ./python3.6/site-packages (from keras==2.0.4)
Requirement already satisfied: numpy>=1.9.1 in ./python3.6/site-packages (from theano->keras==2.0.4)
Requirement already satisfied: scipy>=0.14 in ./python3.6/site-packages (from theano->keras==2.0.4)
Building wheels for collected packages: keras, theano, pyyaml
  Running setup.py bdist_wheel for keras ... done
  Stored in directory: /home/u0017496/.cache/pip/wheels/48/82/42/f06a8c03a8f95ada523a81ba723e89f059693e6ad868d09727
  Running setup.py bdist_wheel for theano ... done
  Stored in directory: /home/u0017496/.cache/pip/wheels/d5/5b/93/433299b86e3e9b25f0f600e4e4ebf18e38eb7534ea518eba13
  Running setup.py bdist_wheel for pyyaml ... done
  Stored in directory: /home/u0017496/.cache/pip/wheels/2c/f7/79/13f3a12cd723892437c0cfbde1230ab4d82947ff7b3839a4fc
Successfully built keras theano pyyaml
Installing collected packages: theano, pyyaml, keras
Successfully installed keras-2.0.4 pyyaml-3.12 theano-0.9.0


Let's also install gensim 2.0.0 and KoNLPy.

u0017496@sys-87250:~$ pip install gensim==2.0.0
Collecting gensim==2.0.0
  Downloading gensim-2.0.0.tar.gz (14.1MB)
    100% |████████████████████████████████| 14.2MB 88kB/s
Requirement already satisfied: numpy>=1.3 in ./miniconda3/lib/python3.6/site-packages (from gensim==2.0.0)
Requirement already satisfied: scipy>=0.7.0 in ./miniconda3/lib/python3.6/site-packages (from gensim==2.0.0)
Requirement already satisfied: six>=1.5.0 in ./miniconda3/lib/python3.6/site-packages (from gensim==2.0.0)
Requirement already satisfied: smart_open>=1.2.1 in ./miniconda3/lib/python3.6/site-packages (from gensim==2.0.0)
Requirement already satisfied: boto>=2.32 in ./miniconda3/lib/python3.6/site-packages (from smart_open>=1.2.1->gensim==2.0.0)
Requirement already satisfied: bz2file in ./miniconda3/lib/python3.6/site-packages (from smart_open>=1.2.1->gensim==2.0.0)
Requirement already satisfied: requests in ./miniconda3/lib/python3.6/site-packages (from smart_open>=1.2.1->gensim==2.0.0)
Building wheels for collected packages: gensim
  Running setup.py bdist_wheel for gensim ... done
  Stored in directory: /home/u0017496/.cache/pip/wheels/e9/5f/e7/4ff23a3fe4b181b44f37eed5602f179c1cc92a0a34f337e745
Successfully built gensim
Installing collected packages: gensim
  Found existing installation: gensim 1.0.1
    Uninstalling gensim-1.0.1:
      Successfully uninstalled gensim-1.0.1
Successfully installed gensim-2.0.0


u0017496@sys-87250:~$ pip install konlpy
Collecting konlpy
  Downloading konlpy-0.4.4-py2.py3-none-any.whl (22.5MB)
    100% |████████████████████████████████| 22.5MB 57kB/s
Installing collected packages: konlpy
Successfully installed konlpy-0.4.4



Next, let's use conda to install numpy, matplotlib, scipy, and scikit-learn.  numpy is a prerequisite of matplotlib and scipy is a prerequisite of scikit-learn, so both are pulled in automatically, and only two conda commands are actually needed.


u0017496@sys-87250:~$ conda install matplotlib
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/u0017496/miniconda3:

The following NEW packages will be INSTALLED:

    cycler:          0.10.0-py36_0
    freetype:        2.5.5-2
    libpng:          1.6.27-0
    matplotlib:      2.0.2-np112py36_0
    numpy:           1.12.1-py36_0
    openblas:        0.2.19-0
    python-dateutil: 2.6.0-py36_0
    pytz:            2017.2-py36_0

Proceed ([y]/n)? y

openblas-0.2.1 100% |###########################################################| Time: 0:00:00  10.21 MB/s
libpng-1.6.27- 100% |###########################################################| Time: 0:00:00  12.75 MB/s
freetype-2.5.5 100% |###########################################################| Time: 0:00:00  10.53 MB/s
numpy-1.12.1-p 100% |###########################################################| Time: 0:00:00  15.12 MB/s
pytz-2017.2-py 100% |###########################################################| Time: 0:00:00  13.25 MB/s
cycler-0.10.0- 100% |###########################################################| Time: 0:00:00  15.61 MB/s
python-dateuti 100% |###########################################################| Time: 0:00:00   6.43 MB/s
matplotlib-2.0 100% |###########################################################| Time: 0:00:00  14.62 MB/s


u0017496@sys-87250:~$ conda install scikit-learn
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/u0017496/miniconda3:

The following NEW packages will be INSTALLED:

    scikit-learn: 0.18.1-np112py36_1
    scipy:        0.19.0-np112py36_0

Proceed ([y]/n)? y

scipy-0.19.0-n 100% |###########################################################| Time: 0:00:02  14.75 MB/s
scikit-learn-0 100% |###########################################################| Time: 0:00:00  15.56 MB/s
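
The reason two commands are enough can be sketched as a walk over a dependency map.  The map below is a deliberately simplified assumption that covers only the two relationships named above (the real install plans list more packages), and `install_set` is a hypothetical helper, not how conda's solver works internally.

```python
# Assumption: a simplified dependency map with only the two relationships
# mentioned in the text. The real packages depend on more than this.
DEPENDS_ON = {
    "matplotlib": ["numpy"],
    "scikit-learn": ["scipy"],
    "numpy": [],
    "scipy": [],
}


def install_set(requested, depends_on):
    """Return every package to install, dependencies first (map must be acyclic)."""
    ordered = []

    def visit(pkg):
        if pkg in ordered:
            return
        for dep in depends_on.get(pkg, []):
            visit(dep)
        ordered.append(pkg)

    for pkg in requested:
        visit(pkg)
    return ordered


# Two requests expand to four packages.
print(install_set(["matplotlib", "scikit-learn"], DEPENDS_ON))
# ['numpy', 'matplotlib', 'scipy', 'scikit-learn']
```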



Packages installed this way go into /home/u0017496/miniconda3/lib/python3.6/site-packages, as shown below.

u0017496@sys-87250:~$ ls /home/u0017496/miniconda3/lib/python3.6/site-packages/
asn1crypto                         mpl_toolkits                                  python_dateutil-2.6.0-py3.6.egg-info
asn1crypto-0.22.0-py3.6.egg-info   numpy-1.12.1.dist-info                        pytz
cffi                               OpenSSL                                       pytz-2017.2-py3.6.egg-info
cffi-1.10.0-py3.6.egg-info         packaging                                     README.txt
_cffi_backend.so                   packaging-16.8-py3.6.egg-info                 requests
conda                              pip                                           requests-2.14.2-py3.6.egg-info
conda-4.3.18-py3.6.egg-info        pip-9.0.1-py3.6.egg-info                      ruamel_yaml
conda_env                          pyasn1                                        scikit_learn-0.18.1-py3.6.egg-info
cryptography                       pyasn1-0.2.3-py3.6.egg-info                   scipy
cryptography-1.8.1-py3.6.egg-info  __pycache__                                   scipy-0.19.0-py3.6.egg-info
cycler-0.10.0-py3.6.egg-info       pycosat-0.6.2-py3.6.egg-info                  setuptools-27.2.0-py3.6.egg
cycler.py                          pycosat.cpython-36m-powerpc64le-linux-gnu.so  setuptools.pth
dateutil                           pycparser                                     six-1.10.0-py3.6.egg-info
easy-install.pth                   pycparser-2.17-py3.6.egg-info                 six.py
idna                               pylab.py                                      sklearn
idna-2.5-py3.6.egg-info            pyOpenSSL-17.0.0-py3.6.egg-info               test_pycosat.py
matplotlib                         pyparsing-2.1.4-py3.6.egg-info                wheel
matplotlib-2.0.2-py3.6.egg-info    pyparsing.py                                  wheel-0.29.0-py3.6.egg-info


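So, to use these packages from a Python session that does not already search this directory, set PYTHONPATH as follows.  (The Miniconda python itself already has its own site-packages directory on its search path.)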

u0017496@sys-87250:~$ export PYTHONPATH=/home/u0017496/miniconda3/lib/python3.6/site-packages:$PYTHONPATH
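
Whether the export is needed depends on which interpreter you run, and you can check from inside Python.  This is a sketch using only the standard library; `is_on_search_path` is a hypothetical helper written for this post.

```python
import os
import sys
import sysconfig


def is_on_search_path(directory, search_path):
    """Return True if `directory` is one of the entries in `search_path`."""
    target = os.path.realpath(directory)
    return any(os.path.realpath(entry) == target for entry in search_path if entry)


if __name__ == "__main__":
    # site-packages directory of the interpreter running this script
    site_packages = sysconfig.get_paths()["purelib"]
    print("site-packages:", site_packages)
    print("already on sys.path:", is_on_search_path(site_packages, sys.path))
```

Run with the Miniconda python, this prints the site-packages directory listed above; entries from PYTHONPATH are added to `sys.path` at startup.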


Now let's also install bazel, tensorflow, and tensorflow-gpu with conda (rather than using the tensorflow included in PowerAI).

u0017496@sys-87250:~$ conda install bazel
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/u0017496/miniconda3:

The following NEW packages will be INSTALLED:

    bazel: 0.4.5-0

Proceed ([y]/n)? y

bazel-0.4.5-0. 100% |#############################################| Time: 0:00:09  13.37 MB/s

u0017496@sys-87250:~$ conda install tensorflow
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/u0017496/miniconda3:

The following NEW packages will be INSTALLED:

    libprotobuf: 3.2.0-0
    protobuf:    3.2.0-py36_0
    tensorflow:  1.1.0-np112py36_0
    werkzeug:    0.12.2-py36_0

Proceed ([y]/n)? y

libprotobuf-3. 100% |#############################################| Time: 0:00:00  13.84 MB/s
werkzeug-0.12. 100% |#############################################| Time: 0:00:00  18.67 MB/s
protobuf-3.2.0 100% |#############################################| Time: 0:00:00  10.39 MB/s
tensorflow-1.1 100% |#############################################| Time: 0:00:01  15.16 MB/s

u0017496@sys-87250:~$ conda install tensorflow-gpu
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/u0017496/miniconda3:

The following NEW packages will be INSTALLED:

    cudatoolkit:    8.0-0
    cudnn:          6.0.21-0
    tensorflow-gpu: 1.1.0-np112py36_0

Proceed ([y]/n)? y

cudatoolkit-8. 100% |#############################################| Time: 0:00:29  11.24 MB/s
cudnn-6.0.21-0 100% |#############################################| Time: 0:00:11  15.97 MB/s
tensorflow-gpu 100% |#############################################| Time: 0:00:06  14.27 MB/s


conda list now shows the following installed packages.

u0017496@sys-87250:~$ conda list
# packages in environment at /home/u0017496/miniconda3:
#
asn1crypto                0.22.0                   py36_0
bazel                     0.4.5                         0
boto                      2.46.1                   py36_0
bz2file                   0.98                     py36_0
cffi                      1.10.0                   py36_0
conda                     4.3.18                   py36_0
conda-env                 2.6.0                         0
cryptography              1.8.1                    py36_0
cudatoolkit               8.0                           0
cudnn                     6.0.21                        0
cycler                    0.10.0                   py36_0
freetype                  2.5.5                         2
gensim                    1.0.1               np112py36_0
gensim                    2.0.0                     <pip>
idna                      2.5                      py36_0
Keras                     2.0.4                     <pip>
konlpy                    0.4.4                     <pip>
libffi                    3.2.1                         1
libpng                    1.6.27                        0
libprotobuf               3.2.0                         0
matplotlib                2.0.2               np112py36_0
numpy                     1.12.1                    <pip>
numpy                     1.12.1                   py36_0
openblas                  0.2.19                        0
openssl                   1.0.2k                        2
packaging                 16.8                     py36_0
pip                       9.0.1                    py36_1
protobuf                  3.2.0                    py36_0
pyasn1                    0.2.3                    py36_0
pycosat                   0.6.2                    py36_0
pycparser                 2.17                     py36_0
pyopenssl                 17.0.0                   py36_0
pyparsing                 2.1.4                    py36_0
python                    3.6.1                         2
python-dateutil           2.6.0                    py36_0
pytz                      2017.2                   py36_0
PyYAML                    3.12                      <pip>
requests                  2.14.2                   py36_0
ruamel_yaml               0.11.14                  py36_1
scikit-learn              0.18.1              np112py36_1
scipy                     0.19.0              np112py36_0
setuptools                27.2.0                   py36_0
six                       1.10.0                   py36_0
smart_open                1.5.2                    py36_0
sqlite                    3.13.0                        0
tensorflow                1.1.0               np112py36_0
tensorflow-gpu            1.1.0               np112py36_0
Theano                    0.9.0                     <pip>
werkzeug                  0.12.2                   py36_0
wheel                     0.29.0                   py36_0
xz                        5.2.2                         1
yaml                      0.1.6                         0
zlib                      1.2.8                         3




While we are at it, let's train the inception v3 model with the tensorflow we just installed through conda.  Follow the steps below in order.


u0017496@sys-87250:~/inception$ pwd
/home/u0017496/inception

u0017496@sys-87250:~/inception$ export INCEPTION_DIR=/home/u0017496/inception

u0017496@sys-87250:~/inception$ curl -O http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  380M  100  380M    0     0  5918k      0  0:01:05  0:01:05 --:--:-- 4233k

u0017496@sys-87250:~/inception$ tar -xvf inception-v3-2016-03-01.tar.gz
inception-v3/
inception-v3/checkpoint
inception-v3/README.txt
inception-v3/model.ckpt-157585

u0017496@sys-87250:~/inception$ git clone https://github.com/tensorflow/models.git
Cloning into 'models'...
remote: Counting objects: 4703, done.
remote: Compressing objects: 100% (43/43), done.
remote: Total 4703 (delta 17), reused 31 (delta 11), pack-reused 4649
Receiving objects: 100% (4703/4703), 153.34 MiB | 5.62 MiB/s, done.
Resolving deltas: 100% (2374/2374), done.
Checking connectivity... done.

u0017496@sys-87250:~/inception/models/inception$ export FLOWERS_DIR=/home/u0017496/inception/models/inception

u0017496@sys-87250:~/inception/models/inception$ mkdir -p $FLOWERS_DIR/data

u0017496@sys-87250:~/inception/models/inception$ which bazel
/home/u0017496/miniconda3/bin/bazel

u0017496@sys-87250:~/inception/models/inception$ bazel build inception/download_and_preprocess_flowers
Extracting Bazel installation...
....................
INFO: Found 1 target...
Target //inception:download_and_preprocess_flowers up-to-date:
  bazel-bin/inception/download_and_preprocess_flowers
INFO: Elapsed time: 6.943s, Critical Path: 0.05s

u0017496@sys-87250:~/inception/models/inception$ export TEST_TMPDIR=/home/u0017496/.cache

u0017496@sys-87250:~/inception/models/inception$ bazel build inception/download_and_preprocess_flowers
INFO: $TEST_TMPDIR defined: output root default is '/home/u0017496/.cache'.
Extracting Bazel installation...
.............
INFO: Found 1 target...
Target //inception:download_and_preprocess_flowers up-to-date:
  bazel-bin/inception/download_and_preprocess_flowers
INFO: Elapsed time: 4.867s, Critical Path: 0.03s

u0017496@sys-87250:~/inception/models/inception$ bazel-bin/inception/download_and_preprocess_flowers $FLOWERS_DIR/data
Downloading flower data set.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  218M  100  218M    0     0  9372k      0  0:00:23  0:00:23 --:--:-- 10.1M
(output omitted)
Found 3170 JPEG files across 5 labels inside /home/u0017496/inception/models/inception/data/raw-data/train.
Launching 2 threads for spacings: [[0, 1585], [1585, 3170]]
2017-05-19 05:33:44.191446 [thread 0]: Processed 1000 of 1585 images in thread batch.
2017-05-19 05:33:44.213856 [thread 1]: Processed 1000 of 1585 images in thread batch.
2017-05-19 05:33:54.902070 [thread 1]: Wrote 1585 images to /home/u0017496/inception/models/inception/data/train-00001-of-00002
2017-05-19 05:33:54.902172 [thread 1]: Wrote 1585 images to 1585 shards.
2017-05-19 05:33:54.911283 [thread 0]: Wrote 1585 images to /home/u0017496/inception/models/inception/data/train-00000-of-00002
2017-05-19 05:33:54.911360 [thread 0]: Wrote 1585 images to 1585 shards.
2017-05-19 05:33:55.171141: Finished writing all 3170 images in data set.
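
The "spacings" line in the log shows the 3170 files being split evenly between two threads.  Here is a sketch of how such ranges can be computed; `thread_ranges` is a hypothetical helper that reproduces the numbers in the log and is not the preprocessing script's actual code.

```python
def thread_ranges(num_items, num_threads):
    """Split range(num_items) into num_threads contiguous [start, end) ranges."""
    # Evenly spaced integer boundaries from 0 to num_items.
    bounds = [num_items * i // num_threads for i in range(num_threads + 1)]
    return [[bounds[i], bounds[i + 1]] for i in range(num_threads)]


print(thread_ranges(3170, 2))  # [[0, 1585], [1585, 3170]]
```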

As you can see below, this inception v3 run classifies flower photos; the training data has five classes.

u0017496@sys-87250:~/inception/models/inception$ du -sm data/raw-data/train/*
29      data/raw-data/train/daisy
44      data/raw-data/train/dandelion
1       data/raw-data/train/LICENSE.txt
33      data/raw-data/train/roses
47      data/raw-data/train/sunflowers
48      data/raw-data/train/tulips

u0017496@sys-87250:~/inception/models/inception$ bazel build inception/flowers_train
INFO: $TEST_TMPDIR defined: output root default is '/home/u0017496/.cache'.
............................
INFO: Found 1 target...
Target //inception:flowers_train up-to-date:
  bazel-bin/inception/flowers_train
INFO: Elapsed time: 6.502s, Critical Path: 0.03s

The preparation for training inception v3 is finally done.  Start training with the following command.

u0017496@sys-87250:~/inception/models/inception$ time bazel-bin/inception/flowers_train --train_dir=$FLOWERS_DIR/train --data_dir=$FLOWERS_DIR/data --pretrained_model_checkpoint_path=$INCEPTION_DIR/inception-v3/model.ckpt-157585 --fine_tune=True --initial_learning_rate=0.001 --input_queue_memory_factor=1 --max_steps=50 --num_gpus 1 --batch_size=32

NVIDIA: no NVIDIA devices found
2017-05-19 05:41:03.740213: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_UNKNOWN
2017-05-19 05:41:03.740670: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:145] kernel driver does not appear to be running on this host (sys-87250): /proc/driver/nvidia/version does not exist
2017-05-19 05:41:51.947244: Pre-trained model restored from /home/u0017496/inception/inception-v3/model.ckpt-157585
2017-05-19 05:47:22.023602: step 0, loss = 2.79 (0.2 examples/sec; 182.713 sec/batch)
2017-05-19 06:05:58.942671: step 10, loss = 2.53 (0.4 examples/sec; 78.882 sec/batch)
2017-05-19 06:19:26.875533: step 20, loss = 2.40 (0.4 examples/sec; 82.410 sec/batch)
2017-05-19 06:33:10.333275: step 30, loss = 2.20 (0.4 examples/sec; 77.844 sec/batch)
2017-05-19 06:48:27.688993: step 40, loss = 2.24 (0.3 examples/sec; 96.148 sec/batch)

real    84m30.882s
user    135m20.864s
sys     2m30.832s


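I have a confession to make: the server I used for this installation demo is actually a POWER8 server with no GPU.  Without a GPU, tensorflow falls back to the CPU, and as you can see the training then takes a very, very long time.  The output above shows about 0.4 examples per second, whereas with P100 GPUs it processes on the order of 50 to 200 examples per second, depending on the number of GPUs and the batch size.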

Below is part of the log from an inception v3 run I did earlier on a Minsky server with PowerAI installed.

2017-05-16 03:48:46.352210: Pre-trained model restored from /gpfs/gpfs_gl4_16mb/b7p088za/inception-v3/model.ckpt-157585
2017-05-16 03:52:44.322381: step 0, loss = 2.72 (17.6 examples/sec; 21.830 sec/batch)
2017-05-16 03:55:29.550791: step 10, loss = 2.57 (213.6 examples/sec; 1.797 sec/batch)
2017-05-16 03:55:47.619990: step 20, loss = 2.35 (212.1 examples/sec; 1.810 sec/batch)
2017-05-16 03:56:05.953991: step 30, loss = 2.17 (206.6 examples/sec; 1.859 sec/batch)
2017-05-16 03:56:24.306742: step 40, loss = 1.98 (209.4 examples/sec; 1.834 sec/batch)
2017-05-16 03:56:42.490063: step 50, loss = 1.92 (217.8 examples/sec; 1.763 sec/batch)
2017-05-16 03:57:00.444537: step 60, loss = 1.67 (216.6 examples/sec; 1.773 sec/batch)
2017-05-16 03:57:18.366941: step 70, loss = 1.58 (212.7 examples/sec; 1.806 sec/batch)
2017-05-16 03:57:36.467837: step 80, loss = 1.55 (213.6 examples/sec; 1.798 sec/batch)
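
To put a number on the gap, the examples/sec figures can be pulled out of both logs and compared.  This is a sketch with a few lines copied from each log; `parse_steps` and `steady_rate` are hypothetical helpers, and skipping step 0 as warm-up is my assumption, based on it being much slower than the later steps in both runs.  As a sanity check on the CPU run, 32 images per 78.882 seconds is about 0.4 examples per second, which matches the log.

```python
import re

STEP_RE = re.compile(
    r"step (\d+), loss = ([\d.]+) \(([\d.]+) examples/sec; ([\d.]+) sec/batch\)"
)


def parse_steps(log_text):
    """Return (step, examples_per_sec, sec_per_batch) for each training log line."""
    return [
        (int(m.group(1)), float(m.group(3)), float(m.group(4)))
        for m in STEP_RE.finditer(log_text)
    ]


def steady_rate(log_text):
    """Mean examples/sec, skipping step 0 (assumed to include start-up cost)."""
    rates = [eps for step, eps, _ in parse_steps(log_text) if step > 0]
    return sum(rates) / len(rates)


# Lines copied from the CPU-only run above.
CPU_LOG = """\
2017-05-19 05:47:22.023602: step 0, loss = 2.79 (0.2 examples/sec; 182.713 sec/batch)
2017-05-19 06:05:58.942671: step 10, loss = 2.53 (0.4 examples/sec; 78.882 sec/batch)
2017-05-19 06:19:26.875533: step 20, loss = 2.40 (0.4 examples/sec; 82.410 sec/batch)
2017-05-19 06:33:10.333275: step 30, loss = 2.20 (0.4 examples/sec; 77.844 sec/batch)
2017-05-19 06:48:27.688993: step 40, loss = 2.24 (0.3 examples/sec; 96.148 sec/batch)
"""

# Lines copied from the Minsky (PowerAI) run above.
GPU_LOG = """\
2017-05-16 03:52:44.322381: step 0, loss = 2.72 (17.6 examples/sec; 21.830 sec/batch)
2017-05-16 03:55:29.550791: step 10, loss = 2.57 (213.6 examples/sec; 1.797 sec/batch)
2017-05-16 03:55:47.619990: step 20, loss = 2.35 (212.1 examples/sec; 1.810 sec/batch)
2017-05-16 03:56:05.953991: step 30, loss = 2.17 (206.6 examples/sec; 1.859 sec/batch)
"""

if __name__ == "__main__":
    cpu, gpu = steady_rate(CPU_LOG), steady_rate(GPU_LOG)
    print(f"CPU: {cpu:.3f} examples/sec, GPU: {gpu:.1f} examples/sec")
    print(f"ratio: {gpu / cpu:.0f}x")
```

Keep in mind that the two runs were not made with the same settings, so the ratio is only a rough indication.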

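Comments:

  1. Hello.

    I tested this on x86 based on this post and wrote it up on my blog. If you would rather I had not, I will change it to different content, so please let me know if you want it changed.
    https://sysnet4admin.blogspot.kr/2017/06/dmdl-x86-anaconda-tensorflow.html#.WTfFy2iLQuU

    Thank you.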
