Deep Learning for HW Engineers: Building TensorFlow 1.8 from source (CUDA 9.2, ppc64le, POWER9)

IBM recently announced that TensorFlow 1.8 can now be used on ppc64le as well. The prerequisite is an environment with CUDA 9.2 installed. If you download the wheel file from the site below and install it as shown, you can start using it right away.

[ibm@centos01 files]$ wget ftp://ftp.unicamp.br/pub/ppc64el/ai_frameworks/tensorflow/tensorflow-1.8.0-cp27-none-linux_ppc64le.whl

[ibm@centos01 files]$ which python
~/anaconda2/bin/python

[ibm@centos01 files]$ pip install tensorflow-1.8.0-cp27-none-linux_ppc64le.whl

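However, that site only provides a wheel file for Python 2.7 (note the cp27 tag in the file name). If you need TensorFlow for Python 3.6, you have to build it from source yourself, and at the moment that build requires a few small source changes, described below.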
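A quick way to see why that wheel cannot be used from Python 3.6 is to compare its tags with the running interpreter. The sketch below splits a wheel name into the fields of the wheel naming convention (PEP 427) and does a rough match on the CPython tag and CPU architecture only; pip's real compatibility check is more thorough, and the helper names are my own.

```python
import platform
import sys


def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 fields.

    Layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError("unexpected wheel name: %s" % filename)
    return {"name": name, "version": version, "build": build,
            "python": py_tag, "abi": abi_tag, "platform": plat_tag}


def matches_interpreter(filename, version_info=None, machine=None):
    """Rough check of the CPython tag and CPU architecture (not a full pip check)."""
    info = parse_wheel_filename(filename)
    version_info = version_info or sys.version_info
    machine = machine or platform.machine()
    wanted_py = "cp%d%d" % (version_info[0], version_info[1])
    return info["python"] == wanted_py and info["platform"].endswith(machine)
```

With Python 3.6 on ppc64le, the cp27 wheel above fails this check, while a cp36 linux_ppc64le wheel such as the one built later in this post passes.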

First, TensorFlow 1.8.0 differs noticeably from earlier versions in that it uses the NCCL library (pronounced "nickel"). So NCCL has to be installed first; fortunately, that is easy.

[ibm@centos01 ~]$ git clone https://github.com/NVIDIA/nccl
[ibm@centos01 ~]$ cd nccl
[ibm@centos01 nccl]$ make
[ibm@centos01 nccl]$ sudo make install
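To confirm the install before moving on, you can read the version macros from the installed header. This sketch assumes `make install` used NCCL's default prefix (/usr/local), so the header is at /usr/local/include/nccl.h, and that it is an NCCL 2 header defining NCCL_MAJOR, NCCL_MINOR and NCCL_PATCH; the function names are hypothetical.

```python
import os
import re

# Assumption: default install prefix /usr/local. Adjust if you ran
# `make install PREFIX=...`.
DEFAULT_NCCL_HEADER = "/usr/local/include/nccl.h"


def nccl_version_from_header(text):
    """Return (major, minor, patch) parsed from the #define lines in nccl.h."""
    fields = {}
    for key in ("NCCL_MAJOR", "NCCL_MINOR", "NCCL_PATCH"):
        m = re.search(r"#define\s+%s\s+(\d+)" % key, text)
        if m is None:
            raise ValueError("%s not found; is this an NCCL 2 header?" % key)
        fields[key] = int(m.group(1))
    return fields["NCCL_MAJOR"], fields["NCCL_MINOR"], fields["NCCL_PATCH"]


def installed_nccl_version(path=DEFAULT_NCCL_HEADER):
    """Return the installed version, or None if the header is not there."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return nccl_version_from_header(f.read())


if __name__ == "__main__":
    print(installed_nccl_version())
```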

Make sure your python is version 3.x.

[ibm@centos01 ~]$ which python
~/anaconda3/bin/python
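If you want this check to fail loudly, for example inside a build script, a few lines of Python are enough. Run it with the same `python` that `which python` reports; the function name is only illustrative.

```python
import sys


def require_python3(min_version=(3, 0)):
    """Exit with a message unless this interpreter is at least min_version."""
    if sys.version_info[:2] < tuple(min_version):
        sys.exit("Python %d.%d found at %s; a 3.x interpreter is needed"
                 % (sys.version_info[0], sys.version_info[1], sys.executable))
    return "%d.%d.%d" % sys.version_info[:3]


if __name__ == "__main__":
    print(require_python3())
```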

[ibm@centos01 ~]$ git clone https://github.com/tensorflow/tensorflow

[ibm@centos01 ~]$ cd tensorflow

[ibm@centos01 tensorflow]$ git checkout tags/v1.8.0

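As with earlier versions, the build fails if -march=native is left in as the default, because GCC on POWER does not accept -march (it uses -mcpu instead). Change every such occurrence to -mcpu=power8.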

[ibm@centos01 tensorflow]$ vi ./configure.py
...
default_cc_opt_flags = '-mcpu=power8'
...
default_cc_opt_flags = '-mcpu=power8'
...
write_to_bazelrc('build:opt --host_copt=-mcpu=power8')
...
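Instead of editing in vi, the substitution can be scripted. This is a minimal sketch that only replaces the literal string -march=native named above, writes a configure.py.orig backup, and reports how many occurrences it changed; if grep shows other variants in your checkout, add them to FLAGS_TO_REPLACE. The helper names and backup file name are my own choices.

```python
# Flags to swap out; extend this tuple if grep finds other variants.
FLAGS_TO_REPLACE = ("-march=native",)
REPLACEMENT = "-mcpu=power8"


def patch_opt_flags(source, flags=FLAGS_TO_REPLACE, replacement=REPLACEMENT):
    """Return (new_source, count) with every listed flag replaced."""
    total = 0
    for flag in flags:
        total += source.count(flag)
        source = source.replace(flag, replacement)
    return source, total


def patch_file(path="configure.py"):
    """Patch the file in place, keeping the untouched copy as path + '.orig'."""
    with open(path) as f:
        text = f.read()
    new_text, n = patch_opt_flags(text)
    if n:
        with open(path + ".orig", "w") as f:
            f.write(text)
        with open(path, "w") as f:
            f.write(new_text)
    return n
```

Run it from the tensorflow checkout, e.g. `python -c "import patch_cfg; print(patch_cfg.patch_file())"` if you saved it as patch_cfg.py (a hypothetical file name), and then confirm with grep that no -march remains.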

Then run configure. Accept the default for every question except the ones shown below.

[ibm@centos01 tensorflow]$ ./configure
...
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
...
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
...
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
...
Do you wish to build TensorFlow with CUDA support? [y/N]: y
...
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.2
...
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/local/cuda-9.2
...
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]6.0,7.0
...
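The same answers can also be supplied through environment variables, which makes the build repeatable. Treat this as an assumption to verify: I believe the TF 1.x configure.py reads the variable names below and skips the matching prompt when one is set, but check them against your checkout (for example with grep) before relying on it. Compute capabilities 6.0 and 7.0 correspond to the Tesla P100 and V100.

```python
import os
import subprocess

# Assumed configure.py variable names; verify with
#   grep -n "TF_NEED_\|TF_CUDA\|CUDNN_INSTALL_PATH" configure.py
ANSWERS = {
    "TF_NEED_GCP": "0",
    "TF_NEED_S3": "0",
    "TF_NEED_KAFKA": "0",
    "TF_NEED_CUDA": "1",
    "TF_CUDA_VERSION": "9.2",
    "TF_CUDNN_VERSION": "7",
    "CUDNN_INSTALL_PATH": "/usr/local/cuda-9.2",
    "TF_CUDA_COMPUTE_CAPABILITIES": "6.0,7.0",
}


def configure_env(base=None, answers=ANSWERS):
    """Return a copy of base (default: os.environ) with the answers added."""
    env = dict(os.environ if base is None else base)
    env.update(answers)
    return env


def run_configure(tf_dir="."):
    # Prompts not covered by ANSWERS should still be asked interactively.
    return subprocess.run(["./configure"], cwd=tf_dir,
                          env=configure_env(), check=True)
```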

Next, edit third_party/png.BUILD as proposed in pull request #19291 below.

https://github.com/tensorflow/tensorflow/pull/19291/commits/8f8a3c5151a674b3496691af49c3aa063841f292

[ibm@centos01 tensorflow]$ vi third_party/png.BUILD   # insert the following in place of line 31

    # ],
    ] + select({
        "@org_tensorflow//tensorflow:linux_ppc64le": [
            "powerpc/powerpc_init.c",
            "powerpc/filter_vsx_intrinsics.c",
        ],
        "//conditions:default": [
        ],
    }),

//* Without this change, you will run into the error below.

ERROR: /home/ibm/.cache/bazel/_bazel_ibm/4869dbc4f6d1a096d34c86242ee59bba/external/boringssl/BUILD:115:1: C++ compilation of rule '@boringssl//:crypto' failed (Exit 1)
external/boringssl/src/crypto/pkcs8/pkcs8.c: In function 'ascii_to_ucs2':
external/boringssl/src/crypto/pkcs8/pkcs8.c:86:3: error: 'for' loop initial declarations are only allowed in C99 mode
for (size_t i = 0; i < ulen - 2; i += 2) {
^
*//
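The png.BUILD edit can likewise be applied by a script. This hypothetical helper assumes, as the vi step implies, that line 31 of the v1.8.0 file is the `],` that closes the srcs list; it refuses to touch the text if that line is something else, and returns it unchanged if the powerpc sources are already present. Indentation inside the brackets does not matter to Bazel.

```python
# Replacement for the closing "]," of the srcs list, as given in the text.
PPC_SRCS_BLOCK = '''    ] + select({
        "@org_tensorflow//tensorflow:linux_ppc64le": [
            "powerpc/powerpc_init.c",
            "powerpc/filter_vsx_intrinsics.c",
        ],
        "//conditions:default": [
        ],
    }),'''


def patch_png_build(text, lineno=31):
    """Replace 1-based line `lineno` (expected to be '],') with the select() block."""
    if "powerpc/powerpc_init.c" in text:
        return text  # already patched
    lines = text.splitlines()
    target = lines[lineno - 1]
    if target.strip() != "],":
        raise ValueError("line %d is %r, not '],'; edit by hand" % (lineno, target))
    lines[lineno - 1:lineno] = PPC_SRCS_BLOCK.splitlines()
    return "\n".join(lines) + "\n"
```

To use it, read third_party/png.BUILD, pass the text through patch_png_build, and write the result back.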

Now run bazel build just as before.

[ibm@centos01 tensorflow]$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
...
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 8178.924s, Critical Path: 252.09s
INFO: Build completed successfully, 11585 total actions

You will notice that the build takes much longer than it used to (8,179 s of elapsed time above, about 2 h 16 min). Now build the whl file.

[ibm@centos01 tensorflow]$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/files/tensorflow_pkg

The resulting whl file is also about twice the size of the earlier TF 1.5.1 build (111 MB vs. 55 MB).

[ibm@centos01 tensorflow]$ ls -l ~/files/tensorflow_pkg
total 162328
-rw-rw-r--. 1 ibm ibm 55160703 May 9 15:17 tensorflow-1.5.1-cp36-cp36m-linux_ppc64le.whl
-rw-rw-r--. 1 ibm ibm 111062624 May 31 16:07 tensorflow-1.8.0-cp36-cp36m-linux_ppc64le.whl
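To reproduce the size comparison without reading ls output, this sketch lists every wheel in a directory with its size and computes a ratio; the function names are illustrative and sizes are in decimal megabytes.

```python
import glob
import os


def wheel_sizes(directory):
    """Map each .whl file name in directory to its size in bytes."""
    return {os.path.basename(p): os.path.getsize(p)
            for p in sorted(glob.glob(os.path.join(directory, "*.whl")))}


def human(nbytes):
    """Format a byte count in decimal megabytes."""
    return "%.1f MB" % (nbytes / 1e6)


def size_ratio(sizes, new, old):
    return sizes[new] / float(sizes[old])


if __name__ == "__main__":
    for name, size in wheel_sizes(os.path.expanduser("~/files/tensorflow_pkg")).items():
        print(name, human(size))
```

With the two files listed above, the ratio comes out at about 2.01 (111.1 MB vs. 55.2 MB).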

Install this whl file.

[ibm@centos01 tensorflow]$ pip install ~/files/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-linux_ppc64le.whl
...
Successfully uninstalled numpy-1.13.1
Successfully installed absl-py-0.2.2 astor-0.6.2 gast-0.2.0 grpcio-1.12.0 markdown-2.6.11 numpy-1.14.3 tensorboard-1.8.0 tensorflow-1.8.0 termcolor-1.1.0

[ibm@centos01 tensorflow]$ pip list | grep tensor
tensorboard 1.8.0
tensorflow 1.8.0
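pip list only shows that the package is registered. A slightly stronger check is to import it and ask whether the build has CUDA support and can see a GPU. Run this from outside the tensorflow source directory: inside the checkout, the local tensorflow/ folder shadows the installed package and the import typically fails. tf.test.is_built_with_cuda() and tf.test.is_gpu_available() are TensorFlow 1.x test utilities; the wrapper around them is just a sketch.

```python
import importlib.util


def tensorflow_report():
    """Import TensorFlow if it is installed and report what the build supports."""
    if importlib.util.find_spec("tensorflow") is None:
        return {"installed": False}
    import tensorflow as tf
    return {
        "installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # Creates a session and probes the GPUs visible to this process.
        "gpu_available": tf.test.is_gpu_available(),
    }


if __name__ == "__main__":
    print(tensorflow_report())
```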

When I ran Neural Machine Translation training and inference with this TF 1.8.0 build, it did not appear to be any faster than TF 1.5.1 on the same hardware.

I will upload the TensorFlow 1.8.0 wheel file built here to the Google Drive link below. Note that it requires CUDA 9.2 to be installed.

https://drive.google.com/open?id=1jEXy0A6gRSTzUG_ttwGYQivWoGlHWsU6
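Before installing that wheel, you can check the CUDA version from a script. This sketch assumes the CUDA 9.x layout, where the toolkit root contains a plain-text version.txt with a line such as "CUDA Version 9.2.88"; if /usr/local/cuda does not point to your 9.2 install, pass the /usr/local/cuda-9.2 path instead. Later CUDA releases changed this file, so treat the check as specific to this era; the function names are my own.

```python
import os
import re

# Assumed CUDA 9.x location of the version file.
CUDA_VERSION_FILE = "/usr/local/cuda/version.txt"


def parse_cuda_version(text):
    """Return (major, minor) from a 'CUDA Version X.Y...' line, or None."""
    m = re.search(r"CUDA Version (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def has_cuda_92(path=CUDA_VERSION_FILE):
    if not os.path.exists(path):
        return False
    with open(path) as f:
        return parse_cuda_version(f.read()) == (9, 2)


if __name__ == "__main__":
    print(has_cuda_92())
```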


Thursday, May 31, 2018
