Minsky servers with P100 GPUs running Ubuntu 16.04 have mostly been used with CUDA 8.0 and the tensorflow 1.0 (python2) that ships with PowerAI. A Minsky server can, of course, keep Ubuntu 16.04 as is, have CUDA 9.1 installed, and then run tensorflow 1.5.1. This walkthrough uses a python3 environment.
u0017649@sys-93315:~$ sudo apt-get install libaprutil1-dev ant cmake automake libtool-bin openssl libcurl4-openssl-dev
I previously used bazel-0.8.1, but with the current openjdk 1.8.0_151 that bazel version fails with the following error.
ERROR: /home/minsky/files/bazel-0.8.1/src/main/java/com/google/devtools/common/options/BUILD:27:1: Building src/main/java/com/google/devtools/common/options/liboptions_internal.jar (35 source files) failed (Exit 1): java failed: error executing command
(cd /tmp/bazel_vQeTUQIe/out/execroot/io_bazel && \
exec env - \
LC_CTYPE=en_US.UTF-8 \
external/local_jdk/bin/java -XX:+TieredCompilation '-XX:TieredStopAtLevel=1' -Xbootclasspath/p:third_party/java/jdk/langtools/javac-9-dev-r4023-3.jar -jar bazel-out/host/bin/src/java_tools/buildjar/java/com/google/devtools/build/buildjar/bootstrap_deploy.jar @bazel-out/ppc-opt/bin/src/main/java/com/google/devtools/common/options/liboptions_internal.jar-2.params)
java.lang.InternalError: Cannot find requested resource bundle for locale en_US
This error goes away when bazel-0.10.0 is used instead.
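Since the fix depends on the bazel version, it can help to compare version numbers before starting a long build. The sketch below is only an illustration: `parse_version` and `is_at_least` are hypothetical helper names (not part of bazel), and it assumes a plain dotted numeric version such as 0.10.0. It compares versions numerically, because a plain string comparison would rank "0.8.1" above "0.10.0".

```python
def parse_version(text):
    """Turn a dotted version such as '0.10.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_at_least(version, minimum):
    """Return True if `version` is the same as or newer than `minimum`."""
    return parse_version(version) >= parse_version(minimum)


# 0.10.0 is the version this post found to build cleanly.
print(is_at_least("0.8.1", "0.10.0"))   # False
print(is_at_least("0.10.0", "0.10.0"))  # True
```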
u0017649@sys-93315:~$ wget https://github.com/bazelbuild/bazel/releases/download/0.10.0/bazel-0.10.0-dist.zip
u0017649@sys-93315:~$ which python
/home/u0017649/anaconda3/bin/python
u0017649@sys-93315:~$ conda install protobuf
u0017649@sys-93315:~$ which protoc
/home/u0017649/anaconda3/bin/protoc
u0017649@sys-93315:~$ export PROTOC=/home/u0017649/anaconda3/bin/protoc
u0017649@sys-93315:~$ mkdir bazel-0.10.0 && cd bazel-0.10.0
u0017649@sys-93315:~/bazel-0.10.0$ unzip ../bazel-0.10.0-dist.zip
u0017649@sys-93315:~/bazel-0.10.0$ ./compile.sh
u0017649@sys-93315:~/bazel-0.10.0$ sudo cp output/bazel /usr/local/bin
u0017649@sys-93315:~$ git clone https://github.com/tensorflow/tensorflow
u0017649@sys-93315:~$ cd tensorflow
u0017649@sys-93315:~/tensorflow$ git checkout tags/v1.5.1
u0017649@sys-93315:~/tensorflow$ vi configure.py
...
  # default_cc_opt_flags = '-mcpu=native'
  default_cc_opt_flags = '-mcpu=power8'
else:
  # default_cc_opt_flags = '-march=native'
  default_cc_opt_flags = '-mcpu=power8'
...
# write_to_bazelrc('build:opt --host_copt=-march=native')
write_to_bazelrc('build:opt --host_copt=-mcpu=power8')
...
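The configure.py edit above replaces the `native` flags with `-mcpu=power8`; gcc on POWER takes `-mcpu` rather than `-march`. As a rough sketch of the same idea outside configure.py, the snippet below picks a flag from the machine type. The function names `default_cc_opt_flags` and `host_copt_line` are made up for this illustration, and the rule "ppc64le gets `-mcpu=power8`, everything else gets `-march=native`" is an assumption, not the actual logic in TensorFlow's configure.py.

```python
import platform


def default_cc_opt_flags(machine=None):
    """Pick an optimization flag for the given machine type.

    Assumption for illustration: ppc64le hosts get '-mcpu=power8'
    (the value used in this post); anything else gets '-march=native'.
    """
    if machine is None:
        machine = platform.machine()
    if machine == "ppc64le":
        return "-mcpu=power8"
    return "-march=native"


def host_copt_line(machine=None):
    """Format the bazelrc line in the same shape as the edited configure.py."""
    return "build:opt --host_copt=%s" % default_cc_opt_flags(machine)


print(host_copt_line("ppc64le"))  # build:opt --host_copt=-mcpu=power8
```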
u0017649@sys-93315:~/tensorflow$ ./configure
...
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
...
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
...
Do you wish to build TensorFlow with CUDA support? [y/N]: y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1
...
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7
...
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/local/cuda/lib64
...
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]6.0,7.0 # 6.0 is for P100, 7.0 is for V100.
...
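The compute capability prompt expects a comma-separated list. A minimal sketch of building that answer from GPU model names is shown below; the table holds only the two values stated in this post (6.0 for P100, 7.0 for V100), and `capability_answer` is a hypothetical helper name used just for this example.

```python
# Compute capabilities stated in this post: 6.0 for P100, 7.0 for V100.
COMPUTE_CAPABILITY = {"P100": "6.0", "V100": "7.0"}


def capability_answer(gpus):
    """Build the comma-separated answer for the compute capability prompt."""
    seen = []
    for gpu in gpus:
        cap = COMPUTE_CAPABILITY[gpu]  # KeyError for a GPU not in the table
        if cap not in seen:            # list each capability only once
            seen.append(cap)
    return ",".join(seen)


print(capability_answer(["P100", "V100"]))  # 6.0,7.0
```

Listing only the capabilities of the GPUs you actually have keeps build time and binary size down, as the prompt itself warns.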
u0017649@sys-93315:~/tensorflow$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
u0017649@sys-93315:~/tensorflow$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg
u0017649@sys-93315:~/tensorflow$ pip install ~/tensorflow_pkg/tensorflow-1.5.1-cp36-cp36m-linux_ppc64le.whl
Verify the installation as follows.
u0017649@sys-93315:~$ python
Python 3.6.4 |Anaconda, Inc.| (default, Feb 11 2018, 08:19:13)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess=tf.Session()
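Creating a session shows that the import works; to see whether the GPUs are actually visible, one option is to list the local devices. The sketch below is hedged: `gpu_names` and `list_gpus_with_tensorflow` are hypothetical helper names, the stand-in `FakeDevice` entries are invented so the filter can be tried without a GPU, and the TensorFlow call assumes the TensorFlow 1.x `device_lib.list_local_devices()` function.

```python
from collections import namedtuple


def gpu_names(devices):
    """Return the names of the devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_gpus_with_tensorflow():
    """Ask the installed TensorFlow for its local devices (needs TensorFlow)."""
    from tensorflow.python.client import device_lib
    return gpu_names(device_lib.list_local_devices())


# Stand-in devices (made up for this example); on the server,
# call list_gpus_with_tensorflow() instead.
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])
sample = [FakeDevice("/device:CPU:0", "CPU"), FakeDevice("/device:GPU:0", "GPU")]
print(gpu_names(sample))  # ['/device:GPU:0']
```

An empty list from `list_gpus_with_tensorflow()` would suggest the build or the CUDA setup is not exposing the GPUs.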
The python3 wheel built this way, tensorflow-1.5.1-cp36-cp36m-linux_ppc64le.whl, is available at the link below. It can be used on either Ubuntu or Redhat.
https://drive.google.com/open?id=1CHIM-dgr0KcMJlHcc_I0fJUqBNs2czvL
And below is the python2 build, tensorflow-1.5.1-cp27-cp27mu-linux_ppc64le.whl.
https://drive.google.com/open?id=1cTZAsLwyozPNufoZfeKIIaIJ7snYGC0M
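The two wheels differ only in their Python tags (cp36 versus cp27), so the right one depends on the interpreter you install into. As a small illustration, the sketch below splits a wheel filename into its tags and picks the matching file. `parse_wheel_name`, `python_tag`, and `pick_wheel` are hypothetical helper names, and the parser assumes the five-part filename form used by these two wheels (no optional build tag). pip performs its own compatibility check; this only makes the naming visible.

```python
import sys

WHEELS = [
    "tensorflow-1.5.1-cp36-cp36m-linux_ppc64le.whl",
    "tensorflow-1.5.1-cp27-cp27mu-linux_ppc64le.whl",
]


def parse_wheel_name(filename):
    """Split a five-part wheel filename into its tags."""
    stem = filename[:-len(".whl")]
    name, version, py_tag, abi_tag, platform_tag = stem.split("-")
    return {"name": name, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": platform_tag}


def python_tag(version_info=None):
    """Return the CPython tag, e.g. 'cp36', for a (major, minor) version."""
    v = version_info or sys.version_info
    return "cp%d%d" % (v[0], v[1])


def pick_wheel(filenames, version_info=None):
    """Return the first wheel whose Python tag matches, or None."""
    tag = python_tag(version_info)
    for filename in filenames:
        if parse_wheel_name(filename)["python"] == tag:
            return filename
    return None


print(pick_wheel(WHEELS, (3, 6)))  # tensorflow-1.5.1-cp36-cp36m-linux_ppc64le.whl
```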