Saturday, January 13, 2018

Building tensorflow 1.4.1 from source in a python3 environment on AC922 Redhat

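As shown in the previous post, the official way to use tensorflow 1.4 on an AC922 running Redhat 7.4 is the Tensorflow Technical Preview, which IBM provides separately and only to customers who have purchased an AC922.  However, it currently supports python2 only, so it cannot be used with python3.  (Full support is planned for 2Q 2018.)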

That does not mean there is no way to use tensorflow 1.4 with python3.  You can build it yourself.

Here we use the bazel 0.5.4 that is included in the Tensorflow Technical Preview.  Set PATH as follows, so that Anaconda3 is on it and the directory of that bazel 0.5.4 comes first.

[root@ac922 nvme]# export PATH="/opt/DL/bazel/bin:/opt/anaconda3/bin:$PATH"
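
Because the shell uses the first match it finds on PATH, the order above decides which bazel and which python get used. A minimal sketch for checking this, using a hypothetical helper first_in_path (not part of bazel or Anaconda) that reports the first directory in a PATH-style string containing a given executable:

```shell
# first_in_path NAME PATH_STRING
# Hypothetical helper: prints the first directory in PATH_STRING that
# contains an executable file called NAME; returns 1 if none does.
first_in_path() {
  local name="$1" path_string="$2" dir
  local IFS=':'
  for dir in $path_string; do
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# With the PATH from this post, bazel should resolve to /opt/DL/bazel/bin
# and python to /opt/anaconda3/bin (the paths used above).
first_in_path bazel "$PATH" || echo "bazel not found on PATH"
first_in_path python "$PATH" || echo "python not found on PATH"
```

The same check can be done interactively with which, as the post does for protoc and pip.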

Next, install protobuf and the other required packages.

[root@ac922 ~]# conda install protobuf

[root@ac922 ~]# which protoc
/opt/anaconda3/bin/protoc

[root@ac922 ~]# export PROTOC=/opt/anaconda3/bin/protoc

[root@ac922 nvme]# yum install apr-util-devel.ppc64le ant cmake.ppc64le automake.noarch ftp libtool.ppc64le libtool-ltdl-devel.ppc64le apr-util-openssl.ppc64le openssl-devel.ppc64le  golang.ppc64le golang-bin.ppc64le


(Optional:  instead of the bazel 0.5.4 included in the Tensorflow Technical Preview, you can download the bazel-*-dist.zip of a newer bazel release and build it yourself, as follows.

[root@ac922 nvme]# wget https://github.com/bazelbuild/bazel/releases/download/0.8.1/bazel-0.8.1-dist.zip

[root@ac922 nvme]# mkdir bazel-0.8.1 && cd bazel-0.8.1

[root@ac922 bazel-0.8.1]# unzip ../bazel-0.8.1-dist.zip

[root@ac922 bazel-0.8.1]# ./compile.sh

[root@ac922 bazel-0.8.1]# cp output/bazel /usr/local/bin

End of the optional part.)


Now download the tensorflow source.

[root@ac922 nvme]# git clone https://github.com/tensorflow/tensorflow

[root@ac922 nvme]# cd tensorflow

[root@ac922 tensorflow]# git checkout tags/v1.4.1

[root@ac922 tensorflow]# conda install wheel numpy six

[root@ac922 tensorflow]# export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64:/usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib:$LD_LIBRARY_PATH

[root@ac922 tensorflow]# export PATH=/opt/DL/bazel/bin:$PATH

Normally the next step would simply be ./configure followed by bazel build... but doing so fails with the following boringssl-related error.

[root@ac922 tensorflow]# bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
....
ERROR: /root/.cache/bazel/_bazel_root/c33b26ecf6ca982d66935dcfbfc79c56/external/boringssl/BUILD:118:1: C++ compilation of rule '@boringssl//:crypto' failed (Exit 1).
In file included from external/boringssl/src/crypto/fipsmodule/bcm.c:92:0:
external/boringssl/src/crypto/fipsmodule/sha/sha1.c:125:6: error: static declaration of 'sha1_block_data_order' follows non-static declaration
 void sha1_block_data_order(uint32_t *state, const uint8_t *data, size_t num);
      ^
In file included from external/boringssl/src/crypto/fipsmodule/bcm.c:91:0:
external/boringssl/src/crypto/fipsmodule/sha/sha1-altivec.c:190:6: note: previous definition of 'sha1_block_data_order' was here
 void sha1_block_data_order(uint32_t *state, const uint8_t *data, size_t num) {
      ^
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 77.133s, Critical Path: 20.67s


Fixing this requires the following two patches.   The contents of the patches are listed separately at the very end of this post.

[root@ac922 tensorflow]# patch < 120-curl-build-fix.patch
can't find file to patch at input line 5
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/third_party/curl.BUILD b/third_party/curl.BUILD
|index 882967d..3c48dfa 100644
|--- a/third_party/curl.BUILD
|+++ b/third_party/curl.BUILD
--------------------------
File to patch: third_party/curl.BUILD
patching file third_party/curl.BUILD
[root@ac922 tensorflow]# patch < 140-boring-ssl.patch
can't find file to patch at input line 5
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/third_party/boringssl/add_boringssl_s390x.patch b/third_party/boringssl/add_boringssl_s390x.patch
|index 8b42d10..26c51a3 100644
|--- a/third_party/boringssl/add_boringssl_s390x.patch
|+++ b/third_party/boringssl/add_boringssl_s390x.patch
--------------------------
File to patch: third_party/boringssl/add_boringssl_s390x.patch
patching file third_party/boringssl/add_boringssl_s390x.patch
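
The "can't find file to patch" prompts above appear because both patches are git-style diffs whose paths start with a/ and b/. A sketch of a non-interactive alternative, assuming GNU patch and that it is run from the top of the tensorflow tree: -p1 strips that leading path component, and --dry-run checks the hunks before anything is modified. The function name apply_from_root is made up for this example.

```shell
# apply_from_root PATCH_FILE
# Hypothetical wrapper around GNU patch; run it from the repository root.
apply_from_root() {
  local patch_file="$1"
  # First pass: verify that every hunk applies, without changing any file.
  patch -p1 --dry-run < "$patch_file" > /dev/null || return 1
  # Second pass: apply for real.
  patch -p1 < "$patch_file"
}

# Usage with the two patch files from this post:
#   apply_from_root 120-curl-build-fix.patch
#   apply_from_root 140-boring-ssl.patch
```

Running the dry run first means a patch that no longer matches the checked-out tag is rejected as a whole instead of being applied halfway.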


Even with these patches applied, the build still fails with "fatal error: math_functions.hpp: No such file or directory".  To fix this, edit tensorflow/workspace.bzl as follows, based on the URLs below.

# from https://github.com/tensorflow/tensorflow/issues/15389 & https://github.com/angersson/tensorflow/commit/599dc70e9e478b4bc24fb2329c175ea978ef620a

[root@ac922 tensorflow]# vi tensorflow/workspace.bzl
...
  native.new_http_archive(
      name = "eigen_archive",
      urls = [
#          "https://bitbucket.org/eigen/eigen/get/429aa5254200.tar.gz",
#          "http://mirror.bazel.build/bitbucket.org/eigen/eigen/get/429aa5254200.tar.gz",
          "https://bitbucket.org/eigen/eigen/get/034b6c3e1017.tar.gz",
          "http://mirror.bazel.build/bitbucket.org/eigen/eigen/get/034b6c3e1017.tar.gz",
      ],
#      sha256 = "61d8b6fc4279dd1dda986fb1677d15e3d641c07a3ea5abe255790b1f0c0c14e9",
#      strip_prefix = "eigen-eigen-429aa5254200",
      sha256 = "0a8ac1e83ef9c26c0e362bd7968650b710ce54e2d883f0df84e5e45a3abe842a",
      strip_prefix = "eigen-eigen-034b6c3e1017",
      build_file = str(Label("//third_party:eigen.BUILD")),
  )
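
Bazel rejects a downloaded archive whose hash differs from the sha256 attribute, so when the URL changes the hash has to change with it, as in the edit above. A small sketch for checking a file against an expected hash with sha256sum from coreutils; verify_sha256 is a hypothetical helper, and the tarball name in the usage line is only an assumption about where a manually downloaded copy might be saved.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only if FILE hashes to EXPECTED_HEX.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<hex>  <file>"; keep only the hex digest.
  actual=$(sha256sum "$file" 2> /dev/null | cut -d' ' -f1)
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage (file name is an assumption; the hash is the one set above):
#   verify_sha256 034b6c3e1017.tar.gz \
#     0a8ac1e83ef9c26c0e362bd7968650b710ce54e2d883f0df84e5e45a3abe842a
```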

Now run ./configure and bazel build.

[root@ac922 tensorflow]# ./configure
...
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
...
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
...
Do you wish to build TensorFlow with CUDA support? [y/N]: y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 9.1
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-9.1
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 7
Please specify the location where cuDNN 7.0 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-9.1]:/usr/local/cuda-9.1/targets/ppc64le-linux/lib
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]7.0
...
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -mcpu=native]: -mcpu=power8
(The gcc used here does not support -mcpu=power9 yet, so power8 has to be given instead.  If you enter nothing, the default -mcpu=native resolves to power9 and the build fails.)
...
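
Whether a given -mcpu value is usable depends on the compiler version installed. A hedged sketch for probing this before answering the ./configure prompt: cc_accepts_flag is a hypothetical helper that asks the compiler (gcc by default, or whatever CC is set to) to parse an empty C input with the flag and reports whether the flag was accepted.

```shell
# cc_accepts_flag FLAG
# Hypothetical helper: returns 0 if the compiler accepts FLAG.
cc_accepts_flag() {
  local flag="$1" cc="${CC:-gcc}"
  # -fsyntax-only parses the (empty) input without writing an output file.
  "$cc" "$flag" -x c -fsyntax-only /dev/null > /dev/null 2>&1
}

for cpu in power9 power8; do
  if cc_accepts_flag "-mcpu=$cpu"; then
    echo "-mcpu=$cpu: accepted"
  else
    echo "-mcpu=$cpu: rejected"
  fi
done
```

Note that "rejected" is also printed when no compiler is installed at all, and that the accepted values differ by architecture, so the loop is only meaningful on the POWER machine itself.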


[root@ac922 tensorflow]# bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

[root@ac922 tensorflow]# bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install the resulting tensorflow-1.4.1-cp36-cp36m-linux_ppc64le.whl with pip.

[root@ac922 tensorflow]# ls -l /tmp/tensorflow_pkg
total 67864
-rw-r--r--. 1 root root 69491907 Jan 13 12:23 tensorflow-1.4.1-cp36-cp36m-linux_ppc64le.whl
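
The wheel file name encodes which interpreter it was built for: cp36 is the Python tag (CPython 3.6), cp36m the ABI tag, and linux_ppc64le the platform. A minimal sketch that splits such a name into its parts; parse_wheel_name is a hypothetical helper and assumes the five-field form without the optional build tag.

```shell
# parse_wheel_name FILE.whl
# Hypothetical helper: prints "name version python_tag abi_tag platform".
# Assumes NAME-VERSION-PYTAG-ABITAG-PLATFORM.whl (no build tag).
parse_wheel_name() {
  local base="${1##*/}"      # drop any leading directory
  base="${base%.whl}"        # drop the extension
  local name version py abi plat
  IFS='-' read -r name version py abi plat <<EOF
$base
EOF
  printf '%s %s %s %s %s\n' "$name" "$version" "$py" "$abi" "$plat"
}

parse_wheel_name /tmp/tensorflow_pkg/tensorflow-1.4.1-cp36-cp36m-linux_ppc64le.whl
# prints: tensorflow 1.4.1 cp36 cp36m linux_ppc64le
```

If pip refuses to install a wheel, comparing the third field with the interpreter that pip belongs to is a reasonable first check.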

[root@ac922 tensorflow]# which pip
/opt/anaconda3/bin/pip

[root@ac922 tensorflow]# pip install /tmp/tensorflow_pkg/tensorflow-1.4.1-cp36-cp36m-linux_ppc64le.whl

[root@ac922 tensorflow]# conda list | grep tensor
tensorflow                1.4.1                     <pip>
tensorflow-tensorboard    0.4.0rc3                  <pip>

You can now use tensorflow 1.4.1 with python3.
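
A quick way to confirm the install, sketched under the assumption that python on PATH is the Anaconda3 interpreter used above. The check runs from / because importing tensorflow from inside its own source checkout is known to fail; check_tensorflow is a hypothetical helper name.

```shell
# check_tensorflow
# Hypothetical helper: prints the installed tensorflow version, or returns
# non-zero if python or the tensorflow module is not available.
check_tensorflow() {
  command -v python > /dev/null 2>&1 || return 1
  # Subshell, so the caller's working directory is left unchanged.
  ( cd / && python -c 'import tensorflow as tf; print(tf.__version__)' )
}

check_tensorflow || echo "tensorflow is not importable from this python"
```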


PS.  Here are the contents of the patch files applied above (140-boring-ssl.patch & 120-curl-build-fix.patch).

[root@ac922 tensorflow]# cat 140-boring-ssl.patch
diff --git a/third_party/boringssl/add_boringssl_s390x.patch b/third_party/boringssl/add_boringssl_s390x.patch
index 8b42d10..26c51a3 100644
--- a/third_party/boringssl/add_boringssl_s390x.patch
+++ b/third_party/boringssl/add_boringssl_s390x.patch
@@ -131,3 +131,19 @@ index 6b645e61..c90b7beb 100644
          "//conditions:default": ["-lpthread"],
      }),
      visibility = ["//visibility:public"],
+diff --git a/src/crypto/fipsmodule/sha/sha1.c b/src/crypto/fipsmodule/sha/sha1.c
+index 7ce0193..9791fa5 100644
+--- a/src/crypto/fipsmodule/sha/sha1.c
++++ b/src/crypto/fipsmodule/sha/sha1.c
+@@ -63,9 +63,9 @@
+ #include "../../internal.h"
+
+
+-#if !defined(OPENSSL_NO_ASM) &&                         \
++#if (!defined(OPENSSL_NO_ASM) &&                         \
+     (defined(OPENSSL_X86) || defined(OPENSSL_X86_64) || \
+-     defined(OPENSSL_ARM) || defined(OPENSSL_AARCH64) || \
++     defined(OPENSSL_ARM) || defined(OPENSSL_AARCH64)) || \
+      defined(OPENSSL_PPC64LE))
+ #define SHA1_ASM
+ #endif
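
Reading the diff above, the only change is where the parentheses close: OPENSSL_PPC64LE moves out of the group that is ANDed with !defined(OPENSSL_NO_ASM). The sketch below replays both conditions as a truth table in shell, with 1 meaning "defined" and 0 meaning "not defined". The function names are made up for illustration, and the middle argument lumps the four x86/ARM macros together.

```shell
# Arguments: $1 = OPENSSL_NO_ASM, $2 = any of X86/X86_64/ARM/AARCH64,
#            $3 = OPENSSL_PPC64LE   (1 = defined, 0 = not defined)

# Original condition: !NO_ASM && (OTHER || PPC64LE)
sha1_asm_before() {
  [ "$1" -eq 0 ] && { [ "$2" -eq 1 ] || [ "$3" -eq 1 ]; }
}

# Patched condition: (!NO_ASM && OTHER) || PPC64LE
sha1_asm_after() {
  { [ "$1" -eq 0 ] && [ "$2" -eq 1 ]; } || [ "$3" -eq 1 ]
}

echo "NO_ASM OTHER PPC64LE before after"
for no_asm in 0 1; do
  for other in 0 1; do
    for ppc in 0 1; do
      sha1_asm_before "$no_asm" "$other" "$ppc" && before=1 || before=0
      sha1_asm_after  "$no_asm" "$other" "$ppc" && after=1  || after=0
      echo "$no_asm $other $ppc $before $after"
    done
  done
done
```

The two columns differ only when OPENSSL_NO_ASM and OPENSSL_PPC64LE are both defined, in which case the patched version still defines SHA1_ASM. That is consistent with the error shown earlier, where sha1-altivec.c had already provided sha1_block_data_order before sha1.c declared its own static one. Whether this build actually defines OPENSSL_NO_ASM on ppc64le is an assumption on my part, not something shown in this post.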


[root@ac922 tensorflow]# cat 120-curl-build-fix.patch
diff --git a/third_party/curl.BUILD b/third_party/curl.BUILD
index 882967d..3c48dfa 100644
--- a/third_party/curl.BUILD
+++ b/third_party/curl.BUILD
@@ -479,7 +479,12 @@ genrule(
         "#  define HAVE_SSL_GET_SHUTDOWN 1",
         "#  define HAVE_STROPTS_H 1",
         "#  define HAVE_TERMIOS_H 1",
+        "#if defined(__powerpc64__) || defined(__powerpc__)",
+        "#  define OS \"powerpc64le-ibm-linux-gnu\"",
+        "#  undef HAVE_STROPTS_H",
+        "#else",
         "#  define OS \"x86_64-pc-linux-gnu\"",
+        "#endif",
         "#  define RANDOM_FILE \"/dev/urandom\"",
         "#  define USE_OPENSSL 1",
         "#endif",


I have also uploaded the tensorflow 1.4.1 for python3 wheel file built in the process above to the Google Drive link below.  Please understand that I cannot vouch for the quality of this file.

https://drive.google.com/open?id=1_C2BZJ9G6HekxV2U6mil2sVf3WJIlh-n
