Monday, July 3, 2017

Installing TensorFlow Serving on a Minsky server (ppc64le)

TensorFlow Serving is a software platform for conveniently deploying models trained with TensorFlow into production environments. The build procedure is well documented at https://tensorflow.github.io/serving/setup, but here we will verify whether it also works on ppc64le, i.e. the IBM POWER8 architecture used by the Minsky server.

To state the conclusion first: it works fine.

First, install the related Ubuntu OS packages.

u0017496@sys-88049:~$ sudo apt-get install -y  build-essential curl libcurl3-dev git libfreetype6-dev  libpng12-dev  libzmq3-dev  pkg-config  python-dev  python-numpy  python-pip software-properties-common  swig   zip  zlib1g-dev  openjdk-8-jdk openjdk-8-jdk-headless

Next, following the earlier posting ( http://hwengineer.blogspot.kr/2017/05/minsky-continuum-anaconda.html ), install Miniconda on ppc64le Ubuntu. Then install the conda packages tensorflow, tensorflow-gpu, bazel, cudatoolkit, and cudnn 6.0 as shown below.

u0017496@sys-88049:~$ which conda
/home/u0017496/miniconda2/bin/conda

u0017496@sys-88049:~$ conda install tensorflow tensorflow-gpu
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/u0017496/miniconda2:

The following NEW packages will be INSTALLED:

    cudatoolkit:    8.0-0
    cudnn:          6.0.21-0
    funcsigs:       1.0.2-py27_0
    libprotobuf:    3.2.0-0
    mock:           2.0.0-py27_0
    numpy:          1.12.1-py27_0
    openblas:       0.2.19-0
    pbr:            1.10.0-py27_0
    protobuf:       3.2.0-py27_0
    tensorflow:     1.1.0-np112py27_0
    tensorflow-gpu: 1.1.0-np112py27_0
    werkzeug:       0.12.2-py27_0

The following packages will be UPDATED:

    conda:          4.3.14-py27_0     --> 4.3.18-py27_0

Proceed ([y]/n)? y

cudatoolkit-8. 100% |####################################| Time: 0:00:26  12.45 MB/s
cudnn-6.0.21-0 100% |####################################| Time: 0:00:14  12.99 MB/s
openblas-0.2.1 100% |####################################| Time: 0:00:00  12.72 MB/s
libprotobuf-3. 100% |####################################| Time: 0:00:00  14.28 MB/s
funcsigs-1.0.2 100% |####################################| Time: 0:00:00  22.66 MB/s
numpy-1.12.1-p 100% |####################################| Time: 0:00:00  11.26 MB/s
werkzeug-0.12. 100% |####################################| Time: 0:00:00   9.26 MB/s
protobuf-3.2.0 100% |####################################| Time: 0:00:00   6.76 MB/s
pbr-1.10.0-py2 100% |####################################| Time: 0:00:00   6.81 MB/s
mock-2.0.0-py2 100% |####################################| Time: 0:00:00  24.93 MB/s
conda-4.3.18-p 100% |####################################| Time: 0:00:00   7.88 MB/s
tensorflow-1.1 100% |####################################| Time: 0:00:01  12.84 MB/s
tensorflow-gpu 100% |####################################| Time: 0:00:07  12.54 MB/s

u0017496@sys-88049:~$ conda install bazel curl pkg-config
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/u0017496/miniconda2:

The following NEW packages will be INSTALLED:

    bazel:      0.4.5-0
    curl:       7.52.1-0
    pkg-config: 0.28-1

Proceed ([y]/n)? y

bazel-0.4.5-0. 100% |####################################| Time: 0:00:08  16.43 MB/s
curl-7.52.1-0. 100% |####################################| Time: 0:00:00   9.12 MB/s
pkg-config-0.2 100% |####################################| Time: 0:00:00  16.10 MB/s


u0017496@sys-88049:~$ which bazel
/home/u0017496/miniconda2/bin/bazel

u0017496@sys-88049:~$ which pip
/home/u0017496/miniconda2/bin/pip


Before building TensorFlow Serving, you must install the grpcio package with pip. Google protobuf is installed automatically along with it.

u0017496@sys-88049:~$ pip install grpcio
Collecting grpcio
  Downloading grpcio-1.4.0.tar.gz (9.1MB)
    100% |████████████████████████████████| 9.1MB 131kB/s
Requirement already satisfied: six>=1.5.2 in ./miniconda2/lib/python2.7/site-packages (from grpcio)
Collecting protobuf>=3.3.0 (from grpcio)
  Downloading protobuf-3.3.0.tar.gz (271kB)
    100% |████████████████████████████████| 276kB 2.4MB/s
Collecting futures>=2.2.0 (from grpcio)
  Downloading futures-3.1.1-py2-none-any.whl
Requirement already satisfied: enum34>=1.0.4 in ./miniconda2/lib/python2.7/site-packages (from grpcio)
Requirement already satisfied: setuptools in ./miniconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg (from protobuf>=3.3.0->grpcio)
Building wheels for collected packages: grpcio, protobuf
  Running setup.py bdist_wheel for grpcio ... /
...
Successfully built grpcio protobuf
Installing collected packages: protobuf, futures, grpcio
  Found existing installation: protobuf 3.2.0
    Uninstalling protobuf-3.2.0:
      Successfully uninstalled protobuf-3.2.0
Successfully installed futures-3.1.1 grpcio-1.4.0 protobuf-3.3.0
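As the log above shows, pip upgraded protobuf from 3.2.0 to 3.3.0 because grpcio declares a protobuf>=3.3.0 requirement. If you want to check this kind of minimum-version constraint yourself before a build, a small helper using GNU `sort -V` can do the comparison. This is just an illustrative sketch; the function name and the version numbers fed to it are made up for the example.

```shell
# Return success if version $1 satisfies minimum $2 (GNU sort -V does the ordering)
meets_minimum() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

meets_minimum 3.3.0 3.3.0 && echo "protobuf OK"    # prints "protobuf OK"
meets_minimum 3.2.0 3.3.0 || echo "needs upgrade"  # prints "needs upgrade"
```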

Now download the TensorFlow Serving source from GitHub.

u0017496@sys-88049:~$ git clone --recurse-submodules https://github.com/tensorflow/serving

First, go into the tensorflow directory and run configure. For most of the questions you can simply press Enter to accept the default, but you must explicitly answer yes to building with CUDA support and enter the location of the cuDNN library yourself, as shown below. For the cuDNN library location, enter the directory of the copy installed via conda install, not the one installed by the OS.

u0017496@sys-88049:~$ cd serving/tensorflow

u0017496@sys-88049:~/serving/tensorflow$ ./configure
Extracting Bazel installation...
......................
You have bazel 0.4.5- installed.
Please specify the location of python. [Default is /home/u0017496/miniconda2/bin/python]:
Found possible Python library paths:
  /home/u0017496/miniconda2/lib/python2.7/site-packages
Please input the desired Python library path to use.  Default is [/home/u0017496/miniconda2/lib/python2.7/site-packages]

Using python library path: /home/u0017496/miniconda2/lib/python2.7/site-packages
Do you wish to build TensorFlow with MKL support? [y/N]
No MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N]
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]
nvcc will be used as CUDA compiler
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]:
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]:
Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /home/u0017496/miniconda2/pkgs/cudnn-6.0.21-0/lib
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]:
Do you wish to build TensorFlow with MPI support? [y/N]
MPI support will not be enabled for TensorFlow
Configuration finished
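If you need to rerun this step later, configure can also be driven non-interactively through environment variables instead of answering the prompts. The variable names below are a sketch mirroring the interactive answers above; check the configure script in your own checkout before relying on them.

```shell
# Non-interactive configure sketch: values mirror the interactive answers above.
export PYTHON_BIN_PATH=/home/u0017496/miniconda2/bin/python
export PYTHON_LIB_PATH=/home/u0017496/miniconda2/lib/python2.7/site-packages
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=8.0
export TF_CUDNN_VERSION=6
export CUDNN_INSTALL_PATH=/home/u0017496/miniconda2/pkgs/cudnn-6.0.21-0/lib
export TF_CUDA_COMPUTE_CAPABILITIES="3.5,5.2"
./configure
```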

Now go back up one directory and run the bazel build. This process takes quite a long time and uses roughly 700 MB of disk space. In this case it took nearly two hours, because the build ran on a single virtualized CPU in the Power Development Platform, a POWER8-based cloud environment.

u0017496@sys-88049:~/serving/tensorflow$ cd ..

u0017496@sys-88049:~/serving$ bazel build tensorflow_serving/...
..............
WARNING: /home/u0017496/.cache/bazel/_bazel_u0017496/71c9acd4ac5e4078b5a9612ad32f9c09/external/org_tensorflow/third_party/py/python_configure.bzl:31:3: Python Configuration Warning: 'PYTHON_LIB_PATH' environment variable is not set, using '/usr/local/lib/python2.7/dist-packages' as default.
WARNING: /home/u0017496/serving/tensorflow_serving/servables/tensorflow/BUILD:525:1: in cc_library rule //tensorflow_serving/servables/tensorflow:classifier: target '//tensorflow_serving/servables/tensorflow:classifier' depends on deprecated target '@org_tensorflow//tensorflow/contrib/session_bundle:session_bundle': Use SavedModel Loader instead.
WARNING: /home/u0017496/serving/tensorflow_serving/servables/tensorflow/BUILD:525:1: in cc_library rule //tensorflow_serving/servables/tensorflow:classifier: target '//tensorflow_serving/servables/tensorflow:classifier' depends on deprecated target '@org_tensorflow//tensorflow/contrib/session_bundle:signature': Use SavedModel instead.
..............
[1,125 / 4,270] Compiling external/protobuf/src/google/protobuf/compiler/cpp/cpp_message_field.cc
...
INFO: From Compiling external/org_tensorflow/tensorflow/tools/tfprof/internal/print_model_analysis.cc:
In file included from external/org_tensorflow/tensorflow/tools/tfprof/internal/tfprof_stats.h:41:0,
                 from external/org_tensorflow/tensorflow/tools/tfprof/internal/advisor/checker.h:20,
                 from external/org_tensorflow/tensorflow/tools/tfprof/internal/advisor/accelerator_utilization_checker.h:19,
                 from external/org_tensorflow/tensorflow/tools/tfprof/internal/advisor/tfprof_advisor.h:19,
                 from external/org_tensorflow/tensorflow/tools/tfprof/internal/print_model_analysis.cc:26:
external/org_tensorflow/tensorflow/tools/tfprof/internal/tfprof_op.h: In member function 'virtual bool tensorflow::tfprof::TFOp::ShouldShowIfExtra(tensorflow::tfprof::ShowMultiNode*, const tensorflow::tfprof::Options&, int)':
external/org_tensorflow/tensorflow/tools/tfprof/internal/tfprof_op.h:61:29: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     if (opts.min_occurrence > node->node->graph_nodes().size()) {
                             ^
INFO: Elapsed time: 6744.345s, Critical Path: 5770.33s
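Note that `bazel build tensorflow_serving/...` builds every target in the tree. If you only need the model server binary (the one examined below), building just that target should shave a noticeable amount off the build time. This is a sketch; adjust the target label if it differs in your checkout.

```shell
# Build only the model server instead of all of tensorflow_serving/...
bazel build //tensorflow_serving/model_servers:tensorflow_model_server
```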

The bazel build above produces the following binary, and running it displays the usage shown below.

u0017496@sys-88049:~/serving$ file bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), dynamically linked, interpreter /lib64/ld64.so.2, for GNU/Linux 3.2.0, BuildID[md5/uuid]=a9e44548b6dbbe05d18b5711e5e29e1a, not stripped


u0017496@sys-88049:~/serving$ bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
usage: bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
Flags:
        --port=8500                             int32   port to listen on
        --enable_batching=false                 bool    enable batching
        --batching_parameters_file=""           string  If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
        --model_config_file=""                  string  If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. (If used, --model_name, --model_base_path and --model_version_policy are ignored.)
        --model_name="default"                  string  name of model (ignored if --model_config_file flag is set
        --model_base_path=""                    string  path to export (ignored if --model_config_file flag is set, otherwise required)
        --model_version_policy="LATEST_VERSION" string  The version policy which determines the number of model versions to be served at the same time. The default value is LATEST_VERSION, which will serve only the latest version. See file_system_storage_path_source.proto for the list of possible VersionPolicy. (Ignored if --model_config_file flag is set)
        --file_system_poll_wait_seconds=1       int32   interval in seconds between each poll of the file system for new model version
        --use_saved_model=true                  bool    If true, use SavedModel in the server; otherwise, use SessionBundle. It is used by tensorflow serving team to control the rollout of SavedModel and is not expected to be set by users directly.
        --tensorflow_session_parallelism=0      int64   Number of threads to use for running a Tensorflow session. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
        --platform_config_file=""               string  If non-empty, read an ascii PlatformConfigMap protobuf from the supplied file name, and use that platform config instead of the Tensorflow platform. (If used, --enable_batching and --use_saved_model are ignored.)
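To actually serve a model, point the binary at a directory containing exported model versions, using the flags listed above. The model name and path in this sketch are hypothetical.

```shell
# Hypothetical invocation: /tmp/mnist_model must contain numbered version
# subdirectories (e.g. /tmp/mnist_model/1/) holding an exported SavedModel.
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
  --port=9000 --model_name=mnist --model_base_path=/tmp/mnist_model
```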
