http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/UsingDriverlessAI.pdf
First, download the Driverless AI RPM package as follows.
[root@ING data]# wget https://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/dai/rel-1.2.2-6/ppc64le-centos7/dai-1.2.2-1.ppc64le.rpm
Install it with the rpm command.
[root@ING data]# rpm -Uvh dai-1.2.2-1.ppc64le.rpm
Preparing... ################################# [100%]
Updating / installing...
1:dai-1.2.2-1 ################################# [100%]
User configuration file /etc/dai/User.conf already exists.
Group configuration file /etc/dai/Group.conf already exists.
Configured user in /etc/dai/User.conf is 'dai'.
Configured group in /etc/dai/Group.conf is 'dai'.
Group 'dai' already exists.
User 'dai' already exists.
Adding systemd configuration files in /etc/systemd/system...
Created symlink from /etc/systemd/system/dai.service.wants/dai-dai.service to /usr/lib/systemd/system/dai-dai.service.
Created symlink from /etc/systemd/system/dai.service.wants/dai-h2o.service to /usr/lib/systemd/system/dai-h2o.service.
Created symlink from /etc/systemd/system/dai.service.wants/dai-procsy.service to /usr/lib/systemd/system/dai-procsy.service.
Calling 'systemctl enable dai'...
Created symlink from /etc/systemd/system/multi-user.target.wants/dai.service to /usr/lib/systemd/system/dai.service.
Installation complete.
Starting DAI is also very simple. Just start dai.service as shown below, and the main process, the auxiliary h2o process, and the proxy process dai-procsy all come up automatically.
[root@ING ~]# systemctl start dai
[root@ING ~]# systemctl status dai-dai
● dai-dai.service - Driverless AI (Main Application Process)
Loaded: loaded (/usr/lib/systemd/system/dai-dai.service; enabled; vendor preset: disabled)
[root@ING ~]# systemctl status dai-h2o
● dai-h2o.service - Driverless AI (H2O Process)
Loaded: loaded (/usr/lib/systemd/system/dai-h2o.service; enabled; vendor preset: disabled)
[root@ING ~]# systemctl status dai-procsy
● dai-procsy.service - Driverless AI (Procsy Process)
Loaded: loaded (/usr/lib/systemd/system/dai-procsy.service; enabled; vendor preset: disabled)
Next, create a user named test1 and add that user to the group dai, which is DAI's default user/group.
[root@ING ~]# usermod -a -G dai test1
[root@ING ~]# cat /etc/group | grep dai
dai:x:980:test1
Prepare t1.py and t1.R as shown below. These are Python and R scripts that verify whether h2o4gpu can be used from Python and R.
[test1@ING ~]$ cat t1.py
import h2o4gpu
import numpy as np
X = np.array([[1.,1.], [1.,4.], [1.,0.]])
model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
model.cluster_centers_
[test1@ING ~]$ cat t1.R
library(reticulate)
library(h2o4gpu)
use_python("/opt/h2oai/dai/python/bin/python")
x <- iris[1:4]
y <- as.integer(iris$Species)
model <- h2o4gpu.random_forest_classifier() %>% fit(x, y)
pred <- model %>% predict(x)
library(Metrics)
ce(actual = y, predicted = pred)
Now, here is how to use the h2o4gpu bundled with DAI from the test1 account. In one line: set environment variables such as PATH so that the Python interpreter and PYTHONPATH provided by DAI are used.
[test1@ING ~]$ export PATH=/opt/h2oai/dai/python/bin:$PATH
[test1@ING ~]$ export LD_LIBRARY_PATH=/opt/h2oai/dai/python/lib:/opt/h2oai/dai/lib:$LD_LIBRARY_PATH
[test1@ING ~]$ export PYTHONPATH=/opt/h2oai/dai/cuda-9.2/lib/python3.6/site-packages
[test1@ING ~]$ pip list | grep h2o
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
h2o (3.20.0.2)
h2o4gpu (0.2.0.9999+master.eb6295c)
h2oai (1.2.2)
h2oai-client (1.2.2)
h2oaicore (1.2.2)
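As an extra sanity check (my addition, not in the original post), you can confirm from inside Python itself that the DAI interpreter and its site-packages are the ones being picked up after exporting the variables above:

```python
import sys

# With PATH and PYTHONPATH exported as above, sys.executable should
# point at the DAI interpreter under /opt/h2oai/dai, sys.path should
# include the DAI site-packages directory, and the version should be 3.6.
print(sys.executable)
print([p for p in sys.path if 'site-packages' in p])
print(sys.version_info[:2])
```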
The Python provided by DAI is 3.6, the same as the one shipped with a standard Anaconda distribution. TensorFlow 1.8, which I built from source, can also be installed normally with pip and used in the same way.
[test1@ING ~]$ pip install /tmp/tensorflow-1.8.0-cp36-cp36m-linux_ppc64le.whl
[test1@ING ~]$ pip list | grep tensorflow
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
tensorflow (1.8.0)
[test1@ING ~]$ which python
/opt/h2oai/dai/python/bin/python
Now let's run the t1.py shown above in Python. Here I simply copied and pasted it line by line.
[test1@ING ~]$ python
Python 3.6.4 (default, Jun 30 2018, 13:42:46)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import h2o4gpu
>>> import numpy as np
>>> X = np.array([[1.,1.], [1.,4.], [1.,0.]])
>>> model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
>>> model.cluster_centers_
array([[1. , 0.5],
[1. , 4. ]])
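The centers above are easy to verify by hand (this check is mine, not part of t1.py): with k=2, the two nearby points [1,1] and [1,0] form one cluster and [1,4] forms the other, and each reported center is simply the per-cluster mean.

```python
import numpy as np

# The three sample points from t1.py
X = np.array([[1., 1.], [1., 4.], [1., 0.]])

# Cluster A: rows 0 and 2 ([1,1] and [1,0]); cluster B: row 1 ([1,4]).
# A KMeans center is the mean of its cluster's points, which matches
# the cluster_centers_ output above.
center_a = X[[0, 2]].mean(axis=0)
center_b = X[[1]].mean(axis=0)
print(center_a)   # [1.  0.5]
print(center_b)   # [1. 4.]
```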
>>>
Next, let's run the t1.R shown above in R. Again, I simply copied and pasted it line by line.
[test1@ING ~]$ R
R version 3.4.1 (2017-06-30) -- "Single Candle"
> library(reticulate)
> library(h2o4gpu)
Attaching package: ‘h2o4gpu’
The following object is masked from ‘package:base’:
transform
> use_python("/opt/h2oai/dai/python/bin/python")
> x <- iris[1:4]
> y <- as.integer(iris$Species)
> model <- h2o4gpu.random_forest_classifier() %>% fit(x, y)
> pred <- model %>% predict(x)
/opt/h2oai/dai/python/lib/python3.6/site-packages/sklearn/preprocessing/label.py:151: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
if diff:
> library(Metrics)
> ce(actual = y, predicted = pred)
[1] 0.02666667
>
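For readers unfamiliar with ce() from R's Metrics package: it is simply the misclassification rate, i.e. the fraction of predictions that differ from the actual labels. A minimal Python equivalent (my sketch, not part of the original scripts):

```python
import numpy as np

def ce(actual, predicted):
    """Classification error: fraction of mismatched labels,
    mirroring ce() from R's Metrics package."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    return float(np.mean(actual != predicted))

# One mismatch out of four labels:
print(ce([0, 1, 2, 2], [0, 1, 2, 1]))  # 0.25

# 4 errors out of the 150 iris rows gives the 0.02666667 reported above.
print(round(4 / 150, 8))               # 0.02666667
```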
Let's also check whether TensorFlow properly picks up the GPUs. Of course, it works fine.
[test1@ING ~]$ python
Python 3.6.4 (default, Jun 30 2018, 13:42:46)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
/opt/h2oai/dai/python/lib/python3.6/site-packages/h5py-2.7.1-py3.6-linux-ppc64le.egg/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
>>> sess=tf.Session()
2018-08-06 10:21:44.054091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0004:04:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-06 10:21:44.568612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0004:05:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-06 10:21:45.033482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 2 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0035:03:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-06 10:21:45.461878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 3 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0035:04:00.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2018-08-06 10:21:45.462104: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1, 2, 3
2018-08-06 10:21:47.427824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-06 10:21:47.427960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 1 2 3
2018-08-06 10:21:47.427989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N Y Y Y
2018-08-06 10:21:47.428012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1: Y N Y Y
2018-08-06 10:21:47.428034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 2: Y Y N Y
2018-08-06 10:21:47.428055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 3: Y Y Y N
2018-08-06 10:21:47.431083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14857 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0004:04:00.0, compute capability: 7.0)
2018-08-06 10:21:47.987194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 14857 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0004:05:00.0, compute capability: 7.0)
2018-08-06 10:21:48.813286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 14856 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0035:03:00.0, compute capability: 7.0)
2018-08-06 10:21:49.397252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 14861 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0035:04:00.0, compute capability: 7.0)
>>>