Monday, March 27, 2017
Installing H2O Deep Learning on IBM Minsky
H2O is a scalable machine learning platform that integrates well with big data platforms such as Spark, Hadoop, and R. It is also open source. H2O runs on the ppc64le architecture as well, that is, on the IBM Minsky (S822LC for HPC) server. This post summarizes how to install H2O together with R on a Minsky server, based on the instructions at the link below.
http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/8/index.html#R
First, download H2O to the Minsky server from the address below, then unpack it and start the server.
test@minsky:~/R$ wget http://h2o-release.s3.amazonaws.com/h2o/rel-ueno/2/h2o-3.10.4.2.zip
test@minsky:~/R$ unzip h2o-3.10.4.2.zip
test@minsky:~/R$ cd h2o-3.10.4.2
test@minsky:~/R/h2o-3.10.4.2$ java -jar h2o.jar
03-27 17:04:32.866 172.18.229.117:54321 49868 main INFO: ----- H2O started -----
03-27 17:04:32.890 172.18.229.117:54321 49868 main INFO: Build git branch: rel-ueno
...
03-27 17:04:34.787 172.18.229.117:54321 49868 main INFO: Open H2O Flow in your web browser: http://172.18.229.117:54321
03-27 17:04:34.787 172.18.229.117:54321 49868 main INFO:
Once the H2O server is running, you can connect to it on port 54321 from a web browser.
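If the server is started in the background with its output saved to a file, the address to open can be pulled out of the startup log. The following is a minimal sketch, not part of the original procedure: `flow_url` is a hypothetical helper name, and it assumes the log line has the same wording as in the output above.

```shell
# Hypothetical helper: read an H2O startup log on stdin and print the Flow URL.
# Assumes the "Open H2O Flow in your web browser:" wording shown in the log above.
flow_url() {
  sed -n 's/.*Open H2O Flow in your web browser: \(http[^ ]*\).*/\1/p' | head -n 1
}

# Sample line copied from the startup log above.
echo '03-27 17:04:34.787 172.18.229.117:54321 49868 main INFO: Open H2O Flow in your web browser: http://172.18.229.117:54321' | flow_url
```

With a saved log, for example one produced by `java -jar h2o.jar > h2o.log 2>&1 &`, the same helper can be used as `flow_url < h2o.log`.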
For R, you can use version 3.2.3 from the r-base-core package included in Ubuntu 16.04, or build MS open R server 3.3.2 yourself as described in an earlier post.
test@minsky:~/R/h2o-3.10.4.2$ which R
/usr/local/lib/R/bin/R
First, start R and check whether an h2o package is already installed; if one is, remove it.
test@minsky:~/R/h2o-3.10.4.2$ sudo R
> if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
> if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }
Next, install the R packages that h2o needs, such as methods, statmod, stats, and RCurl.
> if (! ("methods" %in% rownames(installed.packages()))) { install.packages("methods") }
> if (! ("statmod" %in% rownames(installed.packages()))) { install.packages("statmod") }
> if (! ("stats" %in% rownames(installed.packages()))) { install.packages("stats") }
> if (! ("graphics" %in% rownames(installed.packages()))) { install.packages("graphics") }
> if (! ("RCurl" %in% rownames(installed.packages()))) { install.packages("RCurl") }
> if (! ("jsonlite" %in% rownames(installed.packages()))) { install.packages("jsonlite") }
> if (! ("tools" %in% rownames(installed.packages()))) { install.packages("tools") }
> if (! ("utils" %in% rownames(installed.packages()))) { install.packages("utils") }
During this step, any package that is missing is downloaded from the internet, as shown below.
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
HTTPS CRAN mirror
...
49: USA (CA 1) [https] 50: USA (IA) [https]
51: USA (IN) [https] 52: USA (KS) [https]
53: USA (MI 1) [https] 54: USA (OR) [https]
55: USA (TN) [https] 56: USA (TX 1) [https]
57: USA (TX 2) [https] 58: (HTTP mirrors)
Selection: 49
trying URL 'https://cran.cnr.berkeley.edu/src/contrib/statmod_1.4.29.tar.gz'
Content type 'application/x-gzip' length 56932 bytes (55 KB)
==================================================
downloaded 55 KB
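The mirror prompt above appears because no repository is specified. As a rough sketch, the same install can be driven from the shell without the prompt by passing `repos` explicitly. The helper name `r_install_expr` is hypothetical, and the mirror URL is an assumption; any CRAN mirror can be substituted.

```shell
# Assumption: any CRAN mirror can be used here.
CRAN_MIRROR="https://cloud.r-project.org"

# Hypothetical helper: build an R expression that installs only the
# packages (given as arguments) that are not installed yet.
r_install_expr() {
  list=$(printf '"%s", ' "$@")   # "a", "b", "c", 
  list=${list%, }                # drop the trailing ", "
  printf 'pkgs <- setdiff(c(%s), rownames(installed.packages())); if (length(pkgs)) install.packages(pkgs, repos="%s")\n' "$list" "$CRAN_MIRROR"
}

# Print the expression for the non-base packages used in this post.
r_install_expr statmod RCurl jsonlite
```

The printed expression can then be run with something like `sudo Rscript -e "$(r_install_expr statmod RCurl jsonlite)"`.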
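Once this completes successfully, install the H2O package. Note, however, that the tibshirani version given at the original URL does not match the version of the running H2O server, so h2o.init fails with the error below.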
> install.packages("h2o", type="source", repos=(c("http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/8/R")))
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'http://h2o-release.s3.amazonaws.com/h2o/rel-tibshirani/8/R/src/contrib/h2o_3.6.0.8.tar.gz'
Content type 'application/x-tar; charset=binary' length 46482663 bytes (44.3 MB)
==================================================
downloaded 44.3 MB
...
> localH2O = h2o.init(nthreads=-1)
...
Error in h2o.init(nthreads = -1) :
Version mismatch! H2O is running version 3.10.4.2 but R package is version 3.6.0.8
Therefore, install the tverberg version instead, as shown below.
> install.packages("h2o", type="source", repos=(c("http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/6/R")))
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'http://h2o-release.s3.amazonaws.com/h2o/rel-tverberg/6/R/src/contrib/h2o_3.10.3.6.tar.gz'
Content type 'application/x-tar' length 59595766 bytes (56.8 MB)
==================================================
downloaded 56.8 MB
...
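The repository URLs used above follow a visible pattern: the release branch name, then a build number that equals the last component of the version string (3.6.0.8 under rel-tibshirani/8, 3.10.3.6 under rel-tverberg/6). The sketch below builds the R repository URL from those two pieces and compares two version strings before calling h2o.init. The helper names are hypothetical, and the pattern is an assumption drawn only from the two URLs in this post; the branch name still has to be looked up separately.

```shell
BASE="http://h2o-release.s3.amazonaws.com/h2o"

# Assumption: the build number is the last dot-separated component of the version.
h2o_build() { echo "${1##*.}"; }

# Hypothetical helper: R repository URL for a release branch and version.
h2o_r_repo() { echo "$BASE/rel-$1/$(h2o_build "$2")/R"; }

# Hypothetical helper: compare the server version with the R package version.
check_versions() {
  if [ "$1" = "$2" ]; then
    echo "ok"
  else
    echo "mismatch: server $1, R package $2"
  fi
}

h2o_r_repo tverberg 3.10.3.6
check_versions 3.10.4.2 3.6.0.8
```

The first call prints the repository URL used in the install command above, and the second reproduces the mismatch reported by h2o.init.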
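Once installation finishes, load h2o and run h2o.init. In the output below, R starts its own H2O JVM (the cluster name begins with H2O_started_from_R), so the cluster reports the same version as the R package, 3.10.3.6.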
> library(h2o)
----------------------------------------------------------------------
Your next step is to start H2O:
> h2o.init()
For H2O package documentation, ask for help:
> ??h2o
After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai
----------------------------------------------------------------------
Attaching package: ‘h2o’
....
> localH2O = h2o.init(nthreads=-1)
...
Starting H2O JVM and connecting: ... Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 3 seconds 392 milliseconds
H2O cluster version: 3.10.3.6
H2O cluster version age: 1 month and 6 days
H2O cluster name: H2O_started_from_R_root_uwo022
H2O cluster total nodes: 1
H2O cluster total memory: 0.89 GB
H2O cluster total cores: 128
H2O cluster allowed cores: 128
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
R Version: R version 3.2.3 (2015-12-10)
Next, run the k-means demo. Start tightvncserver or a similar X display beforehand so that the plots can be drawn.
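Before launching R for the demo, the display requirement can be checked from the shell. This is only a small sketch: `display_status` is a hypothetical helper, and it assumes R's plot window relies on the DISPLAY environment variable.

```shell
# Hypothetical helper: report whether an X display is configured in this shell.
display_status() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "DISPLAY=$DISPLAY"
  else
    echo "DISPLAY is not set"
  fi
}

display_status
```

If it reports that DISPLAY is not set, one option is to start a VNC display with `tightvncserver :1`, run `export DISPLAY=:1`, and then start R from that same shell.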
> demo(h2o.kmeans)
demo(h2o.kmeans)
---- ~~~~~~~~~~
Type <Return> to start :
...
> prostate.hex = h2o.uploadFile(path = system.file("extdata", "prostate.csv", package="h2o"), destination_frame = "prostate")
|======================================================================| 100%
> summary(prostate.hex)
ID               CAPSULE          AGE             RACE
Min.   :  1.00   Min.   :0.0000   Min.   :43.00   Min.   :0.000
1st Qu.: 95.75   1st Qu.:0.0000   1st Qu.:62.00   1st Qu.:1.000
Median :190.50   Median :0.0000   Median :67.00   Median :1.000
Mean   :190.50   Mean   :0.4026   Mean   :66.04   Mean   :1.087
3rd Qu.:285.25   3rd Qu.:1.0000   3rd Qu.:71.00   3rd Qu.:1.000
Max.   :380.00   Max.   :1.0000   Max.   :79.00   Max.   :2.000
DPROS           DCAPS           PSA               VOL
Min.   :1.000   Min.   :1.000   Min.   :  0.300   Min.   : 0.00
1st Qu.:1.000   1st Qu.:1.000   1st Qu.:  4.900   1st Qu.: 0.00
Median :2.000   Median :1.000   Median :  8.664   Median :14.20
Mean   :2.271   Mean   :1.108   Mean   : 15.409   Mean   :15.81
3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.: 17.063   3rd Qu.:26.40
Max.   :4.000   Max.   :2.000   Max.   :139.700   Max.   :97.60
GLEASON
Min.   :0.000
1st Qu.:6.000
Median :6.000
Mean   :6.384
3rd Qu.:7.000
Max.   :9.000
....
Total Within SS: 537.6507
Between SS: 1357.349
Total SS: 1895
Centroid Statistics:
   centroid      size within_cluster_sum_of_squares
1         1 156.00000                     165.18206
2         2   6.00000                      10.15081
3         3   2.00000                       5.89369
4         4   1.00000                       0.00000
5         5  60.00000                     114.01143
6         6   2.00000                       0.04695
7         7  11.00000                      18.54178
8         8  30.00000                      82.95187
9         9  23.00000                      35.24021
10       10  89.00000                     105.63191
> plot(prostate.ctrs[,1:2])
Hit <Return> to see next plot:
> plot(prostate.ctrs[,3:4])
> title("K-Means Centers for k = 10", outer = TRUE, line = -2.0)
Warning message:
In summary.H2OFrame(prostate.hex) :
Approximated quantiles computed! If you are interested in exact quantiles, please pass the `exact_quantiles=TRUE` parameter.
As a result, plots of the k-means centers are drawn.
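As a quick sanity check on the numbers printed by the demo, the per-centroid values can be summed and compared with the totals. The sketch below copies the values from the Centroid Statistics table by hand; `sum_list` is a hypothetical helper built on awk.

```shell
# Values copied by hand from the Centroid Statistics table above.
within_ss="165.18206 10.15081 5.89369 0.00000 114.01143 0.04695 18.54178 82.95187 35.24021 105.63191"
sizes="156 6 2 1 60 2 11 30 23 89"

# Hypothetical helper: sum the numbers given as arguments.
sum_list() {
  echo "$@" | tr ' ' '\n' | awk '{ s += $1 } END { printf "%.4f\n", s }'
}

sum_list $within_ss   # compare with "Total Within SS"
sum_list $sizes       # compare with the number of rows (max ID in the summary)

# Share of the total sum of squares explained by the clustering.
awk 'BEGIN { printf "%.4f\n", 1357.349 / 1895 }'
```

The first sum agrees with the reported Total Within SS of 537.6507, the sizes add up to the 380 rows of the prostate data, and the last line shows that about 72% of the total sum of squares lies between clusters.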