Deep Learning for HW Engineers: POWER8


Wednesday, January 30, 2019

Installing and Testing Darknet on IBM POWER8/9 (ppc64le)

Darknet is an open source neural network framework written in C. It installs cleanly on IBM POWER8 or POWER9, that is, on ppc64le. The steps here were carried out on CentOS 7 on POWER8.

To install it, just fetch the source and compile it as follows. No changes to the source are needed.

[bsyu@centos01 files]$ git clone https://github.com/pjreddie/darknet.git

[bsyu@centos01 files]$ cd darknet

[bsyu@centos01 darknet]$ make -j8

[bsyu@centos01 darknet]$ ./darknet
usage: ./darknet <function>

If you want to use a GPU, edit the Makefile slightly as shown below so that it builds with CUDA. This also builds fine on ppc64le, but since I am too poor to own a GPU, I could not test that build.

[bsyu@centos01 darknet]$ vi Makefile
GPU=1 # default 0
CUDNN=1 # default 0
...

[bsyu@centos01 darknet]$ make -j8

[bsyu@centos01 darknet]$ ./darknet
usage: ./darknet <function>

Even without CUDA, the CPU-only build from the first step is enough to try YOLO through darknet. The darknet source tree already includes a cfg file for YOLO and a sample photo of a dog.

[bsyu@centos01 darknet]$ ls -l cfg/yolov3.cfg data/dog.jpg
-rw-r--r-- 1 bsyu bsyu 8342 Jan 30 15:19 cfg/yolov3.cfg
-rw-r--r-- 1 bsyu bsyu 163759 Jan 30 15:19 data/dog.jpg

Downloaded and opened, this dog.jpg looks like the image below.

For the test, download the weight file pre-trained with YOLO v3, which can already detect objects such as this dog.

[bsyu@centos01 darknet]$ wget https://pjreddie.com/media/files/yolov3.weights

Now run the command that 'detects' the objects in the data/dog.jpg photo.

[bsyu@centos01 darknet]$ ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BFLOPs
1 conv 64 3 x 3 / 2 608 x 608 x 32 -> 304 x 304 x 64 3.407 BFLOPs
2 conv 32 1 x 1 / 1 304 x 304 x 64 -> 304 x 304 x 32 0.379 BFLOPs
3 conv 64 3 x 3 / 1 304 x 304 x 32 -> 304 x 304 x 64 3.407 BFLOPs
4 res 1 304 x 304 x 64 -> 304 x 304 x 64
5 conv 128 3 x 3 / 2 304 x 304 x 64 -> 152 x 152 x 128 3.407 BFLOPs
6 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64 0.379 BFLOPs
...
104 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
105 conv 255 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 255 0.754 BFLOPs
106 yolo
Loading weights from yolov3.weights...Done!
data/dog.jpg: Predicted in 27.192135 seconds.
dog: 100%
truck: 92%
bicycle: 99%
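When the detection result needs to be fed to another program, the label lines printed at the end of the output can be picked up with a small parser. The following is only a sketch: it assumes the "label: NN%" line format shown above, and the function name parse_detections is my own, not part of darknet.

```python
import re

def parse_detections(output):
    """Return (label, confidence_percent) pairs from 'label: NN%' lines."""
    # Assumption: labels consist of letters and spaces only, as in the output above.
    pattern = re.compile(r"^(?P<label>[A-Za-z ]+): (?P<pct>\d+)%$")
    results = []
    for line in output.splitlines():
        match = pattern.match(line.strip())
        if match:
            results.append((match.group("label"), int(match.group("pct"))))
    return results

# The tail of the detect output shown above.
sample = """data/dog.jpg: Predicted in 27.192135 seconds.
dog: 100%
truck: 92%
bicycle: 99%"""

detections = parse_detections(sample)
print(detections)
```

The timing line is skipped because its text before the colon contains characters other than letters and spaces.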

Because we did not build darknet with OpenCV, the image is not displayed directly, but darknet saves the result as predictions.jpg.

[bsyu@centos01 darknet]$ ls -ltr | tail -n 3
-rwxr-xr-x 1 bsyu bsyu 719872 Jan 30 15:29 libdarknet.so
-rwxr-xr-x 1 bsyu bsyu 841976 Jan 30 15:29 darknet
-rw-r--r-- 1 bsyu bsyu 119208 Jan 30 15:41 predictions.jpg

Downloaded to a PC, this predictions.jpg shows the result below.

Friday, January 25, 2019

Using H2O Driverless AI for Science and Engineering Research: Predicting Molecular Energy Values

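This time, let's look at how H2O Driverless AI can be used in chemistry or manufacturing process research. Developing new materials or studying mechanical properties requires combining many ingredients and conditions such as temperature and airflow, and predicting the resulting values. Because of cost and time, however, it is impossible to test every one of that enormous number of combinations. If a reasonable amount of data collected from instruments in the factory or lab has accumulated, machine learning can analyze the existing data and predict in advance which combinations of ingredients and conditions will give the best results. That greatly reduces the number of physical tests, which cuts cost and speeds up development. The faster the machine learning runs and the higher its accuracy, the greater the benefit.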

Among the public datasets on Kaggle is a set of JSON files containing molecules, the atomic structure inside them, and the corresponding molecular energy values.

https://www.kaggle.com/burakhmmtgl/predict-molecular-properties/home

The dataset can be downloaded as a zip file like the one below; unzipping it yields 10 JSON files.

[u0017649@sys-96775 files]$ unzip ./predict-molecular-properties.zip
Archive: ./predict-molecular-properties.zip
inflating: pubChem_p_00000001_00025000.json
inflating: pubChem_p_00025001_00050000.json
inflating: pubChem_p_00050001_00075000.json
inflating: pubChem_p_00075001_00100000.json
inflating: pubChem_p_00100001_00125000.json
inflating: pubChem_p_00125001_00150000.json
inflating: pubChem_p_00150001_00175000.json
inflating: pubChem_p_00175001_00200000.json
inflating: pubChem_p_00200001_00225000.json
inflating: pubChem_p_00225001_00250000.json

The size and line count of each JSON file are as follows.

[u0017649@sys-96775 files]$ ls -l *.json
-rw-rw-r--. 1 u0017649 u0017649 111430473 Aug 14 2017 pubChem_p_00000001_00025000.json
-rw-rw-r--. 1 u0017649 u0017649 115752953 Aug 14 2017 pubChem_p_00025001_00050000.json
-rw-rw-r--. 1 u0017649 u0017649 119400902 Aug 14 2017 pubChem_p_00050001_00075000.json
-rw-rw-r--. 1 u0017649 u0017649 116769374 Aug 14 2017 pubChem_p_00075001_00100000.json
-rw-rw-r--. 1 u0017649 u0017649 116383795 Aug 14 2017 pubChem_p_00100001_00125000.json
-rw-rw-r--. 1 u0017649 u0017649 122537630 Aug 14 2017 pubChem_p_00125001_00150000.json
-rw-rw-r--. 1 u0017649 u0017649 96286512 Aug 14 2017 pubChem_p_00150001_00175000.json
-rw-rw-r--. 1 u0017649 u0017649 126743707 Aug 14 2017 pubChem_p_00175001_00200000.json
-rw-rw-r--. 1 u0017649 u0017649 129597062 Aug 14 2017 pubChem_p_00200001_00225000.json
-rw-rw-r--. 1 u0017649 u0017649 147174708 Aug 14 2017 pubChem_p_00225001_00250000.json

[u0017649@sys-96775 files]$ wc -l *.json
4924343 pubChem_p_00000001_00025000.json
5096119 pubChem_p_00025001_00050000.json
5258731 pubChem_p_00050001_00075000.json
5159241 pubChem_p_00075001_00100000.json
5120705 pubChem_p_00100001_00125000.json
5401395 pubChem_p_00125001_00150000.json
4234059 pubChem_p_00150001_00175000.json
5570529 pubChem_p_00175001_00200000.json
5698247 pubChem_p_00200001_00225000.json
6485207 pubChem_p_00225001_00250000.json
52948576 total

Each record in the JSON files holds four values: En (energy value), atoms (atom types and structure), id (serial number), and shapeM (shape multipole).

{
'En': 37.801,
'atoms': [
{'type': 'O', 'xyz': [0.3387, 0.9262, 0.46]},
{'type': 'O', 'xyz': [3.4786, -1.7069, -0.3119]},
{'type': 'O', 'xyz': [1.8428, -1.4073, 1.2523]},
{'type': 'O', 'xyz': [0.4166, 2.5213, -1.2091]},
{'type': 'N', 'xyz': [-2.2359, -0.7251, 0.027]},
{'type': 'C', 'xyz': [-0.7783, -1.1579, 0.0914]},
{'type': 'C', 'xyz': [0.1368, -0.0961, -0.5161]},
...
{'type': 'H', 'xyz': [1.5832, 2.901, 1.6404]}
],
'id': 1,
'shapeM': [259.66, 4.28, 3.04, 1.21, 1.75, 2.55, 0.16, -3.13, -0.22, -2.18, -0.56, 0.21, 0.17, 0.09]
}

The first JSON file contains 18205 atoms entries, as shown below.

[u0017649@sys-96775 files]$ grep atoms pubChem_p_00000001_00025000.json | wc -l
18205

That file has 4924343 lines, and 4924343/18205 is about 270, so each atoms entry corresponds to roughly 270 lines of data.

[u0017649@sys-96775 files]$ grep type pubChem_p_00000001_00025000.json | wc -l
565479

The same file also has 565479 lines containing the word type.

[u0017649@sys-96775 files]$ echo "565479/18205" | bc -l
31.06174127986816808569

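So each atoms entry holds about 31 type values on average, and since the average is not a whole number, the entries clearly do not all hold the same number of types. That makes sense: each molecule contains a different set and number of atoms.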

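In particular, the atoms and shapeM fields are each irregular arrays. The number of atoms differs from molecule to molecule, and the shape multipole values differ accordingly. Analyzing string values made of such irregular arrays and modeling a consistent pattern from them is a very hard task.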

Even for a numerical analysis, it is a headache to decide how to split the many values inside atoms and what columns to reorganize them into. This kind of painstaking work is called feature engineering. How the feature engineering is done largely determines the performance and accuracy of the model that machine learning produces. The job falls to data scientists, and even for experienced ones it is a real headache and a heavy workload.
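To give a feel for what manual feature engineering on such a record could involve, here is a minimal sketch that turns one variable-length atoms list into a fixed set of numeric columns. The choice of features (atom counts per element and the mean coordinate per axis), the ELEMENTS vocabulary, and the function name molecule_features are all my own assumptions for illustration; this is not what H2O Driverless AI does internally.

```python
from collections import Counter

# Assumption: a small fixed element vocabulary; anything else is counted as "other".
ELEMENTS = ("C", "H", "N", "O")

def molecule_features(record):
    """Turn one variable-length record into a fixed set of numeric columns."""
    atoms = record["atoms"]
    counts = Counter(atom["type"] for atom in atoms)
    features = {"n_atoms": len(atoms)}
    for element in ELEMENTS:
        features["count_" + element] = counts.get(element, 0)
    features["count_other"] = len(atoms) - sum(counts.get(e, 0) for e in ELEMENTS)
    # Mean coordinate along each axis, one column per axis.
    for k, axis in enumerate("xyz"):
        features["mean_" + axis] = sum(a["xyz"][k] for a in atoms) / len(atoms)
    return features

# Truncated sample: four of the atoms from the record shown earlier.
record = {
    "En": 37.801,
    "id": 1,
    "atoms": [
        {"type": "O", "xyz": [0.3387, 0.9262, 0.46]},
        {"type": "N", "xyz": [-2.2359, -0.7251, 0.027]},
        {"type": "C", "xyz": [-0.7783, -1.1579, 0.0914]},
        {"type": "H", "xyz": [1.5832, 2.901, 1.6404]},
    ],
}

features = molecule_features(record)
print(features)
```

Every molecule then yields the same columns regardless of how many atoms it has, which is the property a tabular learner needs.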

H2O Driverless AI takes care of all of this. One of its biggest benefits, after all, is automated feature engineering.

But didn't I say earlier that H2O Driverless AI can only handle CSV files delimited by a comma (,) or pipe (|), or Excel (xls, xlsx) files? Can it analyze JSON files like these as they are, without preprocessing? Unfortunately, it cannot.

Converting JSON files to CSV, however, is very easy. I am not a developer and can barely manage Hello World in Python, but the sample code below, which turns up with a quick Google search, let me convert them in no time.

https://gist.github.com/pbindustries/803464d20f48a0a23d5934e3d11dadd6

Using the sample posted on the GitHub gist above, I wrote a very simple Python script named j2c.py as follows. Or rather, I did not write it; I just copied and pasted it.

[u0017649@sys-96775 files]$ vi j2c.py

import csv, json, sys

# Check that both the input file and the output file were passed
if len(sys.argv) < 3:
    sys.exit("Usage: python j2c.py input.json output.csv")
fileInput = sys.argv[1]
fileOutput = sys.argv[2]
inputFile = open(fileInput)  # open the JSON file
data = json.load(inputFile)  # load the JSON content
inputFile.close()  # close the input file
outputFile = open(fileOutput, 'w')  # open the CSV file for writing
output = csv.writer(outputFile)  # create a csv.writer
output.writerow(data[0].keys())  # header row
for row in data:
    output.writerow(row.values())  # values row
outputFile.close()  # flush and close the output file

Now use it to convert the JSON files to CSV.

[u0017649@sys-96775 files]$ for i in `ls *.json`
> do
> python j2c.py ./$i ./${i}.csv
> done

It hardly takes any effort. The CSV files are created in an instant, as shown below.

[u0017649@sys-96775 files]$ ls *.csv
pubChem_p_00000001_00025000.json.csv pubChem_p_00125001_00150000.json.csv
pubChem_p_00025001_00050000.json.csv pubChem_p_00150001_00175000.json.csv
pubChem_p_00050001_00075000.json.csv pubChem_p_00175001_00200000.json.csv
pubChem_p_00075001_00100000.json.csv pubChem_p_00200001_00225000.json.csv
pubChem_p_00100001_00125000.json.csv pubChem_p_00225001_00250000.json.csv

Each CSV file has 18206 lines including the header, that is, one row per atoms entry.

[u0017649@sys-96775 files]$ wc -l pubChem_p_00000001_00025000.json.csv
18206 pubChem_p_00000001_00025000.json.csv

Each CSV looks like the sample below. It has four columns, En, id, shapeM, and atoms, and shapeM and atoms are still enormous irregular strings. The id column is effectively meaningless for the analysis.

[u0017649@sys-96775 files]$ head -n 2 pubChem_p_00000001_00025000.json.csv
En,id,shapeM,atoms
37.801,1,"[259.66, 4.28, 3.04, 1.21, 1.75, 2.55, 0.16, -3.13, -0.22, -2.18, -0.56, 0.21, 0.17, 0.09]","[{u'xyz': [0.3387, 0.9262, 0.46], u'type': u'O'}, {u'xyz': [3.4786, -1.7069, -0.3119], u'type': u'O'}, {u'xyz': [1.8428, -1.4073, 1.2523], u'type': u'O'}, {u'xyz': [0.4166, 2.5213, -1.2091], u'type': u'O'}, {u'xyz': [-2.2359, -0.7251, 0.027], u'type': u'N'}, {u'xyz': [-0.7783, -1.1579, 0.0914], u'type': u'C'}, {u'xyz': [0.1368, -0.0961, -0.5161], u'type': u'C'}, {u'xyz': [-3.1119, -1.7972, 0.659], u'type': u'C'}, {u'xyz': [-2.4103, 0.5837, 0.784], u'type': u'C'}, {u'xyz': [-2.6433, -0.5289, -1.426], u'type': u'C'}, {u'xyz': [1.4879, -0.6438, -0.9795], u'type': u'C'}, {u'xyz': [2.3478, -1.3163, 0.1002], u'type': u'C'}, {u'xyz': [0.4627, 2.1935, -0.0312], u'type': u'C'}, {u'xyz': [0.6678, 3.1549, 1.1001], u'type': u'C'}, {u'xyz': [-0.7073, -2.1051, -0.4563], u'type': u'H'}, {u'xyz': [-0.5669, -1.3392, 1.1503], u'type': u'H'}, {u'xyz': [-0.3089, 0.3239, -1.4193], u'type': u'H'}, {u'xyz': [-2.9705, -2.7295, 0.1044], u'type': u'H'}, {u'xyz': [-2.8083, -1.921, 1.7028], u'type': u'H'}, {u'xyz': [-4.1563, -1.4762, 0.6031], u'type': u'H'}, {u'xyz': [-2.0398, 1.417, 0.1863], u'type': u'H'}, {u'xyz': [-3.4837, 0.7378, 0.9384], u'type': u'H'}, {u'xyz': [-1.9129, 0.5071, 1.7551], u'type': u'H'}, {u'xyz': [-2.245, 0.4089, -1.819], u'type': u'H'}, {u'xyz': [-2.3, -1.3879, -2.01], u'type': u'H'}, {u'xyz': [-3.7365, -0.4723, -1.463], u'type': u'H'}, {u'xyz': [1.3299, -1.3744, -1.7823], u'type': u'H'}, {u'xyz': [2.09, 0.1756, -1.3923], u'type': u'H'}, {u'xyz': [-0.1953, 3.128, 1.7699], u'type': u'H'}, {u'xyz': [0.7681, 4.1684, 0.7012], u'type': u'H'}, {u'xyz': [1.5832, 2.901, 1.6404], u'type': u'H'}]"
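The shapeM and atoms cells above are Python list literals stored as text, so they can be turned back into real lists when needed. The sketch below assumes the cell format shown above (including the u'...' prefixes) and uses ast.literal_eval, which parses literals without executing code. The sample row is shortened by me to two atoms and three shapeM values.

```python
import ast
import csv
import io

# Shortened sample in the same shape as the CSV above (assumption: same quoting).
sample_csv = '''En,id,shapeM,atoms
37.801,1,"[259.66, 4.28, 3.04]","[{u'xyz': [0.3387, 0.9262, 0.46], u'type': u'O'}, {u'xyz': [-2.2359, -0.7251, 0.027], u'type': u'N'}]"
'''

def parse_rows(text):
    """Parse each CSV row and rebuild the list-valued columns from their string form."""
    parsed = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed.append({
            "En": float(row["En"]),
            "id": int(row["id"]),
            "shapeM": ast.literal_eval(row["shapeM"]),  # list of floats
            "atoms": ast.literal_eval(row["atoms"]),    # list of dicts
        })
    return parsed

parsed = parse_rows(sample_csv)
print(parsed[0]["atoms"][0]["type"], len(parsed[0]["atoms"]))
```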

From these 10 files, cut off the last 5 rows of each in advance to build a test dataset named pubChem_test1.xlsx with 50 rows in total (51 lines including the header row).
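One possible way to script this hold-out step is sketched below. It works on lists of lines rather than on files, the helper names hold_out_tail and build_test_set are hypothetical, and the two tiny inputs stand in for the 10 converted CSV files.

```python
def hold_out_tail(lines, n):
    """Split header + rows into (train_lines, held_out_rows), holding out the last n rows."""
    header, rows = lines[0], lines[1:]
    if not 0 < n < len(rows):
        raise ValueError("n must be between 1 and len(rows) - 1")
    return [header] + rows[:-n], rows[-n:]

def build_test_set(files, n):
    """files maps a file name to its list of lines; returns (trimmed_files, test_lines)."""
    trimmed, test_lines = {}, None
    for name, lines in files.items():
        train, held_out = hold_out_tail(lines, n)
        trimmed[name] = train
        if test_lines is None:
            test_lines = [lines[0]]  # write the header only once
        test_lines.extend(held_out)
    return trimmed, test_lines

# Hypothetical miniature inputs standing in for the converted CSV files.
files = {
    "a.csv": ["En,id", "1.0,1", "2.0,2", "3.0,3"],
    "b.csv": ["En,id", "4.0,4", "5.0,5", "6.0,6"],
}

trimmed, test_lines = build_test_set(files, 1)
print(trimmed)
print(test_lines)
```

With n set to 5 and the 10 real files, this would give 50 test rows plus one header line, matching the 51 lines mentioned above.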

Then compress the 10 files, each now 5 rows shorter, into a single pubChem1.zip so they are easy to upload.

[u0017649@sys-96775 files]$ zip pubChem1.zip *.csv
adding: pubChem_p_00000001_00025000.json.csv (deflated 78%)
adding: pubChem_p_00025001_00050000.json.csv (deflated 78%)
adding: pubChem_p_00050001_00075000.json.csv (deflated 78%)
adding: pubChem_p_00075001_00100000.json.csv (deflated 78%)
adding: pubChem_p_00100001_00125000.json.csv (deflated 77%)
adding: pubChem_p_00125001_00150000.json.csv (deflated 78%)
adding: pubChem_p_00150001_00175000.json.csv (deflated 77%)
adding: pubChem_p_00175001_00200000.json.csv (deflated 78%)
adding: pubChem_p_00200001_00225000.json.csv (deflated 78%)
adding: pubChem_p_00225001_00250000.json.csv (deflated 78%)

Now connect to the H2O DAI web interface. From the Dataset menu, upload pubChem1.zip to the H2O DAI server and look at the 'Details' view.

As you can see, En and id are recognized as real and integer respectively, while shapeM and atoms are recognized as strings of enormous length and shape.

Trusting H2O DAI to handle this properly on its own, go straight into prediction (training) as is. Set En, the molecular energy value we want to predict, as the Target column, and set Accuracy, Time, and Interpretability to about 10, 7, and 7.

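For reference, with the 10-7-7 setting, the algorithms, number of iterations, and number of models and features are configured as shown below. This is the detail panel on the left of the screenshot above; the text was too small to read, so I captured an enlarged view.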

Click 'Launch experiment' to start training. Since I am too poor to own a GPU, all of this training ran on a 2-core POWER8 virtual machine. (Because of SMT8, H2O sees it as a 2 * 8 = 16-core machine.) It therefore took quite a long time.

During training, an item called 'Variable Importance' appears at the bottom center. It ranks the columns in the dataset by how important they are for predicting En, and the values change and new entries appear as training proceeds. We only fed in four columns, En, id, shapeM, and atoms, yet this panel lists many new column names such as atoms_0 and atoms_18. In other words, H2O DAI performed feature engineering automatically inside.

The experiment, that is, the training, is now finished. Let's use the trained model to make predictions.

Click 'Score on Another Dataset' and select the 50-row pubChem_test1.xlsx set aside earlier. H2O DAI analyzes the shapeM and atoms columns of this Excel sheet, applies the model built above, predicts the En values, and offers the result as a CSV file to download.

Plotting the actual En values against the En values predicted by H2O DAI gives the graph below. The predictions are quite plausible.
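Besides eyeballing a graph, the gap between actual and predicted values can be summarized with RMSE and MAE. The sketch below shows the two formulas; the numbers in it are hypothetical placeholders, not values from this experiment.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values for illustration only.
actual_en = [37.8, 44.1, 27.2]
predicted_en = [38.2, 43.5, 27.9]

print(rmse(actual_en, predicted_en))
print(mae(actual_en, predicted_en))
```

RMSE penalizes a few large misses more heavily than MAE does, so reporting both gives a fuller picture than either alone.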

In this way H2O DAI greatly reduces the data scientist's workload, and with faster feature engineering and modeling it can be put to good use in chemistry, life science, and manufacturing labs as well.

Wednesday, December 19, 2018

Taking on Kaggle with H2O Driverless AI: Credit Card Fraud Detection

This post runs a test that shows more realistically how accurate H2O Driverless AI is.

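Kaggle is, roughly, a data science competition platform. Government agencies and companies post real data (usually de-identified) that they find hard to interpret, and data scientists around the world compete on how to analyze it. I am a mere systems engineer about a million light-years away from being a data scientist, but with the help of H2O Driverless AI I will try to build the credit card fraud prediction model below.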

https://www.kaggle.com/mlg-ulb/creditcardfraud/home

The data is provided at https://www.kaggle.com/mlg-ulb/creditcardfraud/downloads/creditcardfraud.zip . You need to log in with Facebook, Google, or a similar account to download it.

Unzipping it gives a CSV file with more than 280,000 rows, as shown below. Cut off the last 3,000 rows as creditcard_test1.csv and use the rest as creditcard_train.csv.

[bsyu@redhat74 data]$ wc -l creditcard.csv
284808 creditcard.csv

[bsyu@redhat74 data]$ ls -l creditcard*.csv
-rw-rw-r-- 1 bsyu bsyu 150828752 Mar 23 2018 creditcard.csv

[bsyu@redhat74 data]$ head -n 1 creditcard.csv > creditcard_test1.csv

[bsyu@redhat74 data]$ tail -n 3000 creditcard.csv >> creditcard_test1.csv

[bsyu@redhat74 data]$ head -n 281808 creditcard.csv > creditcard_train.csv

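Each row consists of de-identified values, as shown below. The columns that are not de-identified are Time at the front and the last two, Amount and Class. Amount is the amount charged to the card, and Class is 0 for a normal transaction and 1 for a fraudulent one.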

[bsyu@redhat74 data]$ head -n 2 creditcard_train.csv
"Time","V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23","V24","V25","V26","V27","V28","Amount","Class"
0,-1.3598071336738,-0.0727811733098497,2.53634673796914,1.37815522427443,-0.338320769942518,0.462387777762292,0.239598554061257,0.0986979012610507,0.363786969611213,0.0907941719789316,-0.551599533260813,-0.617800855762348,-0.991389847235408,-0.311169353699879,1.46817697209427,-0.470400525259478,0.207971241929242,0.0257905801985591,0.403992960255733,0.251412098239705,-0.018306777944153,0.277837575558899,-0.110473910188767,0.0669280749146731,0.128539358273528,-0.189114843888824,0.133558376740387,-0.0210530534538215,149.62,"0"

Seen in terms of Class, this dataset is extremely imbalanced. Out of more than 280,000 transactions, only 492 are fraudulent (Class 1), which is just 0.173%. A model accuracy above 90% is often considered quite successful in machine learning, but here a model that simply labels every transaction as normal could claim (?) an accuracy above 99.8%. Can H2O Driverless AI build a proper model under such harsh conditions?

[bsyu@redhat74 data]$ cut -d"," -f31 creditcard.csv | grep 1 | wc -l
492
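The arithmetic behind the 0.173% and 99.8% figures can be written out as follows. The row counts come from the wc and grep output above (284808 lines minus one header line, and 492 fraud rows); the always-normal model is a thought experiment, and its recall of zero shows why accuracy alone is misleading here.

```python
total_rows = 284808 - 1   # lines in creditcard.csv minus the header line
fraud_rows = 492          # rows with Class = 1

fraud_rate = fraud_rows / total_rows

# A model that labels every transaction as normal gets all normal rows right
# and all fraud rows wrong.
baseline_accuracy = (total_rows - fraud_rows) / total_rows
baseline_recall = 0 / fraud_rows  # it never catches a single fraud case

print("fraud rate: {:.3%}".format(fraud_rate))
print("always-normal accuracy: {:.3%}".format(baseline_accuracy))
print("always-normal recall: {:.0%}".format(baseline_recall))
```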

Our goal is to use this data to build a model that tells whether a transaction is fraudulent or normal. How many actual fraud cases are in the 3,000-row test dataset we cut off to judge its success? Only 4: the 588th, 871st, 874th, and 921st rows.

[bsyu@redhat74 data]$ cut -d"," -f31 creditcard_test1.csv | grep 1
"1"
"1"
"1"
"1"

[bsyu@redhat74 data]$ cut -d"," -f31 creditcard_test1.csv | head -n 589 | tail -n 5
"0"
"0"
"0"
"1"
"0"

[bsyu@redhat74 data]$ cut -d"," -f31 creditcard_test1.csv | head -n 875 | tail -n 7
"0"
"0"
"1"
"0"
"0"
"1"
"0"

[bsyu@redhat74 data]$ cut -d"," -f31 creditcard_test1.csv | head -n 923 | tail -n 5
"0"
"0"
"1"
"0"
"0"

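As shown below, add creditcard_train.csv to H2O Driverless AI, set the target column to 'Class', set the control knobs to Accuracy 10, Time 8, and Interpretability 2, and start training (an experiment, in H2O terms). Because the experiment runs as a regression, as the Accuracy settings show, the predicted values will not be exactly 0 or 1 but decimals in between. The closer a value is to 1, the more likely the transaction is fraud; the closer to 0, the more likely it is normal.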

For reference, I could not get a system with a GPU this time, so the run used 2 CPU cores on a POWER8 virtual machine. With SMT=8, the OS sees 16 logical CPUs rather than 2 cores. As a result, CPU utilization is very high and the run takes an extremely long time.

The final model is a full 7GB in size and took almost 9 hours to build.

Now open this model and score the test dataset. The RMSE (root mean square error) was 0.0194 at the very start of training and ended slightly lower at 0.0191.

Clicking Score on Another Dataset lets you choose a dataset. Selecting the creditcard_test1.csv uploaded to the server beforehand runs prediction for about a minute and then offers a CSV file for download with 3,000 rows of predicted values for the Class column (since the input creditcard_test1.csv has 3,000 rows). The result is shown below.

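At the very least, no normal transaction was judged to be fraud. Most rows come out with a fraud probability of 0%, and the 4 rows that were actually fraud come out as shown above: 2 are clearly fraud at 98% and 99%, and the other 2 have fraud probabilities of 36% and 49%. I consider this a fairly strong result.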

---------------------------

As an additional test, build and test a model with a somewhat smaller training dataset and a larger test dataset. As shown below, about 260,000 of the 284,000 rows go to the training dataset and the remaining 24,000 or so to the test dataset.

[bsyu@redhat74 data]$ wc -l credi*
284808 creditcard.csv

[bsyu@redhat74 data]$ head -n 260001 creditcard.csv > credit_train.csv

[bsyu@redhat74 data]$ head -n 1 creditcard.csv > credit_test.csv

[bsyu@redhat74 data]$ tail -n 24807 creditcard.csv >> credit_test.csv

[bsyu@redhat74 data]$ wc -l creditcard.csv credit_train.csv credit_test.csv
284808 creditcard.csv
260001 credit_train.csv
24808 credit_test.csv

Training ran with Accuracy 10, Time 7, and Interpretability 2, as shown below.

As a result, training took about 13 hours in total (again with no GPU, using only 2 POWER8 CPU cores).

Using this automatically generated model, I ran prediction on credit_test.csv, the roughly 24,000-row test dataset prepared above, and obtained the file gerewika_preds_799411be.csv.

[bsyu@redhat74 data]$ head /tmp/gerewika_preds_799411be.csv
Class
4.00790426897506e-5
2.7836374905953808e-5
4.3067764191826185e-5
6.776587835599978e-5
0.0007914757505702477
0.00026333562696054583
0.0003490925939070681
0.00012611508177717525
3.6875439932445686e-5

How many fraud cases were in the credit_test.csv prepared above? Let's count the rows whose 31st column, Class, has the value 1.

[bsyu@redhat74 data]$ cut -f31 -d',' ./credit_test.csv | grep 1 | wc -l
21

As shown above, there are 21 in total.

How many rows in the gerewika_preds_799411be.csv file we obtained are likely to be fraud? To count them I wrote the following script.

[bsyu@redhat74 data]$ cat count.sh
if [[ $# -ne 2 ]]
then
    echo "Usage ./count.sh digit filename"
    exit 1
fi
j=1
for i in `cat $2`
do
    m=$(printf "%f" "$i")
    if (( $(echo "$m >= $1" | bc -l) ))
    then
        echo "Row_num is $j and the vlaue is $m"
    fi
    (( j=j+1 ))
done
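The same count can also be sketched in Python, which avoids starting printf and bc once per row. This is an alternative I am suggesting, not the script used for the results in this post. Like count.sh, it numbers rows from 1 with the header line counted as row 1, but it skips non-numeric lines such as the Class header instead of printing an error.

```python
def rows_at_or_above(lines, threshold):
    """Return (row_number, value) pairs whose value is >= threshold."""
    hits = []
    for row_num, text in enumerate(lines, start=1):
        try:
            value = float(text)
        except ValueError:
            continue  # non-numeric line, for example the "Class" header
        if value >= threshold:
            hits.append((row_num, value))
    return hits

# Hypothetical miniature prediction file: a header followed by four scores.
sample = ["Class", "4.00790426897506e-5", "0.740187", "0.0007914757505702477", "0.975030"]

for row_num, value in rows_at_or_above(sample, 0.7):
    print("Row_num is {} and the value is {:f}".format(row_num, value))
```

For a real prediction file, the lines argument could be the result of reading the file and calling splitlines() on its contents.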

What percentage counts as fraud is for you to decide. For now, let's count anything at or above 70%, that is, 0.70, as fraud.

[bsyu@redhat74 data]$ ./count.sh 0.7 /tmp/gerewika_preds_799411be.csv

./count.sh: line 9: printf: Class: invalid number
Row_num is 1058 and the vlaue is 0.740187
Row_num is 1475 and the vlaue is 0.897596
Row_num is 1927 and the vlaue is 0.975030
Row_num is 2562 and the vlaue is 0.996552
Row_num is 2828 and the vlaue is 0.968835
Row_num is 3276 and the vlaue is 0.979910
Row_num is 3326 and the vlaue is 0.939915
Row_num is 3879 and the vlaue is 0.896258
Row_num is 16866 and the vlaue is 0.995111
Row_num is 19865 and the vlaue is 0.958619
Row_num is 20145 and the vlaue is 0.905748
Row_num is 20151 and the vlaue is 0.911095
Row_num is 21146 and the vlaue is 0.964987

With a 70% threshold, 13 cases are flagged as fraud. That falls short of the 21 actual cases. With the training dataset reduced from about 280,000 rows to 260,000 rows, accuracy clearly drops. Even so, at least no normal transaction is scored as fraudulent. For reference, the row numbers of the actual fraud transactions in the test dataset are as follows.

Row_num is 1058 and the vlaue is 1.000000
Row_num is 1475 and the vlaue is 1.000000
Row_num is 1927 and the vlaue is 1.000000
Row_num is 2562 and the vlaue is 1.000000
Row_num is 2828 and the vlaue is 1.000000
Row_num is 3082 and the vlaue is 1.000000
Row_num is 3276 and the vlaue is 1.000000
Row_num is 3326 and the vlaue is 1.000000
Row_num is 3879 and the vlaue is 1.000000
Row_num is 8377 and the vlaue is 1.000000
Row_num is 12523 and the vlaue is 1.000000
Row_num is 14384 and the vlaue is 1.000000
Row_num is 14477 and the vlaue is 1.000000
Row_num is 15994 and the vlaue is 1.000000
Row_num is 16073 and the vlaue is 1.000000
Row_num is 16866 and the vlaue is 1.000000
Row_num is 19865 and the vlaue is 1.000000
Row_num is 20145 and the vlaue is 1.000000
Row_num is 20151 and the vlaue is 1.000000
Row_num is 21146 and the vlaue is 1.000000
Row_num is 21676 and the vlaue is 1.000000
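The two row lists above can be compared directly to get precision and recall at the 70% threshold. The sketch below uses exactly the row numbers printed above; only the variable names are mine.

```python
# Rows flagged at the 0.7 threshold (from the count.sh output above).
predicted_fraud = {1058, 1475, 1927, 2562, 2828, 3276, 3326, 3879,
                   16866, 19865, 20145, 20151, 21146}

# Rows that are actually fraud in the test dataset (from the list above).
actual_fraud = {1058, 1475, 1927, 2562, 2828, 3082, 3276, 3326, 3879,
                8377, 12523, 14384, 14477, 15994, 16073, 16866, 19865,
                20145, 20151, 21146, 21676}

true_positives = predicted_fraud & actual_fraud
false_positives = predicted_fraud - actual_fraud
false_negatives = actual_fraud - predicted_fraud

precision = len(true_positives) / len(predicted_fraud)
recall = len(true_positives) / len(actual_fraud)

print("precision: {:.3f}".format(precision))
print("recall: {:.3f}".format(recall))
print("missed rows:", sorted(false_negatives))
```

Lowering the threshold trades precision for recall, so the two numbers are best read together.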

-----------------------------------------------

Below are the results of the MLI (machine learning interpretation) obtained after training the model above.

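The K-LIME explanation, the easiest to read, is as follows. As the figure below also shows, Driverless AI found relationships such as V11 being associated with an increase of 0.0058 in the predicted Class, V4 with an increase of 0.0036, and so on.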

Top Positive Global Attributions
V11 increase of 0.0058
V4 increase of 0.0036
V2 increase of 0.0024

------------------------------------

I ran one more test. As shown below, take 4,000 rows starting at around line 250,000 to make a test dataset named credit_test.csv, and use the rest, without those 4,000 rows, to make a training dataset named credit_train.csv.

ibmtest@digits:~/files/csv$ head -n 250001 creditcard.csv > credit_train.csv
ibmtest@digits:~/files/csv$ tail -n 30808 creditcard.csv >> credit_train.csv
ibmtest@digits:~/files/csv$ head -n 1 creditcard.csv > credit_test.csv
ibmtest@digits:~/files/csv$ head -n 254809 creditcard.csv | tail -n 4000 >> credit_test.csv
ibmtest@digits:~/files/csv$ wc -l credit_train.csv credit_test.csv
280809 credit_train.csv
4001 credit_test.csv
284810 total
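Line offsets for head and tail are easy to get off by a few rows when worked out by hand, and any row that ends up in both files would inflate the test scores. As a cross-check, a window split could be sketched like this; the function name hold_out_window and the miniature input are hypothetical.

```python
def hold_out_window(lines, start, n):
    """Hold out n data rows beginning at 1-based data row `start` (header not counted).

    Returns (train_lines, test_lines), each beginning with the header.
    """
    header, rows = lines[0], lines[1:]
    lo, hi = start - 1, start - 1 + n
    if lo < 0 or hi > len(rows):
        raise ValueError("window falls outside the data rows")
    train = [header] + rows[:lo] + rows[hi:]
    test = [header] + rows[lo:hi]
    return train, test

# Hypothetical miniature file: a header plus 20 distinct data rows.
lines = ["Class"] + [str(i) for i in range(1, 21)]
train, test = hold_out_window(lines, start=11, n=4)

# Every data row lands in exactly one of the two sets.
overlap = set(train[1:]) & set(test[1:])
print(len(train), len(test), len(overlap))
```

For a file with one header line, the two outputs together should have exactly one line more than the original, because the header is written twice.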

The training dataset contains 484 fraud transactions (Class 1) in total, and the test dataset contains 9.

ibmtest@digits:~/files/csv$ cut -f31 -d',' ./credit_train.csv | grep 1 | wc -l
484

ibmtest@digits:~/files/csv$ cut -f31 -d',' ./credit_test.csv | grep 1 | wc -l
9

Now find which rows of the test dataset are fraud transactions.

ibmtest@digits:~/files/csv$ j=1

ibmtest@digits:~/files/csv$ for i in `cut -f31 -d',' ./credit_test.csv`
> do
> if [ "$i" == "\"1\"" ]
> then
> echo "Row_num is " $j
> fi
> ((j=j+1))
> done

Row_num is 671
Row_num is 1060
Row_num is 1075
Row_num is 1085
Row_num is 1098
Row_num is 1318
Row_num is 1968
Row_num is 3538
Row_num is 3589

I then trained with H2O DAI and scored this test dataset. The results are as follows.

With a threshold of 30% (0.3), it finds 6 of the 9 cases, as shown below.

ibmtest@digits:~/files/csv$ ./count.sh 0.3 miwatuvi_preds_c5262e68.csv
./count.sh: line 8: printf: Class: invalid number
Row_num is 671 and the vlaue is 1.000000
Row_num is 1060 and the vlaue is 0.950008
Row_num is 1098 and the vlaue is 0.904203
Row_num is 1318 and the vlaue is 1.000000
Row_num is 1968 and the vlaue is 1.000000
Row_num is 3538 and the vlaue is 0.303216

With a threshold of 20% (0.2), it finds 7 of the 9 cases, as shown below.

ibmtest@digits:~/files/csv$ ./count.sh 0.2 miwatuvi_preds_c5262e68.csv
./count.sh: line 8: printf: Class: invalid number
Row_num is 671 and the vlaue is 1.000000
Row_num is 1060 and the vlaue is 0.950008
Row_num is 1098 and the vlaue is 0.904203
Row_num is 1318 and the vlaue is 1.000000
Row_num is 1968 and the vlaue is 1.000000
Row_num is 3538 and the vlaue is 0.303216
Row_num is 3589 and the vlaue is 0.296002

Thursday, October 11, 2018

Building a Local Private Repository for Python (bandersnatch + nginx)

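By its nature, Python often requires installing new packages as needs arise, and for that the server must be connected to the internet over port 443. In practice, however, most enterprise data centers are completely cut off from the internet, which makes Python very inconvenient to use. This is a common headache on every CPU architecture, whether x86, ppc64le, or arm, and it affects large domestic companies as well as well-known internet companies abroad. For example, to import and use the Pillow package you first have to install it, and installing it with pip downloads the source from the internet on the spot and builds it right there, as shown below.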

[bsyu@p57a22 ~]$ pip install Pillow
Collecting Pillow
Downloading https://files.pythonhosted.org/packages/1b/e1/1118d60e9946e4e77872b69c58bc2f28448ec02c99a2ce456cd1a272c5fd/Pillow-5.3.0.tar.gz (15.6MB)
100% |████████████████████████████████| 15.6MB 11.6MB/s
Building wheels for collected packages: Pillow
Running setup.py bdist_wheel for Pillow ... done
Stored in directory: /home/bsyu/.cache/pip/wheels/df/81/28/47e761b5e307472ba7c2c5ced6e52037bbefe33c9c4b2a627e
Successfully built Pillow
Installing collected packages: Pillow
Successfully installed Pillow-5.3.0

There is no royal road to solving this, but the most fundamental fix is to set up a private local repository for Python inside the data center that is separated from the external network. There are two problems with that, though.

1) The public Python repository is pypi.org (in practice files.pythonhosted.org), and the whole repository is over 800GB and grows every day.

2) New versions of existing Python packages and entirely new packages pour out every day, so the mirror needs frequent periodic updates.

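Problem 1 is usually not a big deal (for companies with money): just give the local repository server a generously sized disk. Problem 2 is the real issue, and there is no clever trick for it either. The only option is to back up the contents of pypi.org outside on a regular basis (about once a month), carry them into the data center on an external USB disk, and load them onto the local repository server.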

Once that policy is settled, setting up the local repository server is not very hard. The bandersnatch package makes it easy, as follows.

All tests below were run on an IBM POWER8 processor (ppc64le architecture) with Redhat 7.5. Some people say they know where the Python repository for x86_64 is and ask where the one for ppc64le is, but x86, ppc64le, and aarch64 all use the same repository.

First, install bandersnatch with pip in an environment where Anaconda3 is installed, as follows.

[bsyu@p57a22 ~]$ which pip
~/anaconda3/bin/pip

[bsyu@p57a22 ~]$ pip install -r https://bitbucket.org/pypa/bandersnatch/raw/stable/requirements.txt
...
Successfully installed apipkg-1.4 bandersnatch-2.0.0 coverage-4.3.1 execnet-1.4.1 mock-2.0.0 packaging-16.8 pbr-1.10.0 pep8-1.7.0 py-1.4.32 pyflakes-1.3.0 pyparsing-2.1.10 pytest-3.0.5 pytest-cache-1.0 pytest-catchlog-1.2.2 pytest-codecheckers-0.2 pytest-cov-2.4.0 pytest-timeout-1.2.0 python-dateutil-2.6.0 requests-2.12.4 setuptools-33.1.1 six-1.10.0 xmlrpc2-0.3.1

[bsyu@p57a22 ~]$ which bandersnatch
~/anaconda3/bin/bandersnatch

Then run the 'bandersnatch mirror' command. On the first run it creates a default /etc/bandersnatch.conf, which is why this command needs sudo privileges the first time.

[bsyu@p57a22 ~]$ sudo /home/bsyu/anaconda3/bin/bandersnatch mirror
2018-10-10 03:31:32,069 WARNING: Config file '/etc/bandersnatch.conf' missing, creating default config.
2018-10-10 03:31:32,069 WARNING: Please review the config file, then run 'bandersnatch' again.

Edit /etc/bandersnatch.conf as needed. Here I changed only the directory into which the Python repository is downloaded.

[bsyu@p57a22 ~]$ sudo vi /etc/bandersnatch.conf
...
; directory = /srv/pypi
directory = /home/bsyu/files/pypi
...

Running 'bandersnatch mirror' again downloads the entire official public repository (files.pythonhosted.org). Keep in mind that it is over 800GB. I have never downloaded it to the end myself; I ran out of disk and had to stop partway.

[bsyu@p57a22 ~]$ bandersnatch mirror
2018-10-10 03:33:35,064 INFO: bandersnatch/2.0.0 (cpython 3.7.0-final0, Linux ppc64le)
2018-10-10 03:33:35,064 INFO: Setting up mirror directory: /home/bsyu/files/pypi/
2018-10-10 03:33:35,064 INFO: Setting up mirror directory: /home/bsyu/files/pypi/web/simple
2018-10-10 03:33:35,064 INFO: Setting up mirror directory: /home/bsyu/files/pypi/web/packages
2018-10-10 03:33:35,064 INFO: Setting up mirror directory: /home/bsyu/files/pypi/web/local-stats/days
2018-10-10 03:33:35,064 INFO: Generation file missing. Reinitialising status files.
2018-10-10 03:33:35,065 INFO: Status file missing. Starting over.
2018-10-10 03:33:35,065 INFO: Syncing with https://pypi.python.org.
2018-10-10 03:33:35,065 INFO: Current mirror serial: 0
2018-10-10 03:33:35,065 INFO: Syncing all packages.
2018-10-10 03:33:37,934 INFO: Trying to reach serial: 4358961
2018-10-10 03:33:37,934 INFO: 154638 packages to sync.
...
2018-10-10 03:53:50,360 INFO: Downloading: https://files.pythonhosted.org/packages/8a/5c/625ac1a93da3a672f52d947023770331b958a1100cd9889b727cde5f7ba5/CommonMark-0.7.5-py2.py3-none-any.whl
2018-10-10 03:53:50,360 DEBUG: Getting https://files.pythonhosted.org/packages/8a/5c/625ac1a93da3a672f52d947023770331b958a1100cd9889b727cde5f7ba5/CommonMark-0.7.5-py2.py3-none-any.whl (serial None)
2018-10-10 03:53:50,363 INFO: Syncing package: CompCamps-Cash-Api (serial 4042164)
2018-10-10 03:53:50,363 DEBUG: Getting /pypi/CompCamps-Cash-Api/json (serial 4042164)
2018-10-10 03:53:50,370 INFO: Downloading: https://files.pythonhosted.org/packages/77/16/44a297228a439484d049cdad818c7f6691c162b4cd741c619caeb208bb1e/CommonMark-0.7.5.tar.gz
2018-10-10 03:53:50,371 DEBUG: Getting https://files.pythonhosted.org/packages/77/16/44a297228a439484d049cdad818c7f6691c162b4cd741c619caeb208bb1e/CommonMark-0.7.5.tar.gz (serial None)
...

Looking in that directory now, you can see that the actual Python package files have been downloaded under web/packages, as shown below. web/simple is the index directory for those files.

[bsyu@p57a22 ~]$ cd files/pypi

[bsyu@p57a22 pypi]$ ls
generation todo web

[bsyu@p57a22 pypi]$ cd web

[bsyu@p57a22 web]$ ls
local-stats packages simple

[bsyu@p57a22 web]$ cd packages/

[bsyu@p57a22 packages]$ ls
00 0b 16 21 2c 37 42 4e 59 64 6f 7a 85 90 9b a6 b1 bc c7 d2 dd e8 f3 fe
01 0c 17 22 2d 38 43 4f 5a 65 70 7b 86 91 9c a7 b2 bd c8 d3 de e9 f4 ff
02 0d 18 23 2e 39 44 50 5b 66 71 7c 87 92 9d a8 b3 be c9 d4 df ea f5
03 0e 19 24 2f 3a 45 51 5c 67 72 7d 88 93 9e a9 b4 bf ca d5 e0 eb f6
04 0f 1a 25 30 3b 46 52 5d 68 73 7e 89 94 9f aa b5 c0 cb d6 e1 ec f7
05 10 1b 26 31 3c 47 53 5e 69 74 7f 8a 95 a0 ab b6 c1 cc d7 e2 ed f8
06 11 1c 27 32 3d 49 54 5f 6a 75 80 8b 96 a1 ac b7 c2 cd d8 e3 ee f9
07 12 1d 28 33 3e 4a 55 60 6b 76 81 8c 97 a2 ad b8 c3 ce d9 e4 ef fa
08 13 1e 29 34 3f 4b 56 61 6c 77 82 8d 98 a3 ae b9 c4 cf da e5 f0 fb
09 14 1f 2a 35 40 4c 57 62 6d 78 83 8e 99 a4 af ba c5 d0 db e6 f1 fc
0a 15 20 2b 36 41 4d 58 63 6e 79 84 8f 9a a5 b0 bb c6 d1 dc e7 f2 fd
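As for how a client finds a package in the web/simple index: pip looks a project up under the index URL using the normalized project name defined in PEP 503 (lowercase, with runs of '-', '_' and '.' collapsed to a single '-'). The sketch below shows that normalization; the helper name simple_index_url is hypothetical, and I am assuming the mirror serves the index at that normalized path.

```python
import re

def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()

def simple_index_url(index_url, project):
    """Build the URL a client would request for a project under a simple index."""
    return index_url.rstrip("/") + "/" + normalize(project) + "/"

# The index URL is the masked address used in this post.
print(simple_index_url("https://129.40.xx.xx:443/simple", "CommonMark"))
print(normalize("CompCamps-Cash-Api"))
```

This is why a package can be installed with any capitalization of its name while the index holds only one entry for it.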

Copy the whole Python repository directory (here /home/bsyu/files/pypi), including the three downloaded items generation, todo, and web, to an external USB disk or similar and move it to the server that will act as the local repository. For convenience, I set up the local repository server directly on the server that did the download.

To serve the Python repository to other servers over https, you of course need a web server. Here I use nginx for simplicity. First install nginx with yum, then edit /etc/nginx/nginx.conf. Note that you must configure https (port 443), not http (port 80). The root must point to the 'web' directory inside the copied directory, and the SSL certificate and its key must be registered. server_name should be the FQDN (Fully Qualified Domain Name); here I used the IP address.

[bsyu@p57a22 ~]$ sudo yum install nginx

[bsyu@p57a22 ~]$ sudo vi /etc/nginx/nginx.conf
...
    server {
        listen 443 ssl http2 default_server;
        listen [::]:443 ssl http2 default_server;
        # root /usr/share/nginx/html;
        root /home/bsyu/files/pypi/web;
        autoindex on;
        charset utf-8;

        # server_name _;
        server_name 129.40.xx.xx;
        ssl_certificate "/etc/pki/nginx/private/domain.crt";
        ssl_certificate_key "/etc/pki/nginx/private/domain.key";
        ssl_session_cache shared:SSL:1m;
        ssl_session_timeout 10m;
...

Now generate the crt and key for the SSL configuration. Here I created a self-signed certificate (x509) valid for 1 year.

[bsyu@p57a22 ~]$ cd /etc/pki/nginx/private

[root@p57a22 private]# openssl req -newkey rsa:2048 -nodes -keyout domain.key -x509 -days 365 -out domain.crt
Generating a 2048 bit RSA private key
.......................................................................................+++++
..............................+++++
writing new private key to 'domain.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:.
State or Province Name (full name) [Some-State]:.
Locality Name (eg, city) []:.
Organization Name (eg, company) [Internet Widgits Pty Ltd]:.
Organizational Unit Name (eg, section) []:.
Common Name (e.g. server FQDN or YOUR name) []:129.40.xx.xx
Email Address []:.

[root@p57a22 private]# ls -ltr
total 8
-rw-r--r-- 1 root root 1704 Oct 11 00:13 domain.key
-rw-r--r-- 1 root root 1107 Oct 11 00:13 domain.crt

The generated certificate and key can be checked as follows.

[root@p57a22 private]# openssl x509 -text -noout -in domain.crt
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
a7:03:d9:7b:33:2d:53:7c
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN=129.40.xx.xx
Validity
Not Before: Oct 11 04:13:00 2018 GMT
Not After : Oct 11 04:13:00 2019 GMT
Subject: CN=129.40.xx.xx
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:ca:73:3c:7f:e6:29:1e:5e:7b:ff:b5:98:30:ce:
fb:48:0e:bc:96:fd:5b:7f:1e:23:e5:62:8f:74:8e:
...
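As a small cross-check of the one-year validity, the two Validity timestamps printed by openssl can be compared directly. The following is only a sketch: the function name `validity_days` and the format constant are my own names for illustration, and the timestamp layout is assumed to match the output shown above.

```python
from datetime import datetime

# Timestamp layout as it appears in the `openssl x509 -text` output above.
OPENSSL_TIME_FORMAT = "%b %d %H:%M:%S %Y GMT"


def validity_days(not_before: str, not_after: str) -> int:
    """Return the number of whole days between two openssl-style timestamps."""
    start = datetime.strptime(not_before, OPENSSL_TIME_FORMAT)
    end = datetime.strptime(not_after, OPENSSL_TIME_FORMAT)
    return (end - start).days


# Values copied from the Validity section of the certificate.
days = validity_days("Oct 11 04:13:00 2018 GMT", "Oct 11 04:13:00 2019 GMT")
print(days)
```

The result should line up with the `-days 365` argument passed to `openssl req`.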

[root@p57a22 private]# openssl rsa -check -in domain.key
RSA key ok
writing RSA key
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAynM8f+YpHl57/7WYMM77SA68lv1bfx4j5WKPdI6ftW2lRdrw
fUikVx0C+2ni3QB6y/xuT8yT0eiCPT5Ak4yGpDSsfcfzDOFgVSB02irWmX/KUNIS
/zS+E7SkAfariUEFa8iRjt2kmDpi65YGKH9NY7p136NcZOSZQx2wsAU0UM5Pjtci
....

Now combine the generated certificate and key into a PEM.

[root@p57a22 private]# cat domain.crt domain.key >> domain.pem

Now start nginx.

[root@p57a22 private]# systemctl start nginx

Then, so that pip uses this 129.40.xx.xx as a trusted-host, create ~/.pip/pip.conf with the following contents.

[bsyu@p57a22 ~]$ vi .pip/pip.conf
[global]
trusted-host = 129.40.xx.xx
index = https://129.40.xx.xx:443/packages
index-url = https://129.40.xx.xx:443/simple
cert = /etc/pki/nginx/private/domain.pem
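Since pip.conf is an INI-style file, its contents can be sanity-checked with Python's standard `configparser` before trying an install. This is just a sketch: the helper name `read_global_section` is hypothetical, and the text is embedded as a string so the example does not depend on the file being present.

```python
import configparser

# Same contents as the pip.conf shown above, embedded for a self-contained check.
PIP_CONF = """\
[global]
trusted-host = 129.40.xx.xx
index = https://129.40.xx.xx:443/packages
index-url = https://129.40.xx.xx:443/simple
cert = /etc/pki/nginx/private/domain.pem
"""


def read_global_section(text: str) -> dict:
    """Parse INI text and return the [global] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["global"])


settings = read_global_section(PIP_CONF)
for key, value in settings.items():
    print(f"{key} -> {value}")
```

To check the real file instead, the same parser could be pointed at ~/.pip/pip.conf with `parser.read(path)`.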

Now let's install a package called CommonMark. You can see that the package source is fetched from 129.40.xx.xx rather than from files.pythonhosted.org as before.

[bsyu@p57a22 ~]$ pip install CommonMark
Looking in indexes: https://129.40.xx.xx:443/simple
Collecting CommonMark
Downloading https://129.40.xx.xx:443/packages/ab/ca/439c88039583a29564a0043186875258e9a4f041fb5c422cd387b8e10175/commonmark-0.8.1-py2.py3-none-any.whl (47kB)
100% |████████████████████████████████| 51kB 37.8MB/s
Requirement already satisfied: future in ./anaconda3/lib/python3.7/site-packages (from CommonMark) (0.16.0)
Installing collected packages: CommonMark
Successfully installed CommonMark-0.8.1
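The download URL in the output above shows how files sit under the packages directory: two 2-character hex directories, then a longer hex directory, then the file name. The sketch below simply splits such a URL into those levels. The function name `split_package_path` is hypothetical, and the assumption that every file follows this five-part layout is based only on the single URL shown here.

```python
from urllib.parse import urlparse


def split_package_path(url: str) -> dict:
    """Split a mirror download URL into its directory levels and file name."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed shape: packages/<2 hex>/<2 hex>/<remaining hex>/<file name>
    if len(parts) != 5 or parts[0] != "packages":
        raise ValueError(f"unexpected path layout: {url}")
    _, level1, level2, rest, filename = parts
    return {"level1": level1, "level2": level2, "rest": rest, "filename": filename}


# URL copied from the pip output above.
URL = (
    "https://129.40.xx.xx:443/packages/ab/ca/"
    "439c88039583a29564a0043186875258e9a4f041fb5c422cd387b8e10175/"
    "commonmark-0.8.1-py2.py3-none-any.whl"
)
info = split_package_path(URL)
print(info["level1"], info["level2"], len(info["rest"]), info["filename"])
```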

Monday, December 18, 2017

CLAMAV: an open source anti-virus SW available on ppc64le

There is anti-virus SW that can be used on ppc64le (IBM POWER8), the architecture of Minsky. It is CLAM Anti-Virus (clamav).

CLAMAV is an open source anti-virus SW. Its homepage is below, and you can also download the source there.

http://www.clamav.net/

Building it on ppc64le is also very simple: just run ./configure && make && sudo make install.

However, on Ubuntu, which is the OS mainly used for deep learning, it is already included in the standard apt repository, so it is easy to install and use.

Install it with the apt-get install command as follows.

u0017649@sys-89983:~$ sudo apt-get install clamav clamav-daemon clamav-freshclam clamav-base libclamav-dev clamav-testfiles

Start clamav-daemon as follows.

u0017649@sys-89983:~$ sudo systemctl start clamav-daemon.service

u0017649@sys-89983:~$ sudo systemctl status clamav-daemon.service
● clamav-daemon.service - Clam AntiVirus userspace daemon
Loaded: loaded (/lib/systemd/system/clamav-daemon.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2017-12-17 21:07:57 EST; 4s ago
Docs: man:clamd(8)
man:clamd.conf(5)
http://www.clamav.net/lang/en/doc/
Main PID: 9462 (clamd)
Tasks: 1
Memory: 234.0M
CPU: 4.121s
CGroup: /system.slice/clamav-daemon.service
└─9462 /usr/sbin/clamd --foreground=true

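To scan the contents of the directory /home/u0017649/hpcc-1.5.0 and ring a bell if any virus-infected file is found, run the following command. Here -r scans recursively, --bell rings on detection, and -i prints only infected files.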

u0017649@sys-89983:~$ clamscan -r --bell -i /home/u0017649/hpcc-1.5.0

----------- SCAN SUMMARY -----------
Known viruses: 6366898
Engine version: 0.99.2
Scanned directories: 34
Scanned files: 737
Infected files: 0
Data scanned: 9.64 MB
Data read: 6.11 MB (ratio 1.58:1)
Time: 18.255 sec (0 m 18 s)
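If you want to act on the scan result from a script, the SCAN SUMMARY block is easy to turn into a dictionary. The following is a minimal sketch: `parse_scan_summary` is a hypothetical helper, and it assumes the summary keeps the "Key: value" layout shown above.

```python
def parse_scan_summary(text: str) -> dict:
    """Collect the 'Key: value' lines that follow the SCAN SUMMARY banner."""
    summary = {}
    in_summary = False
    for line in text.splitlines():
        if "SCAN SUMMARY" in line:
            in_summary = True
            continue
        if in_summary and ":" in line:
            # Split only on the first colon so values like "1.58:1" stay intact.
            key, value = line.split(":", 1)
            summary[key.strip()] = value.strip()
    return summary


# Summary copied from the clamscan output above.
SAMPLE = """\
----------- SCAN SUMMARY -----------
Known viruses: 6366898
Engine version: 0.99.2
Scanned directories: 34
Scanned files: 737
Infected files: 0
Data scanned: 9.64 MB
Data read: 6.11 MB (ratio 1.58:1)
Time: 18.255 sec (0 m 18 s)
"""
summary = parse_scan_summary(SAMPLE)
print(summary["Infected files"], summary["Scanned files"])
```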

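If you want infected files to be removed automatically as well, use the --remove option as shown below. However, since it would be really hard to find a virus-infected file on the ppc64le architecture, I'll download the clamav source code from the clamav homepage mentioned above and scan that source. It seems to contain some files included for test purposes.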

u0017649@sys-89983:~$ tar -zxf clamav-0.99.2.tar.gz

u0017649@sys-89983:~$ clamscan -r --remove /home/u0017649/clamav-0.99.2 > clamscan.out

As shown below, you can see that 4 files were reported as infected and removed.

u0017649@sys-89983:~$ grep -i removed clamscan.out
/home/u0017649/clamav-0.99.2/test/.split/split.clam_IScab_int.exeaa: Removed.
/home/u0017649/clamav-0.99.2/test/.split/split.clam.isoaa: Removed.
/home/u0017649/clamav-0.99.2/test/.split/split.clam_IScab_ext.exeaa: Removed.
/home/u0017649/clamav-0.99.2/test/.split/split.clamjol.isoaa: Removed.

u0017649@sys-89983:~$ tail clamscan.out

----------- SCAN SUMMARY -----------
Known viruses: 6366898
Engine version: 0.99.2
Scanned directories: 227
Scanned files: 3231
Infected files: 4
Data scanned: 93.26 MB
Data read: 50.67 MB (ratio 1.84:1)
Time: 28.488 sec (0 m 28 s)
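The number of "Removed." lines can be cross-checked against the "Infected files" count in the summary. The sketch below does this on text copied from clamscan.out. The helper names `removed_files` and `infected_count` are hypothetical, and the ": Removed." suffix is assumed from the grep output above.

```python
def removed_files(output: str) -> list:
    """Return the paths of all lines that end with ': Removed.'."""
    suffix = ": Removed."
    return [
        line[: -len(suffix)]
        for line in output.splitlines()
        if line.endswith(suffix)
    ]


def infected_count(output: str) -> int:
    """Read the 'Infected files: N' value from the scan summary."""
    for line in output.splitlines():
        if line.startswith("Infected files:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no 'Infected files' line found")


# Lines copied from clamscan.out as shown above.
SAMPLE = """\
/home/u0017649/clamav-0.99.2/test/.split/split.clam_IScab_int.exeaa: Removed.
/home/u0017649/clamav-0.99.2/test/.split/split.clam.isoaa: Removed.
/home/u0017649/clamav-0.99.2/test/.split/split.clam_IScab_ext.exeaa: Removed.
/home/u0017649/clamav-0.99.2/test/.split/split.clamjol.isoaa: Removed.
Infected files: 4
"""
removed = removed_files(SAMPLE)
print(len(removed), infected_count(SAMPLE))
```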

For anti-virus SW it is important that the virus list keeps getting updated. That is the job of freshclam. It runs automatically once installed, and you can check its log as follows.

u0017649@sys-89983:~$ sudo tail -f /var/log/clamav/freshclam.log
Sun Dec 17 20:57:04 2017 -> ClamAV update process started at Sun Dec 17 20:57:04 2017
Sun Dec 17 20:58:02 2017 -> Downloading main.cvd [100%]
Sun Dec 17 20:58:13 2017 -> main.cvd updated (version: 58, sigs: 4566249, f-level: 60, builder: sigmgr)
Sun Dec 17 20:58:35 2017 -> Downloading daily.cvd [100%]
Sun Dec 17 20:58:39 2017 -> daily.cvd updated (version: 24138, sigs: 1806393, f-level: 63, builder: neo)
Sun Dec 17 20:58:40 2017 -> Downloading bytecode.cvd [100%]
Sun Dec 17 20:58:40 2017 -> bytecode.cvd updated (version: 319, sigs: 75, f-level: 63, builder: neo)
Sun Dec 17 20:58:44 2017 -> Database updated (6372717 signatures) from db.local.clamav.net (IP: 157.131.0.17)
Sun Dec 17 20:58:44 2017 -> WARNING: Clamd was NOT notified: Can't connect to clamd through /var/run/clamav/clamd.ctl: No such file or directory
Sun Dec 17 20:58:44 2017 -> --------------------------------------

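In the log above you can see that the notification to clamd failed because the clamd.ctl file did not exist. That file is created automatically when clamav-daemon is first started, and this seems to have happened because freshclam was started before I brought up clamav-daemon with the 'systemctl start clamav-daemon.service' command above. Now that I have started clamav-daemon, killing freshclam and starting it again as follows resolves it.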

u0017649@sys-89983:~$ ps -ef | grep freshclam
clamav 8894 1 1 20:57 ? 00:00:10 /usr/bin/freshclam -d --foreground=true
u0017649 9473 31958 0 21:08 pts/0 00:00:00 grep --color=auto freshclam

u0017649@sys-89983:~$ sudo kill -9 8894

u0017649@sys-89983:~$ sudo /usr/bin/freshclam -d --foreground=false

Above, freshclam was started as a background daemon. Let's look at the log again.

u0017649@sys-89983:~$ sudo tail -f /var/log/clamav/freshclam.log
Sun Dec 17 20:58:44 2017 -> Database updated (6372717 signatures) from db.local.clamav.net (IP: 157.131.0.17)
Sun Dec 17 20:58:44 2017 -> WARNING: Clamd was NOT notified: Can't connect to clamd through /var/run/clamav/clamd.ctl: No such file or directory
Sun Dec 17 20:58:44 2017 -> --------------------------------------
Sun Dec 17 21:09:37 2017 -> --------------------------------------
Sun Dec 17 21:09:37 2017 -> freshclam daemon 0.99.2 (OS: linux-gnu, ARCH: ppc, CPU: powerpc64le)
Sun Dec 17 21:09:37 2017 -> ClamAV update process started at Sun Dec 17 21:09:37 2017
Sun Dec 17 21:09:37 2017 -> main.cvd is up to date (version: 58, sigs: 4566249, f-level: 60, builder: sigmgr)
Sun Dec 17 21:09:37 2017 -> daily.cvd is up to date (version: 24138, sigs: 1806393, f-level: 63, builder: neo)
Sun Dec 17 21:09:37 2017 -> bytecode.cvd is up to date (version: 319, sigs: 75, f-level: 63, builder: neo)
Sun Dec 17 21:09:37 2017 -> --------------------------------------

Now you can see that the update went through without any error.
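The freshclam log can also be checked mechanically: the per-database signature counts should add up to the reported total, and any WARNING line should stand out. The following is a sketch under the assumption that the log keeps the line format shown in the first freshclam.log excerpt; `summarize_freshclam_log` is a hypothetical helper name.

```python
import re

SIGS_RE = re.compile(r"(\w+\.cvd) updated \(version: \d+, sigs: (\d+),")
TOTAL_RE = re.compile(r"Database updated \((\d+) signatures\)")


def summarize_freshclam_log(log_text: str) -> dict:
    """Extract per-database signature counts, the reported total, and warnings."""
    per_db, total, warnings = {}, None, []
    for line in log_text.splitlines():
        sigs = SIGS_RE.search(line)
        if sigs:
            per_db[sigs.group(1)] = int(sigs.group(2))
        reported = TOTAL_RE.search(line)
        if reported:
            total = int(reported.group(1))
        if "-> WARNING:" in line:
            warnings.append(line.split("-> ", 1)[1])
    return {"per_db": per_db, "total": total, "warnings": warnings}


# Lines copied from the first freshclam.log excerpt above.
LOG = """\
Sun Dec 17 20:58:13 2017 -> main.cvd updated (version: 58, sigs: 4566249, f-level: 60, builder: sigmgr)
Sun Dec 17 20:58:39 2017 -> daily.cvd updated (version: 24138, sigs: 1806393, f-level: 63, builder: neo)
Sun Dec 17 20:58:40 2017 -> bytecode.cvd updated (version: 319, sigs: 75, f-level: 63, builder: neo)
Sun Dec 17 20:58:44 2017 -> Database updated (6372717 signatures) from db.local.clamav.net (IP: 157.131.0.17)
Sun Dec 17 20:58:44 2017 -> WARNING: Clamd was NOT notified: Can't connect to clamd through /var/run/clamav/clamd.ctl: No such file or directory
"""
result = summarize_freshclam_log(LOG)
print(sum(result["per_db"].values()), result["total"], len(result["warnings"]))
```

Running the same function over the log lines written after the restart should give an empty warnings list.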

The clamconf command checks the various config files related to clamav. Its output is as follows.

u0017649@sys-89983:~$ clamconf
Checking configuration files in /etc/clamav

Config file: clamd.conf
-----------------------
LogFile = "/var/log/clamav/clamav.log"
StatsHostID = "auto"
StatsEnabled disabled
StatsPEDisabled = "yes"
StatsTimeout = "10"
LogFileUnlock disabled
LogFileMaxSize = "4294967295"
LogTime = "yes"
LogClean disabled
LogSyslog disabled
LogFacility = "LOG_LOCAL6"
LogVerbose disabled
LogRotate = "yes"
ExtendedDetectionInfo = "yes"
PidFile disabled
TemporaryDirectory disabled
DatabaseDirectory = "/var/lib/clamav"
OfficialDatabaseOnly disabled
LocalSocket = "/var/run/clamav/clamd.ctl"
LocalSocketGroup = "clamav"
LocalSocketMode = "666"
FixStaleSocket = "yes"
TCPSocket disabled
TCPAddr disabled
MaxConnectionQueueLength = "15"
StreamMaxLength = "26214400"
StreamMinPort = "1024"
StreamMaxPort = "2048"
MaxThreads = "12"
ReadTimeout = "180"
CommandReadTimeout = "5"
SendBufTimeout = "200"
MaxQueue = "100"
IdleTimeout = "30"
ExcludePath disabled
MaxDirectoryRecursion = "15"
FollowDirectorySymlinks disabled
FollowFileSymlinks disabled
CrossFilesystems = "yes"
SelfCheck = "3600"
DisableCache disabled
VirusEvent disabled
ExitOnOOM disabled
AllowAllMatchScan = "yes"
Foreground disabled
Debug disabled
LeaveTemporaryFiles disabled
User = "clamav"
AllowSupplementaryGroups disabled
Bytecode = "yes"
BytecodeSecurity = "TrustSigned"
BytecodeTimeout = "60000"
BytecodeUnsigned disabled
BytecodeMode = "Auto"
DetectPUA disabled
ExcludePUA disabled
IncludePUA disabled
AlgorithmicDetection = "yes"
ScanPE = "yes"
ScanELF = "yes"
DetectBrokenExecutables disabled
ScanMail = "yes"
ScanPartialMessages disabled
PhishingSignatures = "yes"
PhishingScanURLs = "yes"
PhishingAlwaysBlockCloak disabled
PhishingAlwaysBlockSSLMismatch disabled
PartitionIntersection disabled
HeuristicScanPrecedence disabled
StructuredDataDetection disabled
StructuredMinCreditCardCount = "3"
StructuredMinSSNCount = "3"
StructuredSSNFormatNormal = "yes"
StructuredSSNFormatStripped disabled
ScanHTML = "yes"
ScanOLE2 = "yes"
OLE2BlockMacros disabled
ScanPDF = "yes"
ScanSWF = "yes"
ScanXMLDOCS = "yes"
ScanHWP3 = "yes"
ScanArchive = "yes"
ArchiveBlockEncrypted disabled
ForceToDisk disabled
MaxScanSize = "104857600"
MaxFileSize = "26214400"
MaxRecursion = "16"
MaxFiles = "10000"
MaxEmbeddedPE = "10485760"
MaxHTMLNormalize = "10485760"
MaxHTMLNoTags = "2097152"
MaxScriptNormalize = "5242880"
MaxZipTypeRcg = "1048576"
MaxPartitions = "50"
MaxIconsPE = "100"
MaxRecHWP3 = "16"
PCREMatchLimit = "10000"
PCRERecMatchLimit = "5000"
PCREMaxFileSize = "26214400"
ScanOnAccess disabled
OnAccessMountPath disabled
OnAccessIncludePath disabled
OnAccessExcludePath disabled
OnAccessExcludeUID disabled
OnAccessMaxFileSize = "5242880"
OnAccessDisableDDD disabled
OnAccessPrevention disabled
OnAccessExtraScanning disabled
DevACOnly disabled
DevACDepth disabled
DevPerformance disabled
DevLiblog disabled
DisableCertCheck disabled

Config file: freshclam.conf
---------------------------
StatsHostID disabled
StatsEnabled disabled
StatsTimeout disabled
LogFileMaxSize = "4294967295"
LogTime = "yes"
LogSyslog disabled
LogFacility = "LOG_LOCAL6"
LogVerbose disabled
LogRotate = "yes"
PidFile disabled
DatabaseDirectory = "/var/lib/clamav"
Foreground disabled
Debug disabled
AllowSupplementaryGroups disabled
UpdateLogFile = "/var/log/clamav/freshclam.log"
DatabaseOwner = "clamav"
Checks = "24"
DNSDatabaseInfo = "current.cvd.clamav.net"
DatabaseMirror = "db.local.clamav.net", "database.clamav.net"
PrivateMirror disabled
MaxAttempts = "5"
ScriptedUpdates = "yes"
TestDatabases = "yes"
CompressLocalDatabase disabled
ExtraDatabase disabled
DatabaseCustomURL disabled
HTTPProxyServer disabled
HTTPProxyPort disabled
HTTPProxyUsername disabled
HTTPProxyPassword disabled
HTTPUserAgent disabled
NotifyClamd = "/etc/clamav/clamd.conf"
OnUpdateExecute disabled
OnErrorExecute disabled
OnOutdatedExecute disabled
LocalIPAddress disabled
ConnectTimeout = "30"
ReceiveTimeout = "30"
SubmitDetectionStats disabled
DetectionStatsCountry disabled
DetectionStatsHostID disabled
SafeBrowsing disabled
Bytecode = "yes"

clamav-milter.conf not found

Software settings
-----------------
Version: 0.99.2
Optional features supported: MEMPOOL IPv6 FRESHCLAM_DNS_FIX AUTOIT_EA06 BZIP2 LIBXML2 PCRE ICONV JSON

Database information
--------------------
Database directory: /var/lib/clamav
bytecode.cvd: version 319, sigs: 75, built on Wed Dec 6 21:17:11 2017
main.cvd: version 58, sigs: 4566249, built on Wed Jun 7 17:38:10 2017
daily.cvd: version 24138, sigs: 1806393, built on Sun Dec 17 16:10:39 2017
Total number of signatures: 6372717

Platform information
--------------------
uname: Linux 4.4.0-103-generic #126-Ubuntu SMP Mon Dec 4 16:22:09 UTC 2017 ppc64le
OS: linux-gnu, ARCH: ppc, CPU: powerpc64le
Full OS version: Ubuntu 16.04.2 LTS
zlib version: 1.2.8 (1.2.8), compile flags: a9
platform id: 0x0a3152520800000000050400

Build information
-----------------
GNU C: 5.4.0 20160609 (5.4.0)
CPPFLAGS: -Wdate-time -D_FORTIFY_SOURCE=2
CFLAGS: -g -O3 -fstack-protector-strong -Wformat -Werror=format-security -Wall -D_FILE_OFFSET_BITS=64 -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
CXXFLAGS:
LDFLAGS: -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,--as-needed
Configure: '--build=powerpc64le-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libexecdir=/usr/lib/clamav' '--disable-maintainer-mode' '--disable-dependency-tracking' 'CFLAGS=-g -O3 -fstack-protector-strong -Wformat -Werror=format-security -Wall -D_FILE_OFFSET_BITS=64' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O3 -fstack-protector-strong -Wformat -Werror=format-security -Wall -D_FILE_OFFSET_BITS=64' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,--as-needed' '--with-dbdir=/var/lib/clamav' '--sysconfdir=/etc/clamav' '--disable-clamav' '--disable-unrar' '--enable-milter' '--enable-dns-fix' '--with-libjson' '--with-gnu-ld' '--with-systemdsystemunitdir=/lib/systemd/system' 'build_alias=powerpc64le-linux-gnu'
sizeof(void*) = 8
Engine flevel: 82, dconf: 82
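The clamconf output is long, so it can help to load the option lines into a dictionary and look up only the settings you care about, for example the LocalSocket path that freshclam failed to reach earlier. This is only a sketch: `parse_clamconf_options` is a hypothetical helper, and it assumes options appear either as `Name = "value"` (possibly with several quoted values) or as `Name disabled`, as in the output above.

```python
import re

OPTION_RE = re.compile(r"^(\w+) = (.+)$")
QUOTED_RE = re.compile(r'"([^"]*)"')


def parse_clamconf_options(lines) -> dict:
    """Map option names to a list of values, or to None when disabled."""
    options = {}
    for raw in lines:
        line = raw.strip()
        if line.endswith(" disabled"):
            options[line[: -len(" disabled")]] = None
            continue
        match = OPTION_RE.match(line)
        if match:
            options[match.group(1)] = QUOTED_RE.findall(match.group(2))
    return options


# A few lines copied from the clamconf output above.
SAMPLE_LINES = [
    'LocalSocket = "/var/run/clamav/clamd.ctl"',
    "TCPSocket disabled",
    'MaxThreads = "12"',
    'DatabaseMirror = "db.local.clamav.net", "database.clamav.net"',
    'NotifyClamd = "/etc/clamav/clamd.conf"',
]
options = parse_clamconf_options(SAMPLE_LINES)
print(options["LocalSocket"], options["TCPSocket"])
```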

Thursday, December 14, 2017

How to run HPCC on a ppc64le architecture cluster

HPCC is a suite of 7 HPC codes, including HPL (High Performance LINPACK), that lets you measure the performance of a supercomputer in a simple way. The homepage is below.

http://icl.cs.utk.edu/hpcc/software/index.html

Compiling and running it from the information there alone is not easy, but the HPL how-to at the site below makes it somewhat easier to understand.

http://www.crc.nd.edu/~rich/CRC_Summer_Scholars_2014/HPL-HowTo.pdf

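Understanding what the tests here actually do requires some mathematical background, but from a system engineer's point of view you can get them running without knowing that. Below I have written up step by step how to run it on the ppc64le architecture, that is, an IBM POWER8 processor environment. In fact, the procedure on ppc64le is not particularly different from that on x86.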

Here I used two 1-core ppc64le Ubuntu 16.04 virtual machines in the PDP (Power Development Platform) environment.
(* Power Development Platform: apply at https://www-356.ibm.com/partnerworld/wps/servlet/ContentHandler/stg_com_sys_power-development-platform to get a 1-core Linux on POWER environment free for 2 weeks. After 2 weeks you can apply again for free. Up to 5 VMs can be requested at once.)

openmpi and BLAS must be installed beforehand, as shown below. They are easily installed with apt-get install libopenmpi-dev libblas-dev.

u0017649@sys-90393:~/hpcc-1.5.0$ dpkg -l | grep openmpi
ii libopenmpi-dev 1.10.2-8ubuntu1 ppc64el high performance message passing library -- header files
ii libopenmpi1.10 1.10.2-8ubuntu1 ppc64el high performance message passing library -- shared library
ii openmpi-bin 1.10.2-8ubuntu1 ppc64el high performance message passing library -- binaries
ii openmpi-common 1.10.2-8ubuntu1 all high performance message passing library -- common files

u0017649@sys-90393:~/hpcc-1.5.0$ dpkg -l | grep blas
ii libblas-common 3.6.0-2ubuntu2 ppc64el Dependency package for all BLAS implementations
ii libblas-dev 3.6.0-2ubuntu2 ppc64el Basic Linear Algebra Subroutines 3, static library
ii libblas3 3.6.0-2ubuntu2 ppc64el Basic Linear Algebra Reference implementations, shared library

First download the source and untar it.

u0017649@sys-90393:~$ wget http://icl.cs.utk.edu/projectsfiles/hpcc/download/hpcc-1.5.0.tar.gz

u0017649@sys-90393:~$ tar -zxf hpcc-1.5.0.tar.gz
u0017649@sys-90393:~$ cd hpcc-1.5.0

Next, run make_generic in the hpl/setup directory to generate Make.UNKNOWN. This produces a Makefile with values roughly suited to this environment.

u0017649@sys-90393:~/hpcc-1.5.0$ cd hpl/setup

u0017649@sys-90393:~/hpcc-1.5.0/hpl/setup$ sh make_generic

The Make.UNKNOWN generated here contains the following.

u0017649@sys-90393:~/hpcc-1.5.0/hpl/setup$ grep -v \# Make.UNKNOWN
SHELL = /bin/sh
CD = cd
CP = cp
LN_S = ln -s
MKDIR = mkdir
RM = /bin/rm -f
TOUCH = touch
ARCH = $(arch)
TOPdir = ../../..
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
HPLlib = $(LIBdir)/libhpl.a
MPdir =
MPinc =
MPlib =
LAdir =
LAinc =
LAlib = -lblas
F2CDEFS = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -lm
HPL_OPTS =
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
CC = mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS)
LINKER = mpif77
LINKFLAGS =
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo

Now copy this Make.UNKNOWN to the parent hpl directory under the name Make.Linux.

u0017649@sys-90393:~/hpcc-1.5.0/hpl/setup$ cp Make.UNKNOWN ../Make.Linux

Then go up to TOPdir, that is hpcc-1.5.0, and run make arch=Linux. This can be a bit confusing: note that make is run not in the hpl directory where Make.Linux was copied, but in the hpcc-1.5.0 directory above it. mpicc then runs as shown below and builds all 7 HPC codes.

u0017649@sys-90393:~/hpcc-1.5.0/hpl/setup$ cd ../..

u0017649@sys-90393:~/hpcc-1.5.0$ make arch=Linux
...
mpicc -o ../../../../FFT/wrapfftw.o -c ../../../../FFT/wrapfftw.c -I../../../../include -DAdd_ -DF77_INTEGER=int -DStringSunStyle -I../../../include -I../../../include/Linux
mpicc -o ../../../../FFT/wrapmpifftw.o -c ../../../../FFT/wrapmpifftw.c -I../../../../include -DAdd_ -DF77_INTEGER=int -DStringSunStyle -I../../../include -I../../../include/Linux
...
ar: creating ../../../lib/Linux/libhpl.a
echo ../../../lib/Linux/libhpl.a
../../../lib/Linux/libhpl.a
mpif77 -o ../../../../hpcc ../../../lib/Linux/libhpl.a -lblas -lm
make[1]: Leaving directory '/home/u0017649/hpcc-1.5.0/hpl/lib/arch/build'

As a result, an executable named hpcc is created in the hpcc-1.5.0 directory.

u0017649@sys-90393:~/hpcc-1.5.0$ file hpcc
hpcc: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), dynamically linked, interpreter /opt/at10.0/lib64/ld64.so.2, for GNU/Linux 4.4.0, BuildID[sha1]=b47fb43c4d96819e25da7469049a780f8251458b, not stripped

Running this hpcc file runs all 7 HPC codes in sequence. Before that, set LD_LIBRARY_PATH as follows.

u0017649@sys-90393:~/hpcc-1.5.0$ export LD_LIBRARY_PATH=/usr/lib:/usr/lib/powerpc64le-linux-gnu:$LD_LIBRARY_PATH

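You also need to create the INPUT data file. You can simply copy the provided _hpccinf.txt to hpccinf.txt and use it as is, but here I'll follow what is described at http://www.netlib.org/benchmark/hpl/tuning.html. The INPUT data file has a fixed format that dictates which information goes on which line, so you must enter it exactly that way; for the meaning of each line, see the tuning.html mentioned above. Note, however, that if the P x Q information is set to 2 x 8 as shown there, a total of 16 processors are needed to run it.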

u0017649@sys-90393:~/hpcc-1.5.0$ vi hpccinf.txt
HPL Linpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
3 # of problems sizes (N)
3000 6000 10000 Ns
5 # of NBs
80 100 120 140 160 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
2 # of process grids (P x Q)
1 2 Ps
6 8 Qs
16.0 threshold
3 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
4 # of recursive stopping criterium
1 2 4 8 NBMINs (>= 1)
3 # of panels in recursion
2 3 4 NDIVs
3 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
1 SWAP (0=bin-exch,1=long,2=mix)
60 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)

Running it as is fails with an error saying at least 16 processes are needed, as shown below. That is because the PDP environment I'm running on has only 1 core.

u0017649@sys-90393:~/hpcc-1.5.0$ ./hpcc
HPL ERROR from process # 0, on line 440 of function HPL_pdinfo:
>>> Need at least 16 processes for these tests <<<

HPL ERROR from process # 0, on line 440 of function HPL_pdinfo:
>>> Need at least 16 processes for these tests <<<

So I'll change lines 11 and 12 above, that is the P x Q information, to 1 as shown below.

1 1 Ps
1 1 Qs
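The "Need at least 16 processes" message follows from the largest P x Q product among the listed process grids. As a worked check, the sketch below computes that number for the Ps and Qs lines. The function name `required_processes` is hypothetical, and pairing the i-th P with the i-th Q is my reading of the input format.

```python
def required_processes(ps, qs) -> int:
    """Return the largest P * Q over the process grids given as parallel lists."""
    if len(ps) != len(qs):
        raise ValueError("Ps and Qs must list the same number of grids")
    return max(p * q for p, q in zip(ps, qs))


# Original input: grids 1 x 6 and 2 x 8.
print(required_processes([1, 2], [6, 8]))
# After the change: both grids are 1 x 1.
print(required_processes([1, 1], [1, 1]))
```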

Run as is, this kept going for more than 18 hours. So I stopped it midway and reduced the problem sizes on line 6 to 1/10 each.

300 600 1000 Ns

Now everything is ready to run on a single node. Create hpccinf.txt as follows and run it.

u0017649@sys-90393:~/hpcc-1.5.0$ cat hpccinf.txt
HPL Linpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
3 # of problems sizes (N)
300 600 1000 Ns
5 # of NBs
80 100 120 140 160 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
2 # of process grids (P x Q)
1 1 Ps
1 1 Qs
16.0 threshold
3 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
4 # of recursive stopping criterium
1 2 4 8 NBMINs (>= 1)
3 # of panels in recursion
2 3 4 NDIVs
3 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
1 SWAP (0=bin-exch,1=long,2=mix)
60 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
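The run time grows quickly because HPL runs every combination of the values listed in this file. As a rough worked example, multiplying the number of choices per parameter gives the number of HPL tests. The dictionary below is just my transcription of the counts in the input file above, and `hpl_test_count` is a hypothetical helper.

```python
from math import prod

# Number of values listed for each parameter in hpccinf.txt above.
OPTION_COUNTS = {
    "Ns": 3,
    "NBs": 5,
    "process grids": 2,
    "PFACTs": 3,
    "NBMINs": 4,
    "NDIVs": 3,
    "RFACTs": 3,
    "BCASTs": 1,
    "DEPTHs": 1,
}


def hpl_test_count(option_counts: dict) -> int:
    """Return the number of parameter combinations (one HPL test per combination)."""
    return prod(option_counts.values())


print(hpl_test_count(OPTION_COUNTS))
```

This product agrees with the "Finished 3240 tests" line that the run reports in hpccoutf.txt, so trimming any one list shortens the run proportionally.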

Running it is just a matter of executing hpcc. With the input data above it takes about 20 minutes in a 1-core POWER8 environment.

u0017649@sys-90393:~/hpcc-1.5.0$ time ./hpcc

The results accumulate in a file named hpccoutf.txt, which grows to about 1.6MB; its last part looks like this.

...
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11R3R8 1000 160 1 1 0.08 8.655e+00
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0055214 ...... PASSED
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11R4R8 1000 160 1 1 0.07 8.982e+00
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0059963 ...... PASSED
================================================================================

Finished 3240 tests with the following results:
3240 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
Current time (1513215290) is Wed Dec 13 20:34:50 2017

End of HPL section.
Begin of Summary section.
VersionMajor=1
VersionMinor=5
VersionMicro=0
VersionRelease=f
LANG=C
Success=1
sizeof_char=1
sizeof_short=2
sizeof_int=4
sizeof_long=8
sizeof_void_ptr=8
sizeof_size_t=8
sizeof_float=4
sizeof_double=8
sizeof_s64Int=8
sizeof_u64Int=8
sizeof_struct_double_double=16
CommWorldProcs=1
MPI_Wtick=1.000000e-06
HPL_Tflops=0.00934813
HPL_time=0.071476
HPL_eps=1.11022e-16
HPL_RnormI=1.71035e-12
HPL_Anorm1=263.865
HPL_AnormI=262.773
HPL_Xnorm1=2619.63
HPL_XnormI=11.3513
HPL_BnormI=0.499776
HPL_N=1000
HPL_NB=80
HPL_nprow=1
HPL_npcol=1
HPL_depth=1
HPL_nbdiv=4
HPL_nbmin=4
HPL_cpfact=R
HPL_crfact=R
HPL_ctop=1
HPL_order=R
HPL_dMACH_EPS=1.110223e-16
HPL_dMACH_SFMIN=2.225074e-308
HPL_dMACH_BASE=2.000000e+00
HPL_dMACH_PREC=2.220446e-16
HPL_dMACH_MLEN=5.300000e+01
HPL_dMACH_RND=1.000000e+00
HPL_dMACH_EMIN=-1.021000e+03
HPL_dMACH_RMIN=2.225074e-308
HPL_dMACH_EMAX=1.024000e+03
HPL_dMACH_RMAX=1.797693e+308
HPL_sMACH_EPS=5.960464e-08
HPL_sMACH_SFMIN=1.175494e-38
HPL_sMACH_BASE=2.000000e+00
HPL_sMACH_PREC=1.192093e-07
HPL_sMACH_MLEN=2.400000e+01
HPL_sMACH_RND=1.000000e+00
HPL_sMACH_EMIN=-1.250000e+02
HPL_sMACH_RMIN=1.175494e-38
HPL_sMACH_EMAX=1.280000e+02
HPL_sMACH_RMAX=3.402823e+38
dweps=1.110223e-16
sweps=5.960464e-08
HPLMaxProcs=1
HPLMinProcs=1
DGEMM_N=576
StarDGEMM_Gflops=12.5581
SingleDGEMM_Gflops=11.5206
PTRANS_GBs=0.555537
PTRANS_time=0.000324011
PTRANS_residual=0
PTRANS_n=150
PTRANS_nb=120
PTRANS_nprow=1
PTRANS_npcol=1
MPIRandomAccess_LCG_N=524288
MPIRandomAccess_LCG_time=0.719245
MPIRandomAccess_LCG_CheckTime=0.076761
MPIRandomAccess_LCG_Errors=0
MPIRandomAccess_LCG_ErrorsFraction=0
MPIRandomAccess_LCG_ExeUpdates=2097152
MPIRandomAccess_LCG_GUPs=0.00291577
MPIRandomAccess_LCG_TimeBound=60
MPIRandomAccess_LCG_Algorithm=0
MPIRandomAccess_N=524288
MPIRandomAccess_time=0.751529
MPIRandomAccess_CheckTime=0.0741832
MPIRandomAccess_Errors=0
MPIRandomAccess_ErrorsFraction=0
MPIRandomAccess_ExeUpdates=2097152
MPIRandomAccess_GUPs=0.00279051
MPIRandomAccess_TimeBound=60
MPIRandomAccess_Algorithm=0
RandomAccess_LCG_N=524288
StarRandomAccess_LCG_GUPs=0.0547931
SingleRandomAccess_LCG_GUPs=0.0547702
RandomAccess_N=524288
StarRandomAccess_GUPs=0.0442224
SingleRandomAccess_GUPs=0.044326
STREAM_VectorSize=333333
STREAM_Threads=1
StarSTREAM_Copy=2.13593
StarSTREAM_Scale=2.09571
StarSTREAM_Add=3.23884
StarSTREAM_Triad=3.26659
SingleSTREAM_Copy=2.13593
SingleSTREAM_Scale=2.11213
SingleSTREAM_Add=3.23884
SingleSTREAM_Triad=3.26659
FFT_N=131072
StarFFT_Gflops=0.488238
SingleFFT_Gflops=0.488024
MPIFFT_N=65536
MPIFFT_Gflops=0.305709
MPIFFT_maxErr=1.23075e-15
MPIFFT_Procs=1
MaxPingPongLatency_usec=-1
RandomlyOrderedRingLatency_usec=-1
MinPingPongBandwidth_GBytes=-1
NaturallyOrderedRingBandwidth_GBytes=-1
RandomlyOrderedRingBandwidth_GBytes=-1
MinPingPongLatency_usec=-1
AvgPingPongLatency_usec=-1
MaxPingPongBandwidth_GBytes=-1
AvgPingPongBandwidth_GBytes=-1
NaturallyOrderedRingLatency_usec=-1
FFTEnblk=16
FFTEnp=8
FFTEl2size=1048576
M_OPENMP=-1
omp_get_num_threads=0
omp_get_max_threads=0
omp_get_num_procs=0
MemProc=-1
MemSpec=-1
MemVal=-1
MPIFFT_time0=0
MPIFFT_time1=0.00124693
MPIFFT_time2=0.00407505
MPIFFT_time3=0.000625134
MPIFFT_time4=0.0091598
MPIFFT_time5=0.00154018
MPIFFT_time6=0
CPS_HPCC_FFT_235=0
CPS_HPCC_FFTW_ESTIMATE=0
CPS_HPCC_MEMALLCTR=0
CPS_HPL_USE_GETPROCESSTIMES=0
CPS_RA_SANDIA_NOPT=0
CPS_RA_SANDIA_OPT2=0
CPS_USING_FFTW=0
End of Summary section.
########################################################################
End of HPC Challenge tests.
Current time (1513215290) is Wed Dec 13 20:34:50 2017

########################################################################

1184.15user 2.77system 19:47.13elapsed 99%CPU (0avgtext+0avgdata 34624maxresident)k
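The HPL figure in the summary can be reproduced from HPL_N and HPL_time. The sketch below uses the conventional LINPACK operation count for solving an N x N system, 2/3 N^3 + 3/2 N^2. That this is the count behind HPL_Tflops is my assumption, not something stated in the output, and `hpl_gflops` is a hypothetical helper name.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Estimate Gflop/s from the problem size and wall time.

    Assumes the conventional LINPACK operation count 2/3*N^3 + 3/2*N^2.
    """
    flops = (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2
    return flops / seconds / 1e9


# HPL_N and HPL_time copied from the Summary section above.
gflops = hpl_gflops(1000, 0.071476)
print(f"{gflops:.4f} Gflop/s = {gflops / 1000:.8f} Tflop/s")
```

With these inputs the estimate lands very close to the reported HPL_Tflops=0.00934813; small differences are expected because HPL_time is printed with limited precision.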

Now let's see how to run it multi-node. This is also very simple. First create a file containing the node names as follows. If the nodes have several network interfaces, it is better to list the fast one, such as 10GbE or Infiniband.

u0017649@sys-90393:~/hpcc-1.5.0$ cat nodes.rf
sys-90393
sys-90505

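Next, modify the INPUT data file hpccinf.txt a little. Unlike the one used above, we are now using 2 machines, so it must run with 2 processes. So I'll change lines 11 and 12, the P x Q information, to 1 1 and 1 2 as shown below. Also, when I ran it this way, perhaps because of MPI overhead, it still had not finished after 40 minutes rather than 20. So I reduced the problem sizes on line 6 by another factor of 10.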

u0017649@sys-90393:~/hpcc-1.5.0$ cat hpccinf.txt
HPL Linpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
3 # of problems sizes (N)
30 60 100 Ns
5 # of NBs
80 100 120 140 160 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
2 # of process grids (P x Q)
1 1 Ps
1 2 Qs
16.0 threshold
3 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
4 # of recursive stopping criterium
1 2 4 8 NBMINs (>= 1)
3 # of panels in recursion
2 3 4 NDIVs
3 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
1 SWAP (0=bin-exch,1=long,2=mix)
60 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)

After that, run hpcc with mpirun as follows.

u0017649@sys-90393:~/hpcc-1.5.0$ mpirun -x PATH -x LD_LIBRARY_PATH -np 2 -hostfile nodes.rf ./hpcc | tee HPCC.out
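If you script this step, the -np value can be derived from the host file instead of being typed by hand. The following is a sketch only: it assumes one process per listed host (true for the 1-core VMs used here), and `build_mpirun_command` is a hypothetical helper that just assembles the argument list; it does not launch anything.

```python
def build_mpirun_command(hostfile_text: str,
                         hostfile_name: str = "nodes.rf",
                         program: str = "./hpcc") -> list:
    """Build the mpirun argument list, using one process per non-empty host line."""
    hosts = [line.strip() for line in hostfile_text.splitlines() if line.strip()]
    return [
        "mpirun",
        "-x", "PATH",
        "-x", "LD_LIBRARY_PATH",
        "-np", str(len(hosts)),
        "-hostfile", hostfile_name,
        program,
    ]


# Contents of nodes.rf as shown above.
NODES = "sys-90393\nsys-90505\n"
command = build_mpirun_command(NODES)
print(" ".join(command))
```

The resulting list could be handed to `subprocess.run(command)` from the hpcc-1.5.0 directory; piping through tee, as in the shell command above, is left out of the sketch.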

You can see that as soon as it starts, both nodes run at 100% CPU.