Deep Learning for HW Engineers: How to Run the RNN PTB Benchmark

Thursday, May 10, 2018

How to Run the RNN PTB Benchmark

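CNNs are widely used for static image processing such as image classification and object detection. Tasks such as machine translation or video captioning, however, require predicting what comes next from sequential data, for example through time-series analysis. For these, an RNN such as an LSTM is used instead of a CNN.

The catch is that, unlike a CNN, an RNN/LSTM by its nature has to refer back to the history of its data, so it uses a lot of memory. Naturally, system bandwidth then affects overall system performance.

The most common RNN benchmark is the language modeling tutorial included in tensorflow models, which uses the PTB (Penn Treebank) dataset, a corpus of English text. It makes a reasonable performance benchmark. First, prepare an environment with tensorflow 1.5.1 installed on python3.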

[u0017649@sys-93214 ~]$ git clone https://github.com/tensorflow/models.git

[u0017649@sys-93214 ~]$ cd models/tutorials/rnn/ptb

The PTB dataset used by this benchmark can be downloaded as follows.

[u0017649@sys-93214 ptb]$ wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz

[u0017649@sys-93215 ptb]$ tar -zxvf simple-examples.tgz
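
As a quick sanity check on the extracted data, the sketch below counts the distinct tokens in the training text, which can be compared against the vocab_size used by the benchmark. It is only an illustration: the file name ptb.train.txt inside ./simple-examples/data is an assumption about the archive layout, and build_vocab is a hypothetical helper, not a function taken from the tutorial.

```python
import collections
import os


def build_vocab(path):
    """Map each distinct token in a text file to an integer id."""
    with open(path, encoding="utf-8") as f:
        # Each line is one sentence; mark its end with an explicit token.
        tokens = f.read().replace("\n", " <eos> ").split()
    counts = collections.Counter(tokens)
    # Most frequent words first; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: idx for idx, (word, _) in enumerate(ordered)}


if __name__ == "__main__":
    data_dir = "./simple-examples/data"  # same path passed to --data_path
    train_path = os.path.join(data_dir, "ptb.train.txt")  # assumed file name
    if os.path.exists(train_path):
        print("vocabulary size:", len(build_vocab(train_path)))
    else:
        print("not found:", train_path)
```

If the printed size is far from what the config expects, the download or extraction probably did not complete.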

Now run ptb_word_lm.py as follows.

[u0017649@sys-93214 ptb]$ time python ptb_word_lm.py --data_path=./simple-examples/data/ --model=large
...
Epoch: 1 Learning rate: 1.000
0.008 perplexity: 25072.184 speed: 1565 wps
0.107 perplexity: 1574.659 speed: 2033 wps
0.206 perplexity: 974.553 speed: 2057 wps
0.306 perplexity: 754.209 speed: 2065 wps
0.405 perplexity: 643.568 speed: 2069 wps
...
0.704 perplexity: 133.906 speed: 2085 wps
0.803 perplexity: 133.743 speed: 2085 wps
0.903 perplexity: 132.101 speed: 2085 wps
Epoch: 10 Train Perplexity: 131.618
Epoch: 10 Valid Perplexity: 117.277
Test Perplexity: 113.380
...
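
To help read the log above: perplexity is conventionally the exponential of the average per-word cross-entropy, and wps is words processed per second. The sketch below shows that arithmetic with hypothetical helper names; it is not the script's own code. A model that guesses uniformly over a 10,000-word vocabulary has a perplexity equal to the vocabulary size, which is a useful reference point for the numbers in the log.

```python
import math


def perplexity(total_cost, num_words):
    """exp of the average per-word cross-entropy (in nats)."""
    return math.exp(total_cost / num_words)


def words_per_second(num_words, elapsed_seconds):
    """Throughput figure of the kind printed as 'wps'."""
    return num_words / elapsed_seconds


# A uniform guess over 10,000 words costs ln(10000) nats per word,
# so its perplexity comes out at (approximately) the vocabulary size.
vocab_size = 10000
num_words = 700  # arbitrary illustrative word count
total_cost = math.log(vocab_size) * num_words
print("uniform-guess perplexity: %.3f" % perplexity(total_cost, num_words))
print("speed: %.0f wps" % words_per_second(num_words, 0.35))
```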

Run as-is, however, it trains for a full 55 epochs (about 3 hours even on four P100s), so to shorten the run, edit max_epoch and max_max_epoch as shown below. Using more hidden units (a larger hidden_size) can lower the per-word perplexity further, but it takes longer and uses more memory.

[u0017649@sys-93214 ptb]$
...
class LargeConfig(object):
  """Large config."""
  init_scale = 0.04
  learning_rate = 1.0
  max_grad_norm = 10
  num_layers = 2
  num_steps = 2
  hidden_size = 1000
  max_epoch = 4  # originally 14
  max_max_epoch = 10  # originally 55
  keep_prob = 0.35
  lr_decay = 1 / 1.15
  batch_size = 20
  vocab_size = 10000
  rnn_mode = BLOCK
...
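
To see what the two epoch settings control, here is a minimal sketch of the learning-rate schedule as I understand it from ptb_word_lm.py: training runs for max_max_epoch epochs in total, the initial learning rate is held for the first max_epoch epochs, and after that it is multiplied by lr_decay once per epoch. The function name learning_rates is hypothetical, and the exact formula should be checked against the version of the script you are running.

```python
def learning_rates(learning_rate, lr_decay, max_epoch, max_max_epoch):
    """Per-epoch learning rate: constant for max_epoch epochs, then decayed."""
    rates = []
    for i in range(max_max_epoch):
        # No decay while i + 1 <= max_epoch; one extra factor per epoch after.
        decay = lr_decay ** max(i + 1 - max_epoch, 0)
        rates.append(learning_rate * decay)
    return rates


# Values from the shortened LargeConfig: max_epoch = 4, max_max_epoch = 10.
for epoch, lr in enumerate(learning_rates(1.0, 1 / 1.15, 4, 10), start=1):
    print("Epoch: %d Learning rate: %.3f" % (epoch, lr))
```

Lowering only max_max_epoch shortens the run but leaves fewer decayed epochs, which is why both values are reduced together here.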

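I ran it on a CPU-only machine without a GPU, using the --num_gpus=0 option. Looking at the real memory used by the python program (the RSS column in ps aux), you can see that the RNN really does grow and shrink its memory dynamically, over and over. Below is the result of monitoring that memory usage at 2-second intervals for just 10 minutes. The same pattern keeps repeating.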
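
For anyone who wants to reproduce this kind of measurement, the following is a minimal sketch of a sampler, not the tool used for the measurement described here. It assumes Linux and reads the VmRSS entry of /proc/<pid>/status, which is the resident set size that ps reports as RSS; rss_kb and monitor are hypothetical helper names.

```python
import os
import time


def rss_kb(pid):
    """Resident set size of a process in kB, read from /proc (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise ValueError("no VmRSS entry for pid %d" % pid)


def monitor(pid, interval_s=2.0, duration_s=600.0):
    """Sample RSS every interval_s seconds for duration_s seconds."""
    samples = []
    start = time.time()
    while time.time() - start < duration_s:
        samples.append((time.time() - start, rss_kb(pid)))
        time.sleep(interval_s)
    return samples


if __name__ == "__main__":
    # Demo on this script's own process; pass the benchmark's PID instead
    # and use the defaults (2 s, 600 s) to match the run described above.
    for elapsed, kb in monitor(os.getpid(), interval_s=0.5, duration_s=1.5):
        print("%6.1f s  %d kB" % (elapsed, kb))
```

The sampler raises an error if the target process exits while it is being watched, so it is best started only after the benchmark is already running.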
