In the previous posting, I covered how to use caffe-ibm with the DDL (distributed deep learning) option. This time, the topic is ddl-tensorflow.
Unlike Caffe, TensorFlow requires you to write your application code in Python, and some example code is provided to show how to write Python code that uses DDL. A Python script for the simplest case, MNIST training, is included in ddl-tensorflow.
First, install the PowerAI toolkit. (Here I used the existing v4, not the latest v5.)
u0017649@sys-92312:~$ dpkg -l | grep mldl
ii mldl-repo-local 4.0.0 ppc64el IBM repository for Deep Learning tools for POWER linux
Check the tensorflow and ddl-tensorflow packages, which PowerAI provides in deb form, and install them with the apt-get command.
u0017649@sys-92312:~$ apt-cache pkgnames | grep tensor
ddl-tensorflow
tensorflow
u0017649@sys-92312:~$ sudo apt-get install tensorflow ddl-tensorflow
The related example code is in the directory below. There are two examples, one for mnist and one for slim.
u0017649@sys-92312:~$ cd /opt/DL/ddl-tensorflow/examples
u0017649@sys-92312:/opt/DL/ddl-tensorflow/examples$ ls -ltr
total 8
drwxr-xr-x 7 root root 4096 Mar 15 08:18 slim
drwxr-xr-x 2 root root 4096 Mar 15 08:18 mnist
u0017649@sys-92312:/opt/DL/ddl-tensorflow/examples$ cd mnist
u0017649@sys-92312:/opt/DL/ddl-tensorflow/examples/mnist$ ls -ltr
total 16
-rw-r--r-- 1 root root 240 Aug 2 2017 README.md
-rw-r--r-- 1 root root 8681 Aug 2 2017 ddl_mnist.py
Reading the README.md for MNIST, it is simply a usage guide on how to run this code with mpirun. Since this is a single node with 2 GPUs installed, there is no need to specify a separate rank file (rf) with the -rf option. Because of how OpenMPI works, every GPU is treated as an independent learner, so 2 GPUs installed in one server and 2 GPUs installed one per server across two servers differ only in topology and are handled in exactly the same way (see the sketch after the README excerpt below for the two-server case).
u0017649@sys-92312:/opt/DL/ddl-tensorflow/examples/mnist$ vi README.md
# HOW TO RUN
To run the IBM PowerAI Distributed Deep Learning MNIST training example:
$ source /opt/DL/ddl-tensorflow/bin/ddl-tensorflow-activate
$ mpirun -x PATH -x LD_LIBRARY_PATH -x PYTHONPATH -n 2 python ddl_mnist.py
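For reference, if the same two GPUs were spread across two servers instead, you would list the hosts in a rank file and pass it with the -rf option instead of -n 2. A minimal sketch, assuming two hypothetical hosts named node1 and node2 (the exact rank file layout depends on your cluster and Open MPI version):
$ cat rank_file
rank 0=node1 slot=0
rank 1=node2 slot=0
$ mpirun -x PATH -x LD_LIBRARY_PATH -x PYTHONPATH -rf rank_file python ddl_mnist.py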
Below is the full content of ddl_mnist.py.
u0017649@sys-92312:/opt/DL/ddl-tensorflow/examples/mnist$ vi ddl_mnist.py
'''
Based on https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py:
A Convolutional Network implementation example using TensorFlow library.
This example is using the MNIST database of handwritten digits
(http://yann.lecun.com/exdb/mnist/)
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
Modifications:
*****************************************************************
Licensed Materials - Property of IBM
(C) Copyright IBM Corp. 2017. All Rights Reserved.
US Government Users Restricted Rights - Use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
*****************************************************************
'''
import tensorflow as tf
import numpy as np
############################################################################
# IBM PowerAI Distributed Deep Learning (DDL) setup
############################################################################
# Disable GPU memory preallocation
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
############################################################################
# DDL Initialize BEGIN
############################################################################
# Load DDL operator
ddl = tf.load_op_library('/opt/DL/ddl-tensorflow/lib/ddl_MDR.so')
# DDL initializes MPI on CPU
# ddl.init takes two inputs
# 1) the number of GPUs to utilize on each host in training.
# this number is not the number of GPUs to use for each learner. It simply tells DDL that there are X GPUs in each host to be used for training
# 2) DDL options (refer to README for details)
with tf.Session(config=config) as sess:
    with tf.device('/cpu:0'):
        rank, size, gpuid = sess.run(ddl.init(2, mode = '-mode r:2 -dump_iter 100'))
# MPI info and assigned GPU
print [rank, size, gpuid]
############################################################################
# DDL Initialize END
############################################################################
# Perform all TensorFlow computation within gpuid
with tf.device('/gpu:%d' %gpuid):
    ##############################################################################
    # Import MNIST data
    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
    # Parameters
    learning_rate = 0.001
    training_iters = 200000
    batch_size = 100
    display_step = 1
    # Network Parameters
    n_input = 784 # MNIST data input (img shape: 28*28)
    n_classes = 10 # MNIST total classes (0-9 digits)
    dropout = 0.75 # Dropout, probability to keep units
    # tf Graph input
    x = tf.placeholder(tf.float32, [None, n_input])
    y = tf.placeholder(tf.float32, [None, n_classes])
    keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)
    # Create some wrappers for simplicity
    def conv2d(x, W, b, strides=1):
        # Conv2D wrapper, with bias and relu activation
        x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
        x = tf.nn.bias_add(x, b)
        return tf.nn.relu(x)
    def maxpool2d(x, k=2):
        # MaxPool2D wrapper
        return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                              padding='SAME')
    # Create model
    def conv_net(x, weights, biases, dropout):
        # Reshape input picture
        x = tf.reshape(x, shape=[-1, 28, 28, 1])
        # Convolution Layer
        conv1 = conv2d(x, weights['wc1'], biases['bc1'])
        # Max Pooling (down-sampling)
        conv1 = maxpool2d(conv1, k=2)
        # Convolution Layer
        conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
        # Max Pooling (down-sampling)
        conv2 = maxpool2d(conv2, k=2)
        # Fully connected layer
        # Reshape conv2 output to fit fully connected layer input
        fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
        fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
        fc1 = tf.nn.relu(fc1)
        # Apply Dropout
        fc1 = tf.nn.dropout(fc1, dropout)
        # Output, class prediction
        out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
        return out
    # Store layers weight & bias
    weights = {
        ############################################################################
        # DDL BROADCAST BEGIN
        ############################################################################
        # This step ensures that all learners start with the same initial parameters
        # 5x5 conv, 1 input, 32 outputs
        'wc1': tf.Variable(ddl.bcast(tf.random_normal([5, 5, 1, 32]))),
        # 5x5 conv, 32 inputs, 64 outputs
        'wc2': tf.Variable(ddl.bcast(tf.random_normal([5, 5, 32, 64]))),
        # fully connected, 7*7*64 inputs, 1024 outputs
        'wd1': tf.Variable(ddl.bcast(tf.random_normal([7*7*64, 1024]))),
        # 1024 inputs, 10 outputs (class prediction)
        'out': tf.Variable(ddl.bcast(tf.random_normal([1024, n_classes])))
        ############################################################################
        # DDL BROADCAST END
        ############################################################################
    }
    biases = {
        'bc1': tf.Variable(ddl.bcast(tf.random_normal([32]))),
        'bc2': tf.Variable(ddl.bcast(tf.random_normal([64]))),
        'bd1': tf.Variable(ddl.bcast(tf.random_normal([1024]))),
        'out': tf.Variable(ddl.bcast(tf.random_normal([n_classes])))
    }
    # Construct model
    pred = conv_net(x, weights, biases, keep_prob)
    # Define loss and optimizer
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    ############################################################################
    # DDL ALLREDUCE BEGIN
    ############################################################################
    # Collect the gradients and the corresponding parameters w.r.t the given cost
    grads_and_vars = optimizer.compute_gradients(cost)
    # Separate out the tuple
    grads, vars = zip(*grads_and_vars)
    # This step takes the average of the gradients on all the learners
    grads_and_vars_ddl = zip(ddl.all_reduce_n(grads, op='avg'), vars)
    # Update the parameters with the averaged gradient
    objective = optimizer.apply_gradients(grads_and_vars_ddl)
    ############################################################################
    # DDL ALLREDUCE END
    ############################################################################
    # Evaluate model
    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
##############################################################################
def split(a, n):
    k, m = divmod(len(a), n)
    return (a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in xrange(n))
# Launch the graph
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        # Each learner will read batch_size*size samples and
        # use only the portion corresponding to the current learner (or rank)
        batch_x, batch_y = mnist.train.next_batch(batch_size*size)
        batch_x = np.split(batch_x,size)[rank]
        batch_y = np.split(batch_y,size)[rank]
        # Run optimization op (backprop)
        sess.run(objective, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
        if step % display_step == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: 1.})
            print("MPI "+str(rank)+"] Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print("MPI "+str(rank)+"] Optimization Finished!")
    # Calculate accuracy for 256 mnist test images
    print("MPI "+str(rank)+"] Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: mnist.test.images[:256],
                                      y: mnist.test.labels[:256],
                                      keep_prob: 1.}))
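To recap, the DDL-specific part of the script above boils down to three operator calls: ddl.init() on the CPU, ddl.bcast() wrapped around every initial variable value, and ddl.all_reduce_n() on the gradients before apply_gradients(). The skeleton below is only a stripped-down sketch of that pattern, reusing the same library path and options as the example; the model-specific pieces are left as comments.

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Load the DDL operators shipped with ddl-tensorflow
ddl = tf.load_op_library('/opt/DL/ddl-tensorflow/lib/ddl_MDR.so')

# 1) Initialize DDL/MPI on the CPU; each learner gets its rank, world size and GPU id
with tf.Session(config=config) as sess:
    with tf.device('/cpu:0'):
        rank, size, gpuid = sess.run(ddl.init(2, mode='-mode r:2 -dump_iter 100'))

with tf.device('/gpu:%d' % gpuid):
    # 2) Broadcast initial values so all learners start from identical parameters
    w = tf.Variable(ddl.bcast(tf.random_normal([784, 10])))
    b = tf.Variable(ddl.bcast(tf.zeros([10])))
    # ... build the model and cost here, as in ddl_mnist.py ...
    # 3) Average gradients across all learners before applying them:
    #    grads, vars = zip(*optimizer.compute_gradients(cost))
    #    train_op = optimizer.apply_gradients(zip(ddl.all_reduce_n(grads, op='avg'), vars))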
I have uploaded a tar file of the example code directory, including the other example mentioned above, slim, to the link below.
The file at that link is the ddl-examples.tgz shown here, and the files it contains are as follows.
u0017649@sys-92312:/opt/DL/ddl-tensorflow$ sudo tar -zcvf ddl-examples.tgz doc examples
doc/
doc/README-API.md
doc/README.md
doc/LICENSE.pdf
doc/images/
doc/images/clones2.png
doc/images/cifar10_overview.png
examples/
examples/slim/
examples/slim/WORKSPACE
examples/slim/__init__.py
examples/slim/nets/
examples/slim/nets/__init__.py
examples/slim/nets/resnet_v1_test.py
examples/slim/nets/nets_factory_test.py
examples/slim/nets/alexnet.py
examples/slim/nets/inception_utils.py
examples/slim/nets/vgg.py
examples/slim/nets/mobilenet_v1.png
examples/slim/nets/vgg_test.py
examples/slim/nets/inception_v4_test.py
examples/slim/nets/resnet_utils.py
examples/slim/nets/inception_v2.py
examples/slim/nets/nets_factory.py
examples/slim/nets/mobilenet_v1.py
examples/slim/nets/inception_v1.py
examples/slim/nets/inception_resnet_v2.py
examples/slim/nets/inception_v2_test.py
examples/slim/nets/inception_v1_test.py
examples/slim/nets/resnet_v2.py
examples/slim/nets/alexnet_test.py
examples/slim/nets/inception_v4.py
examples/slim/nets/inception_v3.py
examples/slim/nets/inception_resnet_v2_test.py
examples/slim/nets/inception_v3_test.py
examples/slim/nets/resnet_v1.py
examples/slim/nets/inception.py
examples/slim/nets/mobilenet_v1_test.py
examples/slim/nets/overfeat.py
examples/slim/nets/overfeat_test.py
examples/slim/nets/cifarnet.py
examples/slim/nets/resnet_v2_test.py
examples/slim/nets/lenet.py
examples/slim/nets/mobilenet_v1.md
examples/slim/train-inception_v3.sh
examples/slim/download_and_convert_data.py
examples/slim/train-cifar10.sh
examples/slim/preprocessing/
examples/slim/preprocessing/__init__.py
examples/slim/preprocessing/preprocessing_factory.py
examples/slim/preprocessing/lenet_preprocessing.py
examples/slim/preprocessing/cifarnet_preprocessing.py
examples/slim/preprocessing/inception_preprocessing.py
examples/slim/preprocessing/vgg_preprocessing.py
examples/slim/README.md
examples/slim/eval_image_classifier.py
examples/slim/scripts/
examples/slim/scripts/train_lenet_on_mnist.sh
examples/slim/scripts/finetune_resnet_v1_50_on_flowers.sh
examples/slim/scripts/finetune_inception_v1_on_flowers.sh
examples/slim/scripts/finetune_inception_resnet_v2_on_flowers.sh
examples/slim/scripts/finetune_inception_v3_on_flowers.sh
examples/slim/scripts/train_cifarnet_on_cifar10.sh
examples/slim/export_inference_graph_test.py
examples/slim/slim_walkthrough.ipynb
examples/slim/deployment/
examples/slim/deployment/__init__.py
examples/slim/deployment/model_deploy_test.py
examples/slim/deployment/model_deploy.py
examples/slim/train-alexnet.sh
examples/slim/BUILD
examples/slim/datasets/
examples/slim/datasets/__init__.py
examples/slim/datasets/cifar10.py
examples/slim/datasets/dataset_utils.py
examples/slim/datasets/download_and_convert_flowers.py
examples/slim/datasets/download_and_convert_cifar10.py
examples/slim/datasets/dataset_factory.py
examples/slim/datasets/imagenet.py
examples/slim/datasets/mnist.py
examples/slim/datasets/flowers.py
examples/slim/datasets/download_and_convert_mnist.py
examples/slim/setup.py
examples/slim/train_image_classifier.py
examples/slim/export_inference_graph.py
examples/mnist/
examples/mnist/README.md
examples/mnist/ddl_mnist.py