In the previous article, I found out the reason. But how to resolve it in Multi-GPU training was still an open question. Following the suggestion in this GitHub issue, I tried two ways to fix the problem:
First, I rewrote my Averaging-Gradients training to follow what tf.slim.create_train_op() does internally:
...
# Internal TF modules used below (the same ones slim.learning uses):
from tensorflow.python.framework import ops
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.ops import variables as tf_variables

def create_train_grads(total_loss, optimizer):
    # Make sure the UPDATE_OPS (e.g. batch-norm moving-average updates) run
    # before the gradients are computed, just like slim's create_train_op.
    update_ops = set(ops.get_collection(ops.GraphKeys.UPDATE_OPS))
    with ops.control_dependencies(update_ops):
        barrier = control_flow_ops.no_op(name='update_barrier')
    total_loss = control_flow_ops.with_dependencies([barrier], total_loss)
    variables_to_train = tf_variables.trainable_variables()
    grads = optimizer.compute_gradients(total_loss, variables_to_train)
    return grads
...
# Inside each GPU tower: compute the per-tower loss and collect its gradients
cross_entropy = tf.reduce_mean(cross_entropy)
tf.get_variable_scope().reuse_variables()
grads = create_train_grads(cross_entropy, opt)
tower_grads.append(grads)
...
grads = average_gradients(tower_grads)
grad_updates = opt.apply_gradients(grads)
with ops.name_scope('train_op'):
    # Ensure the train_tensor computes grad_updates.
    train_op = control_flow_ops.with_dependencies([grad_updates], cross_entropy)
# Add the operation used for training to the 'train_op' collection
train_ops = ops.get_collection_ref(ops.GraphKeys.TRAIN_OP)
if train_op not in train_ops:
    train_ops.append(train_op)
But unfortunately, this didn’t work at all. The inference result was still a mess.
Then I tried the other way: Asynchronous-Gradients training combined with tf.slim.create_train_op():
...
# Inside each GPU tower: build a separate train_op for that tower
cross_entropy = tf.reduce_mean(cross_entropy)
train_op = tf.contrib.slim.learning.create_train_op(cross_entropy, opt)
tower_ops.append(train_op)
...
train_step = tf.group(*tower_ops)
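The `...` above hides my per-GPU tower loop, which I have left out. For readers new to this pattern, here is a generic sketch of how tower_ops typically gets filled; num_gpus, build_model, images and labels are placeholders, not names from my code:

tower_ops = []
for i in range(num_gpus):  # num_gpus: placeholder for your GPU count
    with tf.device('/gpu:%d' % i):
        with tf.variable_scope(tf.get_variable_scope(), reuse=(i > 0)):
            # build_model is a hypothetical function returning the per-example
            # loss; images[i]/labels[i] stand in for your per-tower input batch.
            cross_entropy = build_model(images[i], labels[i])
            cross_entropy = tf.reduce_mean(cross_entropy)
            # One independent train_op per tower: each tower applies its own
            # gradients without waiting for the others (asynchronous updates).
            train_op = tf.contrib.slim.learning.create_train_op(cross_entropy, opt)
            tower_ops.append(train_op)
train_step = tf.group(*tower_ops)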
Now the inference works very well! The training speed also became a bit faster than Averaging-Gradients training, because the averaging operation has to wait for the gradients from all the GPUs.