Yesterday I wrote a TensorFlow program to train a ResNet-50 model on the CIFAR-100 dataset. But when training began, the classification loss was abnormally large and didn't decrease at all:
loss[2.6032338e+25] loss[2.5617402e+25] loss[3.3851871e+25] loss[3.092054e+25] ...
At first I thought the dataset-processing code might be wrong. But after printing the data to the console, the input data looked all right. Then I printed the values of all tensors right after model initialization, and those values looked correct too.
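(For reference, this is roughly how I spot-checked the freshly initialized weights. It's a minimal sketch assuming the model graph has already been built in the default TF1 graph:)

import tensorflow as tf

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for var in tf.trainable_variables()[:5]:   # spot-check a few weight tensors
        print(var.name, sess.run(var).flatten()[:10])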
With no other leads, I started checking the initializer in the TensorFlow code:
with slim.arg_scope([slim.conv2d],
                    weights_initializer = tf.truncated_normal_initializer(mean = 0, stddev = 0.1)):
    img_inf, _ = resnet_v2.resnet_v2_50(image, NUM_CLASSES)
If the loss is too big, maybe I could adjust the initial values of the tensors in the model? So I changed the 'mean' and 'stddev' arguments for 'slim.conv2d':
weights_initializer = tf.truncated_normal_initializer(mean = 0.001, stddev = 1)):
But the loss became even crazier:
loss[1.1468245e+29] loss[1.6610325e+29] loss[1.1840615e+29] ...
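In hindsight this makes sense: with roughly 50 layers, activation magnitudes compound multiplicatively, so a large stddev blows them up, and cross-entropy on the resulting huge logits is astronomical. Here is a toy sketch of the effect (plain NumPy with hypothetical 64-unit layers, not my actual model):

import numpy as np

x = np.random.randn(64)
for stddev in (1.0, 0.1, 0.01):
    h = x.copy()
    for _ in range(50):                          # roughly ResNet-50 depth
        W = np.random.normal(0, stddev, (64, 64))
        h = np.maximum(W @ h, 0)                 # a layer plus ReLU, schematically
    print('stddev=%g -> final norm %.3g' % (stddev, np.linalg.norm(h)))

With stddev = 1 the norm explodes by dozens of orders of magnitude, while small values make it shrink toward zero instead; neither direction is stable unless the scale is chosen with the layer width in mind.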
I had to change 'mean' and 'stddev' again, this time to much smaller values:
weights_initializer = tf.truncated_normal_initializer(mean = 0, stddev = 0.01)):
This time, the loss looked reasonable:
loss[1215.8724] loss[1023.67676] loss[583.6274] ...
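As a sanity check: random guessing over 100 classes gives a per-example cross-entropy of about ln(100) ≈ 4.6, so a starting loss in the hundreds or low thousands is plausible if the value is summed over the batch or includes regularization terms, whereas 1e25 can only come from numerically exploded logits.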
This was the first time I saw that initialization values could make training behave so differently.
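For what it's worth, I later learned that the usual fix for ReLU networks like ResNet is a variance-scaling ("He") initializer, which derives each layer's stddev from its fan-in so activation magnitudes stay roughly constant with depth. A sketch of how that would look in my snippet above, assuming the slim re-export of tf.contrib.layers.variance_scaling_initializer:

with slim.arg_scope([slim.conv2d],
                    weights_initializer = slim.variance_scaling_initializer()):
    # defaults to He et al. initialization: stddev = sqrt(2 / fan_in) per layer
    img_inf, _ = resnet_v2.resnet_v2_50(image, NUM_CLASSES)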