Yesterday I wrote a TensorFlow program to train a ResNet-50 model on the CIFAR-100 dataset. But when training began, the classification loss was abnormally large and didn’t decrease at all:

loss[2.6032338e+25]
loss[2.5617402e+25]
loss[3.3851871e+25]
loss[3.092054e+25] 
...

At first I suspected the dataset-processing code was wrong, but after printing the data to the console, the input pipeline looked fine. Then I printed the values of all tensors right after the model was initialized, and those values looked correct too.
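A quick sanity check on an input batch, along the lines described above, might look like this (the `sanity_check` helper is my own, not part of the original program):

```python
import numpy as np

def sanity_check(batch, name="images"):
    """Print quick statistics to confirm an input batch looks sane:
    no NaNs, and values in the expected range."""
    arr = np.asarray(batch)
    print(f"{name}: shape={arr.shape} min={arr.min():.3f} "
          f"max={arr.max():.3f} mean={arr.mean():.3f} "
          f"has_nan={bool(np.isnan(arr).any())}")
    return not np.isnan(arr).any()

# Stand-in for a CIFAR-100 batch: 32 images of 32x32x3, values in [0, 1]
fake_batch = np.random.default_rng(0).uniform(0.0, 1.0, (32, 32, 32, 3))
ok = sanity_check(fake_batch)
```

If the statistics here look wrong (e.g. pixel values in the hundreds, or NaNs), the problem is in the pipeline rather than the model.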
With no other leads, I started checking the initializer in the TensorFlow code:

    with slim.arg_scope([slim.conv2d],
                        weights_initializer = tf.truncated_normal_initializer(mean = 0, stddev = 0.1)):
      img_inf, _ = resnet_v2.resnet_v2_50(image, NUM_CLASSES)
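For reference, `tf.truncated_normal_initializer` draws from a normal distribution and re-draws any sample that falls more than two standard deviations from the mean. A minimal NumPy sketch of that behavior (the `truncated_normal` function here is my own illustration, not TensorFlow’s implementation):

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=0.1, rng=None):
    """Sample a normal distribution and re-draw any value more than
    2 * stddev from the mean, mimicking tf.truncated_normal_initializer."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = rng.normal(mean, stddev, size=shape)
    bad = np.abs(x - mean) > 2 * stddev
    while bad.any():
        x[bad] = rng.normal(mean, stddev, size=bad.sum())
        bad = np.abs(x - mean) > 2 * stddev
    return x

# A 3x3 conv kernel with 64 input and 64 output channels
w = truncated_normal((3, 3, 64, 64), mean=0.0, stddev=0.1)
print(w.min(), w.max())  # every value lies within mean ± 2*stddev
```

So `stddev` directly controls the spread of the initial weights, which is what matters in the experiments below.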

If the loss was too big, maybe I could tune the initial values of the weights? So I changed the initializer for ‘slim.conv2d’ to ‘mean = 0.001’ and ‘stddev = 1’:

                        weights_initializer = tf.truncated_normal_initializer(mean = 0.001, stddev = 1)):

But the loss blew up even further:

loss[1.1468245e+29]
loss[1.6610325e+29]
loss[1.1840615e+29]
...

I had to change ‘mean’ and ‘stddev’ yet again:

                        weights_initializer = tf.truncated_normal_initializer(mean = 0.001, stddev = 1)):

This time, the loss finally looked reasonable:

loss[1215.8724]
loss[1023.67676]
loss[583.6274]
...

This is the first time I have seen initialization values make such a difference to training.
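In hindsight the effect is easy to reproduce: with ReLU layers, each matrix multiply scales the activations by roughly stddev · sqrt(width / 2), so a stddev of 1 compounds over a deep network into astronomically large outputs, which matches the e+25-scale losses above. A toy NumPy sketch (plain fully connected layers standing in for convolutions; the setup and numbers here are my own, not from ResNet-50):

```python
import numpy as np

def forward(x, stddev, layers=20, width=256):
    """Push an input through a stack of random ReLU layers and
    return the mean absolute activation at the output."""
    rng = np.random.default_rng(1)  # fixed seed so runs are repeatable
    h = x
    for _ in range(layers):
        W = rng.normal(0.0, stddev, size=(width, width))
        h = np.maximum(h @ W, 0.0)  # ReLU
    return np.abs(h).mean()

x = np.random.default_rng(0).normal(size=(1, 256))

big = forward(x, stddev=1.0)              # activations explode
small = forward(x, stddev=0.01)           # activations vanish
he = forward(x, stddev=np.sqrt(2 / 256))  # He-style scaling stays bounded
print(big, small, he)
```

This is also why initializers such as `tf.variance_scaling_initializer` tie `stddev` to the layer width instead of using a fixed constant.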