This August, we reached an evaluation accuracy of 0.83 on the DIB-10K dataset. But since we updated the dataset last month, the accuracy has only reached 0.82.

My first suspect is the Weight Standardization method we use for micro-batch training (since the model is too big). So I decided to try gradient accumulation instead, using this snippet as a starting point because it does not require heavy changes to my code:

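model.zero_grad()                                   # Reset gradient tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute the loss
    loss = loss / accumulation_steps                # Normalize the loss (if it is averaged)
    loss.backward()                                 # Backward pass (gradients accumulate)
    if (i+1) % accumulation_steps == 0:             # Wait for several backward passes
        optimizer.step()                            # Now do one optimizer step
        model.zero_grad()                           # Reset gradient tensors
        if (i+1) % evaluation_steps == 0:           # Evaluate only right after an optimizer step
            evaluate_model()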
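To convince myself that dividing the loss by accumulation_steps is the right scaling, here is a minimal sketch in plain Python (no PyTorch). It assumes a hypothetical one-parameter linear model with a mean-squared-error loss and equal-sized micro-batches. The function names and the toy data are made up for illustration and are not part of my training code.

```python
# Toy check: gradient accumulation over micro-batches vs. one large batch.
# Model: y_hat = w * x, loss = mean squared error. All names are illustrative.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Sum the gradients of (loss / accumulation_steps) over equal micro-batches."""
    assert len(xs) % micro_batch_size == 0, "micro-batches must be equal-sized"
    accumulation_steps = len(xs) // micro_batch_size
    grad = 0.0  # plays the role of the .grad buffer right after zero_grad()
    for i in range(accumulation_steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        # Dividing by accumulation_steps mirrors `loss = loss / accumulation_steps`
        grad += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accumulation_steps
    return grad


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
w = 0.5

full = mse_grad(w, xs, ys)                               # one batch of 8 samples
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # 4 micro-batches of 2

# The two gradients should agree up to floating-point rounding.
print(abs(full - accum) < 1e-9)
```

This only shows that the accumulated gradient matches the large-batch gradient. Layers that depend on per-batch statistics (batch normalization, for example) still see only the micro-batch in each forward pass, so gradient accumulation is not equivalent to a larger batch in every respect.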

But after changing my code and retraining the model, the accuracy still stays around 0.82:

Epoch     4: reducing learning rate of group 0 to 1.0000e-01.
[2020-12-16 05:53:29] Eval accuracy: 0.8283 | Train accuracy: 0.8187
[2020-12-16 10:01:40] Eval accuracy: 0.8284 | Train accuracy: 0.8938
[2020-12-16 14:11:35] Eval accuracy: 0.8284 | Train accuracy: 0.8313
Epoch     7: reducing learning rate of group 0 to 5.0000e-02.
[2020-12-16 18:21:47] Eval accuracy: 0.8285 | Train accuracy: 0.8750
[2020-12-16 22:31:19] Eval accuracy: 0.8285 | Train accuracy: 0.8313
[2020-12-17 02:41:37] Eval accuracy: 0.8284 | Train accuracy: 0.8625
Epoch    10: reducing learning rate of group 0 to 2.5000e-02.
[2020-12-17 06:52:05] Eval accuracy: 0.8286 | Train accuracy: 0.8500
[2020-12-17 11:02:11] Eval accuracy: 0.8285 | Train accuracy: 0.8063
[2020-12-17 15:12:23] Eval accuracy: 0.8286 | Train accuracy: 0.8375
Epoch    13: reducing learning rate of group 0 to 1.2500e-02.
[2020-12-17 19:22:04] Eval accuracy: 0.8285 | Train accuracy: 0.8313

This makes me really desperate. Maybe I should put this task aside for now and move on to other work.