After reading this paper, I started an experiment based on it. Using this snippet as a reference, I wrote my code:
net1 = model_builder.build_model()
net2 = model_builder.build_model()
output = model_builder.build_model()
net1.load_state_dict(torch.load(args.model1, map_location="cpu"))
net2.load_state_dict(torch.load(args.model2, map_location="cpu"))

# Average
sd1 = net1.named_parameters()
sd2 = net2.named_parameters()
sdo = dict(sd2)
for name, param in sd1:
    sdo[name].data.copy_(0.5*param.data + 0.5*sdo[name].data)

output.load_state_dict(sdo)
torch.save(output, args.output)

# here is a test
output.load_state_dict(torch.load(args.output))
But after generating the new weight-averaged model, PyTorch failed to load it:
Traceback (most recent call last):
  File "average_models.py", line 43, in <module>
    output.load_state_dict(torch.load(args.output))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1534, in load_state_dict
    state_dict = state_dict.copy()
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1186, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'RegNet' object has no attribute 'copy'
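The reason for the failure is quite simple: torch.save(output, args.output) stores the whole model object, so torch.load returns a RegNet module rather than a state dict, and load_state_dict fails when it calls .copy() on it. We only need to save the state_dict of the model instead of all its information (since I am using FP16 format). Therefore the correct code should be: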
net1 = model_builder.build_model()
net2 = model_builder.build_model()
net1.load_state_dict(torch.load(args.model1, map_location="cpu"))
net2.load_state_dict(torch.load(args.model2, map_location="cpu"))

# Average: overwrite net2's parameters in place with the mean of both models
sd1 = net1.named_parameters()
sd2 = net2.named_parameters()
sdo = dict(sd2)
for name, param in sd1:
    sdo[name].data.copy_(0.5*param.data + 0.5*sdo[name].data)

# Save only the name -> parameter dict, not the whole model object
torch.save(sdo, args.output)
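One caveat with the code above: named_parameters() does not include buffers such as BatchNorm running statistics, while state_dict() does. A dict built from named_parameters() can therefore be missing keys when it is loaded back into a model that has such layers. Below is a minimal sketch of the same averaging done on the state dicts instead. The names average_state_dicts, make_toy_model and alpha are hypothetical, the toy model only stands in for model_builder.build_model(), and an in-memory buffer stands in for args.output.

```python
import io

import torch
from torch import nn


def average_state_dicts(sd_a, sd_b, alpha=0.5):
    """Return a new dict holding alpha * sd_a + (1 - alpha) * sd_b."""
    if sd_a.keys() != sd_b.keys():
        raise ValueError("state dicts have different keys")
    averaged = {}
    for name, tensor_a in sd_a.items():
        tensor_b = sd_b[name]
        if torch.is_floating_point(tensor_a):
            averaged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
        else:
            # Integer buffers (e.g. BatchNorm's num_batches_tracked) are not
            # interpolated; this sketch simply keeps the first model's value.
            averaged[name] = tensor_a.clone()
    return averaged


def make_toy_model():
    # Hypothetical stand-in for model_builder.build_model()
    return nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3))


net1, net2, merged = make_toy_model(), make_toy_model(), make_toy_model()
avg = average_state_dicts(net1.state_dict(), net2.state_dict())

# Save only the averaged state dict, then load it back into a fresh model.
buffer = io.BytesIO()
torch.save(avg, buffer)
buffer.seek(0)
merged.load_state_dict(torch.load(buffer, map_location="cpu"))
```

Because the averaged dict has the same keys as model.state_dict(), it loads with the default strict check. Whether averaging the running statistics is the right choice for a given model is a separate question that this sketch does not settle.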
BTW, in my experiment, averaging my models did not raise accuracy as the paper suggests.