
My colleague’s bare-metal PC with a three-fan RTX 2080 Ti
My former colleague Jian Mei bought a new GPU, an RTX 2080 Ti, for training bird-image models for http://dongniao.net/. After he installed the new power supply and the GPU in his computer, I started running my MobileNetV2 model on it. Somewhat disappointingly, the performance did not improve as much as expected: training speed only increased from 60 samples/sec to about 150 samples/sec.
The most likely reason for this underwhelming speedup is that the Tensor Cores were not being used.

To use the full power of the Tensor Cores in my colleague’s RTX 2080 Ti, simply following the “Mixed Precision Training” guide is not enough, so I directly used the complete code example from Nvidia’s GitHub.
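For reference, here is a minimal sketch of what enabling mixed precision looks like in a training loop. This is my own illustration, not the code from Nvidia’s repository: it uses PyTorch’s native `torch.cuda.amp` API and torchvision’s MobileNetV2 as stand-ins, and the dummy data loader is a placeholder for a real input pipeline.

```python
import torch
import torchvision
from torch.cuda.amp import GradScaler, autocast

# Hypothetical setup: torchvision's MobileNetV2 stands in for the real model.
model = torchvision.models.mobilenet_v2(num_classes=1000).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()
scaler = GradScaler()  # scales the loss so float16 gradients do not underflow

# Dummy batches so the sketch runs standalone; replace with a real DataLoader.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 1000, (8,))
loader = [(images, labels)] * 10

for images, labels in loader:
    images, labels = images.cuda(), labels.cuda()
    optimizer.zero_grad()
    with autocast():                  # run the forward pass in float16 where safe
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # unscale gradients, then update weights
    scaler.update()
```

The key points are the same regardless of framework: keep a float32 master copy of the weights, compute in float16 where it is numerically safe, and scale the loss so small gradients survive the reduced precision.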
With the ResNeXt50 model from that example, the Tensor Cores did boost training performance:
| | Float32 | Float16 (Tensor Cores) |
| --- | --- | --- |
| Performance (samples/sec) | 40 | 79 |
Nvidia’s documentation reports up to a 20x performance enhancement from Tensor Cores, but on our RTX 2080 Ti we only gained about 2x. In fact, mainstream neural network models such as ResNet, DenseNet, and NASNet cannot saturate a high-end Nvidia GPU, because its floating-point compute power far exceeds what these models demand. To get the best out of the Tensor Cores, I need to keep experimenting with more complicated and denser models.
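One way to see how far a real model is from the headline Tensor Core numbers is to time large half-precision matrix multiplications directly. The sketch below is my own illustration, not part of the original experiment: it measures achieved TFLOPS of a plain matmul in float32 versus float16 on the same GPU (with dimensions that are multiples of 8, so cuBLAS can pick Tensor Core kernels). The gap only approaches the marketing figure when the workload is pure, large GEMMs, which convolutional models like MobileNetV2 rarely are.

```python
import time
import torch

def matmul_tflops(dtype, n=4096, iters=50):
    """Measure achieved TFLOPS of an n x n matmul in the given dtype."""
    a = torch.randn(n, n, device='cuda', dtype=dtype)
    b = torch.randn(n, n, device='cuda', dtype=dtype)
    for _ in range(5):            # warm-up so cuBLAS selects its kernels first
        a @ b
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    flops = 2 * n ** 3 * iters    # multiply-adds across all iterations
    return flops / (time.time() - start) / 1e12

print('float32               :', round(matmul_tflops(torch.float32), 1), 'TFLOPS')
print('float16 (Tensor Cores):', round(matmul_tflops(torch.float16), 1), 'TFLOPS')
```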