I have been using Nvidia's official PyTorch Docker image for my model training for quite a long time. It works very well; the only problem is that the image is too large: more than 6 GB. On my poor home network, it takes a painfully long time to download.
Yesterday, an interesting idea popped into my mind: why not build my own small Docker image for PyTorch? So I started doing it.
First, I chose ClearLinux from Intel, since it is very clean and ships state-of-the-art software (which is why its performance is fabulous).
I used distrobox to create my environment:
```shell
distrobox create --image clearlinux:latest \
  --name robin_clear \
  --home /home/robin/clearlinux \
  --additional-flags "--shm-size=4g" \
  --additional-flags "--gpus all" \
  --additional-flags "--device=/dev/nvidiactl" \
  --additional-flags "--device=/dev/nvidia0"
```
Enter the environment:

```shell
distrobox enter robin_clear
```
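Since the `distrobox create` command passes `/dev/nvidiactl` and `/dev/nvidia0` into the container, a quick sanity check after entering is to confirm those device nodes are actually visible. The helper below is a hypothetical sketch (it is not part of distrobox); it defaults to the two nodes used above and accepts other paths as arguments.

```shell
# Hypothetical helper (not a distrobox command): succeed only if every given path exists.
# With no arguments it checks the two NVIDIA device nodes passed via --additional-flags.
check_devices() {
  local status=0 dev
  if [ "$#" -eq 0 ]; then
    set -- /dev/nvidiactl /dev/nvidia0
  fi
  for dev in "$@"; do
    if [ ! -e "$dev" ]; then
      echo "missing device node: $dev" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage inside robin_clear:
# check_devices && echo "NVIDIA device nodes are visible"
```

If a node is missing, the training script will not see the GPU, so it is cheaper to find out here than after installing CUDA.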
Download the CUDA 11.3 runfile and install it inside the robin_clear container:
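```shell
# the CUDA runfile installer needs libxml2
sudo swupd bundle-add libxml2
# --toolkit installs only the toolkit (the driver comes from the host);
# --override skips the installer's compiler checks
sudo sh ./cuda_11.3.0_465.19.01_linux.run \
  --toolkit \
  --no-man-page \
  --override \
  --silent
```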
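After a runfile install, `nvcc` and the CUDA libraries are not on the search paths yet. A minimal sketch of lines one might add to `~/.bashrc` inside robin_clear, assuming the installer used its default prefix so that `/usr/local/cuda` points at the 11.3 toolkit. `CUDA_HOME` is also one of the variables PyTorch's `torch.utils.cpp_extension` checks when locating the toolkit, which matters later when building apex.

```shell
# Assumption: default runfile prefix, so /usr/local/cuda is the CUDA 11.3 toolkit.
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Put nvcc on PATH (skipped if already present, so re-sourcing is harmless).
case ":$PATH:" in
  *":$CUDA_HOME/bin:"*) ;;
  *) export PATH="$CUDA_HOME/bin:$PATH" ;;
esac

# Let programs find the CUDA runtime libraries.
case ":${LD_LIBRARY_PATH:-}:" in
  *":$CUDA_HOME/lib64:"*) ;;
  *) export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
```

The `case` guards keep the variables from growing each time a new shell sources the file.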
Then comes the important part: install gcc-10 (ClearLinux ships gcc-12, which is too new for CUDA 11.3) and create symbolic links to it:
```shell
sudo swupd bundle-add c-extras-gcc10
# make nvcc use gcc-10/g++-10 as its host compiler
sudo ln -s /usr/bin/gcc-10 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-10 /usr/local/cuda/bin/g++
```
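Before building anything with nvcc, it may be worth confirming that the compiler the symlinks point at is old enough. As far as I know, CUDA 11.3's nvcc rejects GCC releases newer than 10 with an "unsupported GNU version" error. The helper below is hypothetical and only compares the major version reported by `gcc -dumpversion`; the limit of 10 is an assumption based on that host-compiler check.

```shell
# Hypothetical helper: succeed if a GCC version string has major version <= 10
# (assumed to be the newest GCC major that CUDA 11.3 accepts).
gcc_ok_for_cuda113() {
  local major="${1%%.*}"          # "10.4.0" -> "10", "12" -> "12"
  case "$major" in
    ''|*[!0-9]*) return 2 ;;      # not a plain number
  esac
  [ "$major" -le 10 ]
}

# Usage inside robin_clear:
# gcc_ok_for_cuda113 "$(/usr/local/cuda/bin/gcc -dumpversion)" && echo "gcc is OK for nvcc"
```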
Install PyTorch:
```shell
sudo swupd bundle-add python3-basic
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
```
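Wheels from the cu113 index carry the CUDA version as a local version suffix (for example `1.11.0+cu113`), while CPU-only builds end in `+cpu`, so the suffix is a quick way to confirm pip picked the intended build. The helper name is hypothetical; `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` in the usage lines are standard PyTorch attributes and functions.

```shell
# Hypothetical helper: print the local-version tag of a PyTorch version string,
# e.g. "1.11.0+cu113" -> "cu113"; prints "none" when there is no "+" suffix.
torch_build_tag() {
  case "$1" in
    *+*) printf '%s\n' "${1#*+}" ;;
    *)   printf 'none\n' ;;
  esac
}

# Usage inside robin_clear, after the pip install above:
# torch_build_tag "$(python3 -c 'import torch; print(torch.__version__)')"  # cu113 if it came from the cu113 index
# python3 -c 'import torch; print(torch.version.cuda, torch.cuda.is_available())'
```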
Install apex for mixed-precision training (my training code uses it):
```shell
git clone https://github.com/NVIDIA/apex
cd apex
pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
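A simple way to confirm that the `--cpp_ext` and `--cuda_ext` builds produced compiled modules is to import them. This is a sketch: the helper is hypothetical, and the module names `apex_C` (from `--cpp_ext`) and `amp_C` (from `--cuda_ext`) are my assumption about what apex's setup.py builds. `torch` is imported first in each check because these extensions link against PyTorch's shared libraries.

```shell
# Hypothetical helper: each argument is a comma-separated list of Python modules
# imported in one statement (e.g. "torch, amp_C"); reports the lists that fail.
check_imports() {
  local status=0 mods
  for mods in "$@"; do
    if ! python3 -c "import $mods" >/dev/null 2>&1; then
      echo "import failed: $mods" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage inside robin_clear (apex_C and amp_C are assumed module names):
# check_imports "torch, apex" "torch, apex_C" "torch, amp_C" && echo "apex extensions OK"
```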
Now I can run my training in ClearLinux. Here is how the images compare:
| Image | CUDA version | PyTorch version | Docker image size | VRAM usage | Time per training batch |
|---|---|---|---|---|---|
| Nvidia official PyTorch image | 11.7 | 1.12.0 | 14.7 GB | 10620 MB | 0.2745 s |
| My ClearLinux image | 11.3 | 1.11.0 | 12.8 GB | 10936 MB | 0.3066 s |
| My ClearLinux image (v2) | 11.3 | 1.12.0 | 12.8 GB | 10964 MB | 0.2812 s |
| My ClearLinux image (PyTorch built myself) | 11.7 | 1.13.0 | 12.8 GB | 10658 MB | 0.2716 s |
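To put the batch times from the table in relative terms, this short awk calculation compares each of my images with the Nvidia image (0.2745 s per batch). It prints +11.7% for the first ClearLinux image, +2.4% for v2, and -1.1% for the image with self-built PyTorch, so only the last one is faster.

```shell
# Relative per-batch time versus a baseline, as a signed percentage.
# Positive means slower than the baseline, negative means faster.
rel_diff() {
  awk -v b="$1" -v t="$2" 'BEGIN { printf "%+.1f\n", (t - b) / b * 100 }'
}

baseline=0.2745   # Nvidia official PyTorch image, seconds per batch
for t in 0.3066 0.2812 0.2716; do
  printf '%s s: %s%%\n' "$t" "$(rel_diff "$baseline" "$t")"
done
```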
It looks like Nvidia does a better job than I do 🙂. The only way to beat it is to use the newest CUDA and build the latest PyTorch from source myself.