I have been using NVIDIA's official PyTorch Docker image for model training for quite a long time. It works very well; the only problem is that the image is too large: more than 6 GB. On my slow home network, it takes a painfully long time to download.
Yesterday an interesting idea came to mind: why not build my own, smaller Docker image for PyTorch? So I started doing it.
First, I chose Intel's Clear Linux because it is very clean, ships state-of-the-art software versions and is heavily tuned for performance.
I used distrobox to create my environment:
```bash
distrobox create --image clearlinux:latest \
    --name robin_clear \
    --home /home/robin/clearlinux \
    --additional-flags "--shm-size=4g" \
    --additional-flags "--gpus all" \
    --additional-flags "--device=/dev/nvidiactl" \
    --additional-flags "--device=/dev/nvidia0"
```
Enter the environment:
```bash
distrobox enter robin_clear
```
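Inside the environment, a quick sanity check is whether the device nodes passed through with `--additional-flags` are actually visible. This is a minimal sketch: `check_char_devices` is a hypothetical helper, and it only checks that the nodes exist as character devices, not that the driver works.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether each path exists as a character
# device. Returns 1 if any of them is missing.
check_char_devices() {
    local dev status=0
    for dev in "$@"; do
        if [ -c "$dev" ]; then
            echo "found:   $dev"
        else
            echo "missing: $dev"
            status=1
        fi
    done
    return "$status"
}

# The device nodes passed to distrobox in the create command above.
check_char_devices /dev/nvidiactl /dev/nvidia0 \
    || echo "GPU device nodes not visible; check the distrobox flags"
```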
Download the CUDA 11.3 runfile from NVIDIA and install the toolkit in the environment:
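```bash
# libxml2 is required by the CUDA runfile installer
sudo swupd bundle-add libxml2
# Toolkit only (the driver comes from the host); --override skips the
# installer's compiler and library checks
sudo sh ./cuda_11.3.0_465.19.01_linux.run \
    --toolkit \
    --no-man-page \
    --override \
    --silent
```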
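To confirm which toolkit actually got installed, the `release X.Y` part of `nvcc --version` can be extracted. A sketch, assuming the output keeps the usual "Cuda compilation tools, release 11.3, ..." line; `nvcc_release` is a hypothetical helper and the nvcc path assumes the default `/usr/local/cuda` location used later in this post.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# release number, e.g. "11.3".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

NVCC=/usr/local/cuda/bin/nvcc   # assumed default install location
if [ -x "$NVCC" ]; then
    echo "CUDA toolkit release: $("$NVCC" --version | nvcc_release)"
else
    echo "nvcc not found at $NVCC"
fi
```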
Then comes the important part: install gcc-10 (Clear Linux ships gcc-12, which is too new for CUDA 11.3) and create symbolic links to it in the CUDA `bin` directory so that nvcc picks up gcc-10 instead of the default compiler:
```bash
sudo swupd bundle-add c-extras-gcc10
# Point nvcc at gcc-10/g++-10 instead of the system default gcc-12
sudo ln -s /usr/bin/gcc-10 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-10 /usr/local/cuda/bin/g++
```
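The compiler check can be scripted so it is easy to see whether the symlinked gcc is old enough. This is a sketch under an assumption: `MAX_GCC_MAJOR=10` mirrors what worked here (gcc-10 accepted, gcc-12 rejected), not an official limit, so check NVIDIA's installation guide for your CUDA release. `gcc_ok_for_cuda` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Assumption: GCC 10 is the newest major version CUDA 11.3 accepts,
# based on what worked in this setup.
MAX_GCC_MAJOR=10

# Hypothetical helper: succeed if a GCC version string such as "10.4.0"
# or "12" has a major version <= MAX_GCC_MAJOR.
gcc_ok_for_cuda() {
    local major="${1%%.*}"
    [[ "$major" =~ ^[0-9]+$ ]] || return 2
    [ "$major" -le "$MAX_GCC_MAJOR" ]
}

# Compare the default gcc on PATH with the one symlinked for nvcc.
for cc in gcc /usr/local/cuda/bin/gcc; do
    if command -v "$cc" >/dev/null 2>&1; then
        version="$("$cc" -dumpversion)"
        if gcc_ok_for_cuda "$version"; then
            echo "$cc ($version): OK for CUDA 11.3"
        else
            echo "$cc ($version): too new for CUDA 11.3"
        fi
    fi
done
```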
Install PyTorch:
```bash
sudo swupd bundle-add python3-basic
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
```
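The `cu113` suffix in the extra index URL corresponds to CUDA 11.3. The sketch below builds that URL from a CUDA version and then, if PyTorch is importable, prints what it was built against. `torch_index_url` is a hypothetical helper that assumes the "cu + version without the dot" pattern seen in the pip command; not every CUDA version has a wheel index.

```bash
#!/usr/bin/env bash
# Hypothetical helper: "11.3" -> https://download.pytorch.org/whl/cu113
# (assumes the naming pattern used in the pip command above).
torch_index_url() {
    echo "https://download.pytorch.org/whl/cu${1//./}"
}

# Usage: pip3 install torch torchvision torchaudio --extra-index-url "$(torch_index_url 11.3)"

# After installing, check the PyTorch version, its CUDA version and GPU visibility.
if python3 -c "import torch" >/dev/null 2>&1; then
    python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
fi
```

If the last value printed is `False`, PyTorch is installed but cannot see the GPU, which points back at the container flags rather than the pip install.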
Install NVIDIA Apex for mixed-precision training (my training code uses it):
```bash
git clone https://github.com/NVIDIA/apex
cd apex
pip3 install -v --disable-pip-version-check --no-cache-dir \
    --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
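A quick import check can confirm that Apex's compiled extensions were built. This is a sketch: `check_py_modules` is a hypothetical helper, and the module names `apex_C` (from `--cpp_ext`) and `amp_C` (from `--cuda_ext`) are my assumption based on Apex's `setup.py`; they may differ between Apex versions.

```bash
#!/usr/bin/env bash
# Python snippet run per module. torch is imported first when available,
# since compiled PyTorch extensions generally depend on its shared libraries.
PY_IMPORT='
import importlib, importlib.util, sys
if importlib.util.find_spec("torch") is not None:
    import torch
importlib.import_module(sys.argv[1])
'

# Hypothetical helper: try to import each module and report the result.
# Returns 1 if any import fails.
check_py_modules() {
    local mod status=0
    for mod in "$@"; do
        if python3 -c "$PY_IMPORT" "$mod" >/dev/null 2>&1; then
            echo "ok:      $mod"
        else
            echo "missing: $mod"
            status=1
        fi
    done
    return "$status"
}

# Assumed module names; adjust if your Apex version differs.
check_py_modules apex apex_C amp_C \
    || echo "Apex extensions incomplete; check the verbose pip log"
```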
Now I can run my training in the Clear Linux environment. Here is how the images compare:
| Image | CUDA version | PyTorch version | Docker image size | VRAM usage | Time per training batch |
|---|---|---|---|---|---|
| NVIDIA official PyTorch image | 11.7 | 1.12.0 | 14.7 GB | 10620 MB | 0.2745 s |
| My Clear Linux image | 11.3 | 1.11.0 | 12.8 GB | 10936 MB | 0.3066 s |
| My Clear Linux image (v2) | 11.3 | 1.12.0 | 12.8 GB | 10964 MB | 0.2812 s |
| My Clear Linux image (PyTorch built myself) | 11.7 | 1.13.0 | 12.8 GB | 10658 MB | 0.2716 s |
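To put the comparison in relative terms, this sketch computes each Clear Linux image's VRAM and batch-time difference against NVIDIA's image. The numbers are copied from the table; the row labels are shortened and `relative_to_baseline` is a hypothetical helper (the first input row is treated as the baseline).

```bash
#!/usr/bin/env bash
# Hypothetical helper: input rows are "label|VRAM in MB|seconds per batch";
# the first row is the baseline, every other row is compared against it.
relative_to_baseline() {
    awk -F'|' '
        NR == 1 { base_vram = $2; base_time = $3; next }
        {
            printf "%s: VRAM %+d MB, time per batch %+.1f%%\n",
                   $1, $2 - base_vram, ($3 - base_time) / base_time * 100
        }'
}

relative_to_baseline <<'EOF'
NVIDIA official|10620|0.2745
Clear Linux|10936|0.3066
Clear Linux (v2)|10964|0.2812
Clear Linux (self-built PyTorch)|10658|0.2716
EOF
```

With these numbers the batch time comes out about 11.7% and 2.4% slower for the first two Clear Linux images and about 1.1% faster for the self-built PyTorch 1.13.0, which uses 38 MB more VRAM.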
Looks like NVIDIA does a better job than I do 🙂 The only configuration that beats it is the last one: the newest CUDA with a PyTorch I built myself.