I have been using Nvidia’s official PyTorch Docker image for my model training for quite a long time. It works very well; the only problem is that the image is too large: more than 6 GB. On my poor home network, it takes a painfully long time to download.

Yesterday, an interesting idea popped into my mind: why not build my own small Docker image for PyTorch? So I started doing it.

First, I chose ClearLinux from Intel, since it is very clean and ships up-to-date, performance-tuned software.

I used distrobox to create my environment:

distrobox create --image clearlinux:latest \
    --name robin_clear \
    --home /home/robin/clearlinux \
    --additional-flags "--shm-size=4g" \
    --additional-flags "--gpus all" \
    --additional-flags "--device=/dev/nvidiactl" \
    --additional-flags "--device=/dev/nvidia0"
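
Before creating the box, it can help to confirm on the host that the device nodes passed through `--device` actually exist (with Docker, `--gpus all` additionally needs the NVIDIA Container Toolkit installed on the host). This is a minimal sketch; `missing_devices` is a hypothetical helper, not part of distrobox, and it only checks the two paths used above.

```shell
#!/bin/sh
# Sketch: check on the host that the device nodes handed to the container
# via --device exist. missing_devices is a hypothetical helper name.

# Print every path that does not exist and return how many were missing
# (0 means all present, so the function can be used directly in `if`).
missing_devices() {
  local missing=0 dev
  for dev in "$@"; do
    if [ ! -e "$dev" ]; then
      echo "missing: $dev" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# The two nodes passed to distrobox above; a multi-GPU host also has
# /dev/nvidia1 and so on.
if missing_devices /dev/nvidiactl /dev/nvidia0; then
  echo "NVIDIA device nodes present"
else
  echo "NVIDIA device nodes missing: is the host driver loaded?" >&2
fi
```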

Enter the environment:

distrobox enter robin_clear

Download the CUDA 11.3 runfile and install it inside robin_clear. Add the libxml2 bundle first, then run the installer:

sudo swupd bundle-add libxml2

sudo ./cuda_11.3.0_465.19.01_linux.run \
        --toolkit \
        --no-man-page \
        --override
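
After installation, the toolkit still has to be put on `PATH` and `LD_LIBRARY_PATH` (the installer's summary asks for this). A minimal sketch, assuming the runfile's default prefix `/usr/local/cuda-11.3`; set `CUDA_HOME` beforehand if you installed elsewhere. PyTorch's extension builder, which Apex uses later, also reads `CUDA_HOME`.

```shell
#!/bin/sh
# Sketch: add the CUDA 11.3 toolkit to PATH and LD_LIBRARY_PATH.
# Assumption: the runfile's default prefix; set CUDA_HOME to override it.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-11.3}"

# Prepend each directory only if it is not already listed, so sourcing
# this file twice (e.g. from ~/.bashrc) does not duplicate entries.
case ":$PATH:" in
  *":$CUDA_HOME/bin:"*) ;;
  *) PATH="$CUDA_HOME/bin:$PATH" ;;
esac
case ":${LD_LIBRARY_PATH:-}:" in
  *":$CUDA_HOME/lib64:"*) ;;
  *) LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
export CUDA_HOME PATH LD_LIBRARY_PATH

# nvcc should now resolve to the freshly installed toolkit.
command -v nvcc || echo "nvcc not found in $CUDA_HOME/bin" >&2
```

Source the file (for example `. ./cuda-env.sh`, a hypothetical file name) rather than executing it, because a child process cannot change the environment of the shell that started it.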

Then comes the important part: install gcc-10 (ClearLinux ships gcc-12, which is too new for CUDA 11.3) and create symbolic links to it in the CUDA bin directory, so that nvcc uses gcc-10 as its host compiler:

sudo swupd bundle-add c-extras-gcc10
sudo ln -s /usr/bin/gcc-10 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-10 /usr/local/cuda/bin/g++
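
To double-check that the symlinked compiler is one CUDA 11.3 accepts, a small check like the following can run before any build. The limit `MAX_GCC_MAJOR=10` only reflects what this post found (gcc-10 works, gcc-12 does not); treat it as an assumption and consult the CUDA 11.3 installation guide for the exact supported range. `gcc_supported` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify the host compiler nvcc will use is old enough for CUDA 11.3.
# Assumption: gcc 10 is the newest major version accepted (from this post).
MAX_GCC_MAJOR=10

# Succeed if version string $1 (e.g. "10" or "10.4.0") has a major
# version <= $2 (default: MAX_GCC_MAJOR).
gcc_supported() {
  local major="${1%%.*}" max="${2:-$MAX_GCC_MAJOR}"
  [ -n "$major" ] && [ "$major" -le "$max" ]
}

host_gcc=/usr/local/cuda/bin/gcc
if [ -x "$host_gcc" ]; then
  # -dumpversion prints "10" or "10.x.y" depending on how gcc was configured;
  # gcc_supported handles both forms.
  ver="$("$host_gcc" -dumpversion)"
  if gcc_supported "$ver"; then
    echo "host compiler gcc $ver is OK for CUDA 11.3"
  else
    echo "host compiler gcc $ver is too new for CUDA 11.3" >&2
  fi
else
  echo "$host_gcc not found; create the symlinks first" >&2
fi
```

An alternative to the symlinks is nvcc's `-ccbin` option (for example `nvcc -ccbin /usr/bin/g++-10 ...`), but that flag would then have to reach every build that calls nvcc, including the Apex build below; the symlinks avoid that.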

Install PyTorch, using the cu113 wheels that match the CUDA 11.3 toolkit:

sudo swupd bundle-add python3-basic

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
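
To confirm that pip picked the CUDA 11.3 build rather than another variant, look at the local version tag: wheels from the cu113 index report a version ending in `+cu113`. A sketch, where `torch_build_tag` is a hypothetical helper and `torch.__version__` and `torch.cuda.is_available()` are the standard PyTorch attributes:

```shell
#!/bin/sh
# Sketch: report which PyTorch build is installed and whether it sees the GPU.

# Print the local version tag after "+" (e.g. "cu113"), or "none" if absent.
torch_build_tag() {
  case "$1" in
    *+*) echo "${1##*+}" ;;
    *)   echo "none" ;;
  esac
}

if ver="$(python3 -c 'import torch; print(torch.__version__)' 2>/dev/null)"; then
  echo "torch $ver, build tag: $(torch_build_tag "$ver")"
  python3 -c 'import torch; print("CUDA available:", torch.cuda.is_available())'
else
  echo "torch is not importable from python3" >&2
fi
```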

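Install Apex for mixed-precision training (my model training uses it). Its CUDA extensions are compiled with nvcc during installation, so this step relies on the CUDA toolkit and gcc-10 set up above: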

git clone https://github.com/NVIDIA/apex
cd apex
pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
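
A quick way to confirm the extensions were actually built is to import them. The extension module names here (`apex_C` from `--cpp_ext`, `amp_C` from `--cuda_ext`) are assumptions based on Apex's setup.py and may differ between Apex versions; `check_imports` is a hypothetical helper. torch is imported first because the compiled extensions link against it.

```shell
#!/bin/sh
# Sketch: try importing Python modules in a fresh python3 process each.
# $1 is Python code run before the import (empty means nothing extra);
# prints a line per module and returns the number of failed imports.
check_imports() {
  local prelude="$1" failed=0 mod
  shift
  for mod in "$@"; do
    if python3 -c "${prelude:-pass}; import ${mod}" 2>/dev/null; then
      echo "ok: $mod"
    else
      echo "FAILED: $mod" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}

# Assumed module names for Apex's C++ and CUDA extensions.
check_imports "import torch" apex apex_C amp_C || echo "Apex build looks incomplete" >&2
```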

Now I can run my training in ClearLinux. Here is how my images compare with the official one (values lost from the original table are marked with …):

| Image | CUDA version | PyTorch version | Docker image size | VRAM usage | Time for training one batch |
|---|---|---|---|---|---|
| Nvidia official PyTorch image | 11.… | … | … GB | 10620 MB | 0.2745 seconds |
| My ClearLinux image | 11.… | … | … GB | 10936 MB | 0.3066 seconds |
| My ClearLinux image (v2) | … | … | … GB | 10964 MB | 0.2812 seconds |
| My ClearLinux image (build PyTorch myself) | … | … | … GB | 10658 MB | 0.2716 seconds |
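
To put the timing column in relative terms, the sketch below computes each ClearLinux image's time per batch as a percentage difference from Nvidia's image, using the numbers from the table above (`pct_diff` is a hypothetical helper):

```shell
#!/bin/sh
# Sketch: time per batch of each ClearLinux image relative to Nvidia's image.

# Print (value - base) / base as a signed percentage with two decimals.
pct_diff() {
  awk -v b="$1" -v v="$2" 'BEGIN { printf "%+.2f%%\n", (v - b) / b * 100 }'
}

base=0.2745  # Nvidia official image, seconds per batch
for entry in "ClearLinux:0.3066" "ClearLinux (v2):0.2812" "ClearLinux (own PyTorch):0.2716"; do
  name="${entry%%:*}" t="${entry##*:}"
  echo "$name: $(pct_diff "$base" "$t") vs. official"
done
```

For these numbers it prints +11.69%, +2.44% and -1.06% respectively.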

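Looks like Nvidia does a better job than me 🙂 Only the image where I built PyTorch myself trains a batch faster than the official one (0.2716 vs. 0.2745 seconds), so the only chance to beat it is to use the newest CUDA and build a state-of-the-art PyTorch manually.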