Robin on Linux – Page 7 – All about technology

An interesting problem about ext4 mounting

When I logged in to my computer this morning and tried to run “tmux attach”, it reported a strange error:

/tmp/tmux-1001/default (Address already in use)

My first guess was that this temporary file was stale, so I typed a command to delete it. But another error popped up: “The filesystem is read-only!”.

Looking at the mount points with “mount|grep ro,”, I noticed that my root filesystem was mounted with the “read-only” option. Checking /etc/fstab:

/dev/disk/by-uuid/69bf5a7f-4031-4a6d-b877-f83fc73a4440 / ext4 rw,discard,data=writeback, 0 1

I guessed that one of the mount options was wrong, so the operating system mounted the filesystem read-only.

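After removing the options one by one and rebooting the machine many times, it turned out that “data=writeback” was the culprit. Strictly speaking, ext4 does accept “data=writeback”, but as far as I can tell the journaling mode cannot be changed on a remount: the root filesystem is first mounted read-only with the default mode, and the later remount with the /etc/fstab options fails, leaving it read-only.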

When I tried to modify /etc/fstab, the system reported “you can’t change file because the root filesystem is read-only”. It seemed I was trapped in a dead loop… so I used my final weapon:

sudo mount -o remount,rw /dev/nvme0n1p2 /

And it worked.

Now, with the following line in /etc/fstab, the ext4 filesystem is mounted with both read and write permission:

/dev/disk/by-uuid/69bf5a7f-4031-4a6d-b877-f83fc73a4440 / ext4 rw,discard,noatime 0 1
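
A substring match such as “mount|grep ro,” can also catch options like errors=remount-ro. As a hedged sketch (the function name read_only_mounts and the sample text are mine, not from any tool), the following parses text in the /proc/mounts layout and reports only the mounts whose option list contains exactly ro:

```python
def read_only_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs whose option list contains 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Compare whole options, so 'errors=remount-ro' does not count as 'ro'.
        if "ro" in options.split(","):
            result.append((mount_point, fs_type))
    return result


# Made-up sample in the same column layout as /proc/mounts.
SAMPLE = """\
/dev/nvme0n1p2 / ext4 ro,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
/dev/nvme0n1p1 /boot/efi vfat rw,relatime,errors=remount-ro 0 0
"""

print(read_only_mounts(SAMPLE))
```

On a real machine the input would be the contents of /proc/mounts instead of SAMPLE.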

Accelerate inference speed of DNN on Intel CPU

To cut the cost of our inference server, I ran some experiments on how to speed up prediction for our model.

import time

import cv2
import torch
import torch.nn as nn

import pycls.core.builders as model_builder
from pycls.core.config import cfg

def pressure_predict(net, tensor_img):
    # Time 10 forward passes; `softmax` is the module-level nn.Softmax created below.
    t0 = time.time()
    for _ in range(10):
        result = net(tensor_img)
        result = softmax(result)
        values, indices = torch.topk(result, 10)
    t1 = time.time()
    print("time:", t1 - t0)
    print(values)

if __name__ == "__main__":
    cfg.MODEL.TYPE = "regnet"
    # RegNetY-8.0GF
    cfg.REGNET.DEPTH = 17
    cfg.REGNET.SE_ON = False
    cfg.REGNET.W0 = 192
    cfg.REGNET.WA = 76.82
    cfg.REGNET.WM = 2.19
    cfg.REGNET.GROUP_W = 56
    cfg.BN.NUM_GROUPS = 4
    cfg.MODEL.NUM_CLASSES = 11120
    net = model_builder.build_model()
    net.load_state_dict(torch.load("bird_cls_2754696.pth", map_location="cpu"))
    net.eval()
    net = net.float()
    softmax = nn.Softmax(dim=1).eval()

    # read image
    img = cv2.imread("blujay.jpg")
    img = cv2.resize(img, (300, 300))
    tensor_img = torch.from_numpy(img).unsqueeze(0).permute(0, 3, 1, 2).float()
    pressure_predict(net, tensor_img)

    dummy_input = torch.randn(1, 3, 300, 300)
    with torch.jit.optimized_execution(True):
        traced_script_module = torch.jit.trace(net, dummy_input)

    net = torch.jit.optimize_for_inference(traced_script_module)
    pressure_predict(net, tensor_img)

    import intel_extension_for_pytorch as ipex
    net = net.to(memory_format=torch.channels_last)
    net = ipex.optimize(net)
    tensor_img = tensor_img.to(memory_format=torch.channels_last)

    with torch.no_grad():
        pressure_predict(net, tensor_img)

Here is the output on my Intel i5-12400 CPU:

Method	Inference time (seconds per 10 runs)
Using the model directly	1.6
After PyTorch’s torch.jit.optimize_for_inference()	1.4
After Intel’s ipex.optimize()	0.8
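
One thing to keep in mind with numbers like these: the first calls of a traced or freshly optimized model can include one-off setup work, so a few untimed warm-up calls make the comparison fairer. Below is a minimal sketch of such a timing helper. The name benchmark and its runs and warmup arguments are my own, and the lambda is only a stand-in for a call like net(tensor_img):

```python
import time


def benchmark(fn, runs=10, warmup=3):
    """Return the total wall-clock seconds spent in `runs` calls of fn()."""
    for _ in range(warmup):
        fn()  # not timed: lets caches and lazy initialisation settle
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return time.perf_counter() - start


calls = []
elapsed = benchmark(lambda: calls.append(sum(range(1000))), runs=10, warmup=3)
print(f"time: {elapsed:.6f} s for 10 runs ({len(calls)} calls including warm-up)")
```

time.perf_counter() is used instead of time.time() because it is a monotonic clock intended for measuring short intervals.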

It looks like Intel has worked hard to optimize neural network models for their CPUs. The only problem is that the intel_extension_for_pytorch package is hard to install (I hit a lot of broken dependencies when trying to install and run it); the easiest way to use it is through the docker image intel/intel-optimized-pytorch:latest

Average weights of two PyTorch models

After reading this paper, I started an experiment on it. Based on this snippet, I wrote my code:

    net1 = model_builder.build_model()
    net2 = model_builder.build_model()
    output = model_builder.build_model()
    net1.load_state_dict(torch.load(args.model1, map_location="cpu"))
    net2.load_state_dict(torch.load(args.model2, map_location="cpu"))
    
    # Average
    sd1 = net1.named_parameters()
    sd2 = net2.named_parameters()
    sdo = dict(sd2)
    for name, param in sd1:
        sdo[name].data.copy_(0.5*param.data + 0.5*sdo[name].data)

    output.load_state_dict(sdo)
    torch.save(output, args.output)
    
    # here is a test
    output.load_state_dict(torch.load(args.output))

But after generating the new model with averaged weights, PyTorch failed to load it:

Traceback (most recent call last):
  File "average_models.py", line 43, in <module>
    output.load_state_dict(torch.load(args.output))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1534, in load_state_dict
    state_dict = state_dict.copy()
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1186, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'RegNet' object has no attribute 'copy'

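The reason for the failure is quite simple: torch.save(output, ...) saved the whole model object, so torch.load() returned a RegNet instance instead of a state_dict, and load_state_dict() failed when it called .copy() on it. We only need to save the state_dict of the model instead of all the information (since I am using FP16 format). Therefore the correct code should be: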

    net1 = model_builder.build_model()
    net2 = model_builder.build_model()
    net1.load_state_dict(torch.load(args.model1, map_location="cpu"))
    net2.load_state_dict(torch.load(args.model2, map_location="cpu"))

    # Average 
    sd1 = net1.named_parameters()
    sd2 = net2.named_parameters()
    sdo = dict(sd2) 
    for name, param in sd1:
        sdo[name].data.copy_(0.5*param.data + 0.5*sdo[name].data)

    torch.save(sdo, args.output)
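
One caveat, as far as I understand it: named_parameters() only yields trainable parameters, so the saved dict leaves out buffers such as BatchNorm running statistics, and a strict load_state_dict() may then complain about missing keys. Below is a minimal sketch of averaging over full state dicts instead. The helper name average_state_dicts and the alpha argument are my own, and plain floats stand in for tensors so that the snippet runs without PyTorch; with real models one would pass net1.state_dict() and net2.state_dict().

```python
def average_state_dicts(sd1, sd2, alpha=0.5):
    """Blend two state dicts entry by entry: alpha * sd1 + (1 - alpha) * sd2."""
    if sd1.keys() != sd2.keys():
        raise ValueError("the two models do not have the same entries")
    return {name: alpha * sd1[name] + (1 - alpha) * sd2[name] for name in sd1}


# Floats stand in for tensors; the key names are made up for illustration.
sd1 = {"conv.weight": 1.0, "bn.running_mean": 0.5}
sd2 = {"conv.weight": 3.0, "bn.running_mean": 1.5}
averaged = average_state_dicts(sd1, sd2)
print(averaged)
```

The result is a plain dict, so it can be saved with torch.save(averaged, args.output) and loaded later with load_state_dict(). Integer buffers (for example BatchNorm's num_batches_tracked) would turn into floats under this blend, so they may need special handling.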

BTW, in my experiment, averaging my models didn’t raise accuracy as the paper suggests.

Fix “unsupported-assignment-operation” error for Pandas

When using pylint to check my code, it reported an error:

E1137: 'df' does not support item assignment (unsupported-assignment-operation)

for the original code:

df["column1"] = "hello"

It looks like the way to avoid this error, short of disabling the check, is to use the .loc indexer in pandas:

# Set all rows in "column1" to "hello"
df.loc[:, "column1"] = "hello"

Insert multiple lines in a specific position of a file

I have used awk for quite a long time, but not its sibling sed. A couple of days ago I wanted to insert two lines at a specific position in a CMake file and found a perfect answer: here.

Now I can add the two lines with:

sed -i '/^enable_language/i set(CMAKE_CUDA_ARCHITECTURES 86)\nset(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc)' \
	cmake/public/cuda.cmake

The CMake file changed from:

if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
   set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_C_COMPILER}")
endif()
enable_language(CUDA)
set(CMAKE_CUDA_STANDARD ${CMAKE_CXX_STANDARD})
set(CMAKE_CUDA_STANDARD_REQUIRED ON)

to:

if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
   set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_C_COMPILER}")
endif()
set(CMAKE_CUDA_ARCHITECTURES 86)
set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc)
enable_language(CUDA)
set(CMAKE_CUDA_STANDARD ${CMAKE_CXX_STANDARD})
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
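
For cases where sed is not available, the same "insert before every matching line" behaviour can be sketched in Python. This is only an illustration: the function name insert_before is mine, and the list below is a shortened copy of the CMake lines above.

```python
import re


def insert_before(lines, pattern, new_lines):
    """Return a copy of lines with new_lines placed before each matching line."""
    regex = re.compile(pattern)
    out = []
    for line in lines:
        if regex.search(line):
            out.extend(new_lines)  # like sed's `i`, runs for every match
        out.append(line)
    return out


cmake = [
    "endif()",
    "enable_language(CUDA)",
    "set(CMAKE_CUDA_STANDARD ${CMAKE_CXX_STANDARD})",
]
patched = insert_before(
    cmake,
    r"^enable_language",
    [
        "set(CMAKE_CUDA_ARCHITECTURES 86)",
        "set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc)",
    ],
)
print("\n".join(patched))
```

Unlike sed -i, this sketch works on a list in memory; reading and writing the file is left to the caller.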

Using PyTorch on ClearLinux docker image

I have been using Nvidia’s official docker image of PyTorch for my model training for quite a long time. It works very well, but the only problem is that the image is too large: more than 6GB. On my poor home network, it takes a painfully long time to download.

Yesterday, an interesting idea popped into my mind: why not build my own small docker image to use PyTorch? So I started on it.

Firstly, I chose ClearLinux from Intel since it is very clean and uses state-of-the-art software (which means its performance is fabulous).

I used distrobox to create my environment:

distrobox create --image clearlinux:latest \
    --name robin_clear \
    --home /home/robin/clearlinux \
    --additional-flags "--shm-size=4g" \
    --additional-flags "--gpus all" \
    --additional-flags "--device=/dev/nvidiactl" \
    --additional-flags "--device=/dev/nvidia0"

Enter the environment:

distrobox enter robin_clear

Download the CUDA 11.3 run file and install it in robin_clear:

sudo swupd bundle-add libxml2

sudo ./cuda_11.3.0_465.19.01_linux.run \
        --toolkit \
        --no-man-page \
        --override \
        --silent

Then, the important part: install gcc-10 (ClearLinux includes gcc-12, which is too new for CUDA 11.3) and create the symbolic links for it:

sudo swupd bundle-add c-extras-gcc10
sudo ln -s /usr/bin/gcc-10 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-10 /usr/local/cuda/bin/g++

Install PyTorch:

sudo swupd bundle-add python3-basic

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

Install apex for mixed-precision training (because my model training uses it):

git clone https://github.com/NVIDIA/apex
cd apex
pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Now I can run my training in ClearLinux. Here is the comparison of the docker images:

Image	CUDA version	PyTorch version	Docker image size	VRAM usage	Time for training one batch
Nvidia official PyTorch image	11.7	1.12.0	14.7 GB	10620 MB	0.2745 seconds
My ClearLinux image	11.3	1.11.0	12.8 GB	10936 MB	0.3066 seconds
My ClearLinux image (v2)	11.3	1.12.0	12.8 GB	10964 MB	0.2812 seconds
My ClearLinux image (build PyTorch myself)	11.7	1.13.0	12.8 GB	10658 MB	0.2716 seconds

Looks like Nvidia does a better job than me 🙂 The only way to beat it is to use the newest CUDA and build the latest PyTorch manually.

Download files from Google Drive in the console

I need to download some large files from Google Drive on my server (“server” means no GUI). After a quick search, I got a solution: https://stackoverflow.com/a/50670037/5048046

The tool is gdown, and we can install it with pip:

python3 -m pip install gdown

Then just give it the file ID from the Google Drive URL:

# https://drive.google.com/uc?id=<file_id>
gdown <file_id>
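
If you only have a share link, the file ID has to be pulled out of it first. Here is a small sketch, assuming the two URL layouts I have seen for Google Drive (uc?id=<file_id> and file/d/<file_id>/view); the function name drive_file_id and the placeholder ID are mine:

```python
from urllib.parse import parse_qs, urlparse


def drive_file_id(url):
    """Return the file ID from a Google Drive URL, or None if none is found."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if "id" in query:
        return query["id"][0]  # layout: .../uc?id=<file_id>
    parts = parsed.path.split("/")
    if "d" in parts and parts.index("d") + 1 < len(parts):
        return parts[parts.index("d") + 1]  # layout: .../file/d/<file_id>/view
    return None


print(drive_file_id("https://drive.google.com/uc?id=FILE_ID_PLACEHOLDER"))
print(drive_file_id("https://drive.google.com/file/d/FILE_ID_PLACEHOLDER/view?usp=sharing"))
```

The returned string is what goes after the gdown command.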

Some test samples for Text-To-Speech solutions

I have been doing some research on TTS (Text-To-Speech) recently and noticed three solutions that are nearly state-of-the-art and also work out of the box: LightSpeech (from Microsoft), FastSpeech2 (partly from Microsoft), and Nemo (from Nvidia).

The testing text is a paragraph:

The Home Depot, Inc. is the world’s largest home improvement retailer based on net sales for fiscal 2021. We offer our customers a wide assortment of building materials, home improvement products, lawn and garden products, décor products, and facilities maintenance, repair and operations products and provide a number of services, including home improvement installation services and tool and equipment rental. As of the end of fiscal 2021, we operated 2,317 stores located throughout the U.S. (including the Commonwealth of Puerto Rico and the territories of the U.S. Virgin Islands and Guam), Canada, and Mexico. The Home Depot stores average approximately 104,000 square feet of enclosed space, with approximately 24,000 additional square feet of outside garden area. We also maintain a network of distribution and fulfillment centers, as well as a number of e-commerce websites in the U.S., Canada and Mexico. When we refer to “The Home Depot,” the “Company,” “we,” “us” or “our” in this report, we are referring to The Home Depot, Inc. and its consolidated subsidiaries.

The output of FastSpeech2:

It has a lot of noise and sounds somewhat metallic.

The output of LightSpeech:

It sounds a little better, more like a human than a robot.

The output of Nemo:

This is the best result of the three solutions.

This test is just a summary of my research work and doesn’t show which algorithm is better than the others, since the training process heavily affects the final result. But at least Nemo is the closest to a production scenario.

Model saving error when using Apex

Apex is a tool from Nvidia that enables mixed-precision training. Typical usage looks like this:

import apex.amp as amp

net, optimizer = amp.initialize(net, optimizer, opt_level="O2")

# forward
outputs = net(inputs)

loss = criterion(outputs, targets)

optimizer.zero_grad()

# float16 backward
with amp.scale_loss(loss, optimizer) as scaled_loss:
  scaled_loss.backward()
  
optimizer.step()

...

torch.save(net, "model.pth")

After I changed my code to use Apex, it reported an error when saving the model with torch.save(net, "model.pth"):

AttributeError: Can't pickle local object '_initialize.<locals>.patch_forward.<locals>.new_fwd'

Someone has already noticed this problem, but it seems no one wants to solve it: link. The only solution I found comes from a Chinese blog: link. It recommends saving just the model parameters:

torch.save(net.state_dict(), "model.pth")
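
The name in the error message (new_fwd inside patch_forward) suggests that the model's forward was replaced by a function defined inside another function, and pickle cannot serialize such local functions. The sketch below reproduces the same kind of failure with the standard library only. The Net class and this patch_forward are stand-ins I made up for illustration, not Apex code:

```python
import pickle


class Net:
    """Stand-in for a model: some weights plus a state_dict() accessor."""

    def __init__(self):
        self.weights = {"fc.weight": [0.1, 0.2], "fc.bias": [0.0]}

    def state_dict(self):
        return dict(self.weights)


def patch_forward(net):
    def new_fwd(x):  # local function: pickle cannot look it up by name
        return x

    net.forward = new_fwd
    return net


net = patch_forward(Net())

try:
    pickle.dumps(net)  # tries to pickle net.forward along with the weights
    whole_object_ok = True
except (AttributeError, pickle.PicklingError) as err:
    whole_object_ok = False
    print("pickling the whole object failed:", err)

# The plain dict of weights has no function attached, so it round-trips fine.
restored = pickle.loads(pickle.dumps(net.state_dict()))
print(restored)
```

This is why saving only the state_dict sidesteps the problem: the patched forward function never reaches the pickler.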

A problem with “gcloud auth login”

When I tried to log in to my account on a Google Compute Engine VM with “gcloud auth login”, it printed a hint:

You are authorizing gcloud CLI without access to a web browser. Please run the following command on a machine with a web browser and copy its output back here. Make sure the installed gcloud version is 372.0.0 or newer.

gcloud auth login --remote-bootstrap="https://acounts.google.com/xxxxxxxxxx"

Enter the output of the above command:

So I copied this command and ran it on my laptop. But after finishing the process and pressing the “Allow” button, it just jumped to https://cloud.google.com/sdk/auth_success. There was no “output” for me to paste back into the original “gcloud auth login”.

If you run into the same situation, don’t panic. The answer is here: https://stackoverflow.com/a/49885635/5048046

We just need to add the argument “--no-launch-browser”:

gcloud auth login --no-launch-browser

and it will print a link to a web page that shows the “output” to paste back.