My company has been using Argo to execute workflows for more than three years. I knew that every step in an Argo workflow can be controlled by a "when" expression, like this:
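(A sketch following the coin-flip example from Argo's documentation; the step and template names are placeholders, not our production workflows.)

  steps:
  - - name: flip-coin
      template: flip-coin
  - - name: heads
      template: heads
      when: "{{steps.flip-coin.outputs.result}} == heads"

The "heads" step runs only when the output of the "flip-coin" step equals "heads".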
When I logged in to my computer and tried to run "tmux attach" this morning, it reported a strange error:
/tmp/tmux-1001/default (Address already in use)
Intuitively, I thought this temporary file was stale, so I typed a command to delete it. But another error jumped out: "The filesystem is read-only!"
By looking at the mount points with "mount | grep ro", I noticed that my root directory was mounted read-only. After checking /etc/fstab, I guessed that one of the mount options was wrong, so the operating system mounted the root filesystem as read-only.
After removing the options one by one and rebooting the machine many times, it turned out that "data=writeback" was the incorrect option. Essentially, the "data=writeback" option is only for ext3.
When I tried to modify /etc/fstab, the system reported "you can't change the file because the root filesystem is read-only". It seemed I was trapped in a vicious circle, so I used my final weapon:
sudo mount -o remount,rw /dev/nvme0n1p2 /
And it worked.
Now, after fixing /etc/fstab, the ext4 filesystem can be mounted with both read and write permissions:
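(A sketch of the corrected entry; the remaining mount options are assumptions, the point is only that "data=writeback" is gone.)

/dev/nvme0n1p2  /  ext4  defaults  0  1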
To save costs on the inference server, I did some experiments on how to accelerate prediction for our model.
import time

import cv2
import torch
import torch.nn as nn
import pycls.core.builders as model_builder
from pycls.core.config import cfg


def pressure_predict(net, tensor_img):
    # Run the model 10 times and report the total latency
    t0 = time.time()
    for _ in range(10):
        result = net(tensor_img)
        result = softmax(result)
        values, indices = torch.topk(result, 10)
    t1 = time.time()
    print("time:", t1 - t0)
    print(values)


if __name__ == "__main__":
    # Build a RegNetY-8.0GF model with pycls
    cfg.MODEL.TYPE = "regnet"
    cfg.REGNET.DEPTH = 17
    cfg.REGNET.SE_ON = False
    cfg.REGNET.W0 = 192
    cfg.REGNET.WA = 76.82
    cfg.REGNET.WM = 2.19
    cfg.REGNET.GROUP_W = 56
    cfg.BN.NUM_GROUPS = 4
    cfg.MODEL.NUM_CLASSES = 11120
    net = model_builder.build_model()
    net.load_state_dict(torch.load("bird_cls_2754696.pth", map_location="cpu"))
    net.eval()
    net = net.float()
    softmax = nn.Softmax(dim=1).eval()

    # Read the image and convert HWC uint8 to an NCHW float tensor
    img = cv2.imread("blujay.jpg")
    img = cv2.resize(img, (300, 300))
    tensor_img = torch.from_numpy(img).unsqueeze(0).permute(0, 3, 1, 2).float()

    # Baseline: use the model directly
    pressure_predict(net, tensor_img)

    # TorchScript: trace the model and optimize it for inference
    dummy_input = torch.randn(1, 3, 300, 300)
    with torch.jit.optimized_execution(True):
        traced_script_module = torch.jit.trace(net, dummy_input)
        net = torch.jit.optimize_for_inference(traced_script_module)
    pressure_predict(net, tensor_img)

    # Intel Extension for PyTorch: channels-last memory format plus ipex.optimize()
    import intel_extension_for_pytorch as ipex
    net = net.to(memory_format=torch.channels_last)
    net = ipex.optimize(net)
    tensor_img = tensor_img.to(memory_format=torch.channels_last)
    with torch.no_grad():
        pressure_predict(net, tensor_img)
Here is the output on my Intel i5-12400 CPU:
Method                                               Inference time (seconds per 10 runs)
Directly use the model                               1.6
After PyTorch's torch.jit.optimize_for_inference()   1.4
After Intel's ipex.optimize()                        0.8
It looks like Intel tried hard to optimize their CPUs for neural network models. The only problem is that the intel_extension_for_pytorch package is hard to install (I hit a lot of broken dependencies when trying to install and run it), so the best way to use it is through the docker image intel/intel-optimized-pytorch:latest.
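For example, to get a shell with the extension pre-installed (the exact command line here is my own sketch; only the image name comes from Intel):

docker run -it --rm -v $(pwd):/workspace intel/intel-optimized-pytorch:latest bash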
After reading this paper, I began to experiment with it. Referencing this snippet, I wrote my code:
import argparse

import torch
import pycls.core.builders as model_builder

# Argument parsing added so the snippet runs standalone; the three
# argument names follow the ones used below
parser = argparse.ArgumentParser()
parser.add_argument("--model1")
parser.add_argument("--model2")
parser.add_argument("--output")
args = parser.parse_args()

net1 = model_builder.build_model()
net2 = model_builder.build_model()
output = model_builder.build_model()
net1.load_state_dict(torch.load(args.model1, map_location="cpu"))
net2.load_state_dict(torch.load(args.model2, map_location="cpu"))

# Average the weights of the two models, parameter by parameter
sd1 = net1.named_parameters()
sd2 = net2.named_parameters()
sdo = dict(sd2)
for name, param in sd1:
    sdo[name].data.copy_(0.5 * param.data + 0.5 * sdo[name].data)
output.load_state_dict(sdo)
torch.save(output, args.output)

# here is a test
output.load_state_dict(torch.load(args.output))
But after generating the new weight-averaged model, PyTorch failed to load it:
Traceback (most recent call last):
  File "average_models.py", line 43, in <module>
    output.load_state_dict(torch.load(args.output))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1534, in load_state_dict
    state_dict = state_dict.copy()
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1186, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'RegNet' object has no attribute 'copy'
The reason for the failure is quite simple: we only need to save the model's state_dict instead of the whole model object (since I am using the FP16 format). Therefore the correct code should be:
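torch.save(output.state_dict(), args.output)

# here is a test
output.load_state_dict(torch.load(args.output))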
I have used awk for quite a long time, but not its sibling sed. A couple of days ago I wanted to insert two lines into a CMake file at a specific position, and found a perfect answer: here.
Now I can insert the two lines before the line beginning with "enable_language" by using:
sed -i '/^enable_language/i set(CMAKE_CUDA_ARCHITECTURES 86)\nset(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc)' \
cmake/public/cuda.cmake
I have been using Nvidia's official PyTorch docker image for my model training for quite a long time. It works very well, but the only problem is that the image is too large: more than 6GB. On my poor home network, it takes a painfully long time to download.
Yesterday, an interesting idea jumped into my mind: why not build my own small docker image for PyTorch? So I started to do it.
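Here is a minimal sketch of the idea, assuming a CPU-only image (the base image and the wheel index are my choices for illustration, not the exact image I ended up building):

FROM python:3.10-slim
# Install a CPU-only PyTorch wheel to keep the image small
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu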
I needed to download some large files from Google Drive onto my server ("server" means no GUI). After a quick search, I found a solution: https://stackoverflow.com/a/50670037/5048046
We can install it with pip:
python3 -m pip install gdown
Then just give the URL of the Google Drive file to it:
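For example (the file ID below is a placeholder):

gdown "https://drive.google.com/uc?id=<file_id>"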
I have been doing some research on TTS (Text-To-Speech) recently and noticed three nearly state-of-the-art, out-of-the-box solutions: LightSpeech (from Microsoft), FastSpeech2 (partly from Microsoft), and Nemo (from Nvidia).
The output of FastSpeech2 has a lot of noise and sounds metallic.
The output of LightSpeech:
The output of Nemo:
This test is just a summary of my research and is not meant to show which algorithm is better than the others, since the training process heavily affects the final result. But at least Nemo is the closest one to a production scenario.