Apex is a tool from NVIDIA that enables mixed-precision training.
import apex.amp as amp
net, optimizer = amp.initialize(net, optimizer, opt_level="O2")
# forward
outputs = net(inputs)
loss = criterion(outputs, targets)
optimizer.zero_grad()
# float16 backward
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
...
torch.save(net, "model.pth")
After I changed my code to use Apex, saving the model with torch.save(net, "model.pth") reported an error:
AttributeError: Can't pickle local object '_initialize.<locals>.patch_forward.<locals>.new_fwd'
Someone has already reported this problem, but it seems no one has fixed it: link. The only solution I found comes from a Chinese blog: link. It recommends saving only the model parameters:
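In PyTorch, saving only the parameters usually means `torch.save(net.state_dict(), "model.pth")` and later calling `net.load_state_dict(torch.load("model.pth"))` on a freshly built network. To show why that sidesteps the error, here is a minimal standard-library sketch. `TinyModel` and `patch_forward` are hypothetical stand-ins (not Apex or PyTorch code) that only imitate what the error message suggests: the `forward` method gets replaced by a function defined inside another function, which pickle cannot serialize.

```python
import pickle


class TinyModel:
    """Hypothetical stand-in for a network; `weights` plays the role of parameters."""

    def __init__(self):
        self.weights = {"fc.weight": [0.1, 0.2], "fc.bias": [0.0]}

    def forward(self, x):
        return x

    def state_dict(self):
        # Plain data only: no functions, no closures.
        return dict(self.weights)


def patch_forward(model):
    """Swap `forward` for a locally defined wrapper (an assumption about what
    the patched model looks like, based on the error message)."""
    old_fwd = model.forward

    def new_fwd(*args, **kwargs):
        return old_fwd(*args, **kwargs)

    model.forward = new_fwd
    return model


model = patch_forward(TinyModel())

# Pickling the whole object fails because `new_fwd` is a local function.
try:
    pickle.dumps(model)
    whole_object_ok = True
except (AttributeError, pickle.PicklingError):
    whole_object_ok = False

# Pickling only the parameters works, because they are plain data.
restored = pickle.loads(pickle.dumps(model.state_dict()))

print("whole object picklable:", whole_object_ok)
print("parameters restored:", restored == model.state_dict())
```

The trade-off is that the file no longer contains the model class itself, so the loading code has to construct the network before restoring the parameters.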
When I tried to log in to my account on a Google Compute Engine VM with “gcloud auth login”, it printed this hint:
You are authorizing gcloud CLI without access to a web browser. Please run the following command on a machine with a web browser and copy its output back here. Make sure the installed gcloud version is 372.0.0 or newer.
gcloud auth login --remote-bootstrap="https://accounts.google.com/xxxxxxxxxx"
Enter the output of the above command:
So I copied this command and ran it on my laptop. But after finishing the process and pressing the “Allow” button, it just jumped to https://cloud.google.com/sdk/auth_success. There was no “output” for me to paste back into the original “gcloud auth login”.
If you run into the same situation, please don’t panic. The answer is here: https://stackoverflow.com/a/49885635/5048046
We just need to add the argument “--no-launch-browser”, that is, run gcloud auth login --no-launch-browser on the VM.
I know it is rather late to be writing this article in the middle of the year. But late is much better than never, right?
I bought the seven-book series “A Song of Ice and Fire” on 23rd November 2019 and finished reading it on 3rd August 2021. It is definitely the best fantasy novel I have read so far (sorry Salvatore, I am just telling you my true thoughts). The whole world and the whole story in it are cold and cruel, but unimaginably fascinating and attractive.
The first book of the series was published back in 1996, when I was just a stupid middle school student. I really regret that I didn’t start learning English more intensively and start reading “A Song of Ice and Fire” back then.
The latest book of the series was published in 2011, a total of eleven years ago. Oh, George, please write the last two volumes as quickly as possible. Your fans have been waiting for 11 years. I don’t want to wait another 11 years…
Fortunately, I also got some time to look into areas that I am really interested in but that don’t benefit my work, such as semiconductors. The book “Introduction to Semiconductor Technology” really helped me a lot. It goes from basic physics and chemistry to how to make silicon wafers, how to apply photoresist, how lithography works, how to etch, and how to make the back-end wires. Although I had forgotten almost all of my chemistry, I could still understand the whole process used in this industry. And it looks really cool! I can’t help showing an image of a beautiful DRAM:
In computer science, algorithms are one of the most important and also most difficult parts. As part of my career routine, I picked up “The Algorithm Design Manual” and studied algorithms again, focusing mainly on dynamic programming and just skimming the graph algorithms (they are far from my daily work, and also far more difficult…). I can’t say I understood this book very well, but at least I took my share from it. And I hope I can revisit it in the near future.
Because of my wife’s strong recommendation, I started to use Oracle Cloud to host my blog, for free 🙂
Currently, the Oracle Cloud provides four 4-core/24GB ARM64 virtual machines without any fees. This is the best part.
After migrating my blog from AWS (t2.micro) to this ARM64 VM, everything worked well except that the permalinks of my posts all changed from date/postname to postnumber. (By following this link and changing my Nginx configuration file /etc/nginx/sites-available/default, I solved this problem just seconds ago!)
Then I started to think about whether I could deploy other services on this ARM64 VM. Here is my test code for using yolov5:
import argparse
import time

import cv2
import torch
import torch.nn as nn

IMAGE_SHAPE = (300, 300)


def predict(args):
    img = cv2.imread(args.image_file)
    # detect
    model = torch.hub.load('ultralytics/yolov5', 'yolov5m')
    # classify
    net = torch.load(args.classify_model, map_location=torch.device('cpu'))
    net.eval()
    softmax = nn.Softmax(dim=1)
    t0 = time.time()
    for _ in range(100):
        results = model(img)
        # Use a separate name so the input image is not overwritten
        for rendered in results.render():
            # Just find the most likely bird (if there are many)
            rendered = cv2.resize(rendered, IMAGE_SHAPE)
            tensor_img = torch.from_numpy(rendered)
            # HWC -> NCHW, as the classifier expects
            result = net(tensor_img.unsqueeze(0).permute(0, 3, 1, 2).float())
            result = softmax(result)
            values, indices = torch.topk(result, 10)
    t1 = time.time()
    print(indices)
    print('time:', t1 - t0)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--image_file', default="2400.jpg", type=str, help='Image file to be predicted')
    parser.add_argument('--classify_model', default='bird_cls_1160000.pth',
                        type=str, help='Trained ckpt file path to open')
    args = parser.parse_args()
    predict(args)
It only took 0.67 seconds for a yolov5m model. In this test, at least, the Ampere Arm64 core looks as fast as an AMD EPYC!
They look like “2022-03-22”, which is a Tuesday. But after I exported the data into BigQuery and selected them, they became “2022-03-21 UTC” by default, which is a Monday.
The problem is definitely about the time zone of this column:
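A shift of exactly one day like this is what happens when a local midnight is converted to UTC. The sketch below only illustrates the mechanism with the standard library; the UTC+8 offset is an assumption made for the example, since the source time zone of the column is not stated here.

```python
from datetime import datetime, timedelta, timezone

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Hypothetical source zone: UTC+8 is only an assumption for this illustration.
LOCAL_TZ = timezone(timedelta(hours=8))

# Midnight at the start of 2022-03-22 in the local zone.
local_midnight = datetime(2022, 3, 22, 0, 0, tzinfo=LOCAL_TZ)

# The same instant expressed in UTC falls on the previous calendar day.
as_utc = local_midnight.astimezone(timezone.utc)

local_label = f"{local_midnight.date()} {WEEKDAYS[local_midnight.weekday()]}"
utc_label = f"{as_utc.date()} {WEEKDAYS[as_utc.weekday()]}"

print("local:", local_label)
print("utc:  ", utc_label)
```

Any zone ahead of UTC gives the same effect for a midnight value, so the dates have to be converted back to the intended zone (or stored as plain dates) before grouping by weekday.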
I have recently been trying to learn Django (a Python framework for web development) in a Docker container. After running the container with port redirection
sudo docker run -p 8000 -t -i centos/django /bin/bash
#in the docker now
cd mysite
python manage.py runserver
The output of Django server is
System check identified no issues (0 silenced).
September 24, 2014 - 08:52:05
Django version 1.7, using settings 'oddjobs.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Then I used the command sudo docker ps to find the port number on the host machine:
CONTAINER ID   IMAGE                COMMAND       CREATED             STATUS             PORTS                     NAMES
08d01206d676   centos/oddjobs:run   "/bin/bash"   About an hour ago   Up About an hour   0.0.0.0:49198->8000/tcp   kickass_pare
but when I ran curl 127.0.0.1:49198 on the host machine, it just reported “Connection Refused”.
After searching on Google, I found only one article that seemed useful. But my problem was still there after I followed its steps. With no other choice, I had to read the Docker documentation carefully and experiment step by step.
First, I ran an nc server in the container:
sudo docker run -p 8000 -t -i centos/django /bin/bash
#in the docker now
nc -l 8000
#listening on port 8000
Then I ran nc 127.0.0.1 8000 on the host. It failed too. Why couldn’t the nc client connect to the server in the container even though I had followed the Docker documentation? After running netstat in the container, I found the answer: my image is CentOS 7, and the ‘nc’ in it listens on an IPv6 address by default. To listen on an IPv4 address, one should type
nc -l 8000 -4
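Now the nc client could connect to the server.
But how do I run the Django server on an IPv4 address? This article shows the way. Now it seemed everything was OK. I started Django again with python manage.py runserver 127.0.0.1:8000, but the nc client on the host still could not connect to it. Oh, the IPs “127.0.0.1” and “0.0.0.0” are very different: 127.0.0.1 is only reachable from inside the container itself, while 0.0.0.0 listens on all of the container’s interfaces, including the one Docker forwards the published port to. So I should run Django like this: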
python manage.py runserver 0.0.0.0:8000
The browser on the host can access the Django example site now.
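The difference between the two bind addresses can be inspected with plain sockets. This is only a sketch under the assumption that it runs on a single machine with a working loopback interface; the helper names `start_listener` and `can_connect` are made up for the example. On one machine both listeners are reachable through 127.0.0.1, which is exactly why the problem only shows up once traffic arrives from outside, as it does when Docker forwards a published port to the container’s network interface.

```python
import socket


def start_listener(host):
    """Bind a TCP listener to `host` on a free port chosen by the OS."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))
    server.listen(1)
    return server


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


wildcard = start_listener("0.0.0.0")    # every IPv4 interface
loopback = start_listener("127.0.0.1")  # loopback interface only

wildcard_addr = wildcard.getsockname()
loopback_addr = loopback.getsockname()

# From the same machine, both are reachable via the loopback address.
reach_wildcard = can_connect("127.0.0.1", wildcard_addr[1])
reach_loopback = can_connect("127.0.0.1", loopback_addr[1])

print("wildcard listener bound to", wildcard_addr[0], "reachable:", reach_wildcard)
print("loopback listener bound to", loopback_addr[0], "reachable:", reach_loopback)

wildcard.close()
loopback.close()
```

Only the listener bound to 0.0.0.0 would also accept connections that arrive on a non-loopback interface, which is the case for requests forwarded from the host into the container.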
admission webhook "validation.gatekeeper.sh" denied the request: [denied by autogke-no-write-mode-hostpath] hostPath volume docker-sock used in container wait uses path /var/run/docker.sock which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: ["/var/log/"]. Requesting user: <system:serviceaccount:argo:argo> and groups: <["system:serviceaccounts", "system:serviceaccounts:argo", "system:authenticated"]>
   col1
0   NaN
1   NaN
2   NaN

0    False
1    False
2    False
Name: col1, dtype: bool

0    False
1    False
2    False
Name: col1, dtype: bool
If we directly convert “None” to the “float” type, it becomes the “NaN” value. “NaN” can’t be compared with a real floating-point number, so it is neither “greater than or equal to” a floating-point number nor “smaller than” it.
Since a numeric column can only hold “None” by becoming a floating-point column (where it is stored as “NaN”), we should be extra careful when processing floating-point numbers.
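The comparison behaviour itself does not depend on pandas and can be checked with plain Python floats. The sketch below assumes the missing entry has already become `float("nan")`; the `at_least` helper is a hypothetical example of keeping missing values visible instead of silently treating them as `False`.

```python
import math

threshold = 1.5
missing = float("nan")  # what a missing entry looks like in a float column

# NaN is neither >= nor < any number, and it is not even equal to itself.
greater_or_equal = missing >= threshold
smaller = missing < threshold
equal_to_itself = missing == missing


def at_least(values, limit):
    """Compare each value with `limit`, returning None for missing values."""
    return [None if math.isnan(v) else v >= limit for v in values]


checked = at_least([0.5, float("nan"), 2.0], threshold)

print(greater_or_equal, smaller, equal_to_itself)
print(checked)
```

Checking for NaN explicitly (here with `math.isnan`) before comparing is one way to avoid counting missing rows as if they had failed both conditions.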
About two weeks ago I met the “Number of Islands” problem on LeetCode. The first idea that jumped into my mind was ‘dynamic programming’: I could create a new matrix (let’s name it the ‘number matrix’) with the same size as the problem matrix and give every position a value. The value is the number of islands in the rectangle from the top-left corner to this position.
For example, take the problem matrix below:
1 1 1 1
0 0 0 0
1 0 1 1
0 0 0 0
problem matrix
We can create the ‘number matrix’:
1 1 1 1
1 1 1 1
2 2 3 3
2 2 3 3
number matrix
Why is the (4, 1) position in the ‘number matrix’ 2? Because the (1, 1) to (4, 1) area of the ‘problem matrix’ has two islands:
1
0
1
0
(1, 1) to (4, 1) area of the ‘problem matrix’
Then, every time we want to calculate the value of a position in the ‘number matrix’, we can first calculate the values of its ‘left’, ‘top’ and ‘left-top’ positions.
For example, since the (3, 3), (4, 3) and (3, 4) positions in the ‘number matrix’ are all ‘3’, and the (4, 4) position in the ‘problem matrix’ is ‘0’, the (4, 4) position of the ‘number matrix’ should also be ‘3’.
But unfortunately, after two weeks of struggling, I finally found out that the value of a position in the ‘number matrix’ can’t be decided by its three nearest positions: left, top and left-top. The counterexample is below:
1 1 1 1 1 1 1
0 0 0 0 0 0 1
1 1 1 1 1 0 1
1 0 0 0 1 0 1
1 0 1 0 1 0 1
1 0 1 1 1 0 1
1 1 1 1 1 1 1
problem matrix A
Let’s just build the number matrix for the (6, 6) to (7, 7) area of problem matrix A:
2 2
2 ?
Don’t rush. Let’s look at another problem matrix:
1 1 1 1 1
1 0 0 0 1
1 0 1 0 1
1 0 0 0 1
1 1 1 1 1
problem matrix B
This time, we also build the number matrix for the (4, 4) to (5, 5) area of problem matrix B:
2 2
2 ?
See? They have the same values for left, top and left-top, but different final results (problem matrix A has just 1 island but problem matrix B has 2).
Two weeks of hard thinking only led to a wrong idea. But this is the charm of algorithms 🙂
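The counterexample can be checked mechanically. The sketch below is not the dynamic-programming idea from this post; it swaps in an ordinary flood fill to count islands, then evaluates the ‘number matrix’ at the bottom-right corner of problem matrices A and B and at its three neighbours. The helper names `count_islands`, `number_at` and `corner_values` are made up for the example, and positions are 1-indexed as (row, column).

```python
def count_islands(grid):
    """Count groups of 1s connected horizontally or vertically (flood fill)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    islands = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            islands += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return islands


def number_at(grid, row, col):
    """Islands in the rectangle from the top-left corner to (row, col)."""
    return count_islands([line[:col] for line in grid[:row]])


def corner_values(grid):
    """'Number matrix' values around the bottom-right corner of a square grid."""
    n = len(grid)
    return {
        "left_top": number_at(grid, n - 1, n - 1),
        "top": number_at(grid, n - 1, n),
        "left": number_at(grid, n, n - 1),
        "final": number_at(grid, n, n),
    }


MATRIX_A = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]

MATRIX_B = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

corner_a = corner_values(MATRIX_A)
corner_b = corner_values(MATRIX_B)
print("A:", corner_a)
print("B:", corner_b)
```

Both corners see the same three neighbouring values and the same cell value (1), yet the final counts differ, because in matrix A the last cell merges two previously separate islands while in matrix B it does not.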
ls whatever
if [ $? -eq 0 ]; then
    echo "success"
else
    echo "fail"
fi
Since there is no file named ‘whatever’ in the current directory, the return code ‘$?’ will be non-zero and the script will print:
$ sh test.sh
ls: whatever: No such file or directory
fail
What happens when someone wants to capture the details of the failed command in the shell script, for example by writing the error log to a file and also to standard output with the ‘tee’ command?
# 2>&1 sends the error message through the pipe so that tee can log it
ls whatever 2>&1 | tee log
if [ $? -eq 0 ]; then
    echo "success"
else
    echo "fail"
fi
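Unfortunately, ‘$?’ holds the return status of ‘tee log’ instead of ‘ls whatever’. Since ‘tee’ succeeds, the snippet prints “success”, which is not what we (or at least I) expect.
We can fix the shell script by checking Bash’s ‘PIPESTATUS’ array, which holds the exit status of every command in the last pipeline (it is a Bash feature, so the script must be run with bash rather than a plain POSIX sh):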
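# 2>&1 sends the error message through the pipe so that tee can log it
ls whatever 2>&1 | tee log
# PIPESTATUS[0] is the exit status of the first command in the pipeline
if [ ${PIPESTATUS[0]} -eq 0 ]; then
    echo "success"
else
    echo "fail"
fi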
I am ashamed to say that, even as a long-term UNIX developer, this was the first time I learned that the pipe symbol ‘|’ also changes the return status of the whole command line.
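The same pitfall exists when a pipeline is built from Python instead of the shell. The sketch below is an illustration only: the two child commands are stand-ins for ‘ls whatever’ and ‘tee log’ (small Python one-liners, so it runs anywhere), and `pipe_status` is a made-up name that mirrors the idea of Bash’s PIPESTATUS by keeping one exit status per stage.

```python
import subprocess
import sys

# Stand-in for `ls whatever`: prints an error line and exits with status 2.
producer_cmd = [sys.executable, "-c",
                "import sys; print('no such file'); sys.exit(2)"]
# Stand-in for `tee log`: copies its input to its output and exits with 0.
consumer_cmd = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read())"]

producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                            stdout=subprocess.PIPE, text=True)
# Close our copy of the pipe so the consumer sees end-of-file correctly.
producer.stdout.close()

output, _ = consumer.communicate()
producer.wait()

# One exit status per stage, in pipeline order.
pipe_status = [producer.returncode, consumer.returncode]

print("output:", output.strip())
print("last stage status:", pipe_status[-1])
print("first stage status:", pipe_status[0])
```

Looking only at the last stage would report success here as well, so the status of each stage has to be checked separately.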