col1
0 NaN
1 NaN
2 NaN
0 False
1 False
2 False
Name: col1, dtype: bool
0 False
1 False
2 False
Name: col1, dtype: bool
If we convert “None” directly to the float type, it becomes the “NaN” value. “NaN” cannot be compared with a real floating-point number, so it is neither “greater than or equal to” nor “less than” any float; both comparisons simply return False.
Since the floating-point type is the only one that allows a “None” (missing) value in a column, we should be extra careful when processing floating-point numbers.
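A minimal sketch of this behaviour (the column values here are made up for illustration, not taken from the snippet above):
import pandas as pd

# A column holding None is stored as NaN once its dtype is float
s = pd.Series([None, 1.5], dtype=float)
print(s >= 1.0)  # the NaN element compares as False
print(s < 1.0)   # ... and is False here as well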
About two weeks ago I came across the “number of islands” problem on LeetCode. The first idea that jumped into my mind was dynamic programming: I could create a new matrix (let’s call it the ‘number matrix’) with the same size as the problem matrix, and give every position a value: the number of islands in the area from (0, 0) to that position.
For example, take the problem matrix below:
1 1 1 1
0 0 0 0
1 0 1 1
0 0 0 0
problem matrix
We can create the ‘number matrix’:
1 1 1 1
1 1 1 1
2 2 3 3
2 2 3 3
number matrix
Why is the (4, 0) position of the ‘number matrix’ 2? Because the (0, 0) to (4, 0) area of the ‘problem matrix’ contains two islands:
1
0
1
0
(0, 0) to (4, 0) area of ‘problem matrix’
Then, every time we want to calculate the value of a position in the ‘number matrix’, we can first look at the values of its ‘left’, ‘top’ and ‘left-top’ positions.
For example, since the (3, 3), (4, 3) and (3, 4) positions of the ‘number matrix’ are all ‘3’, and the (4, 4) position of the ‘problem matrix’ is ‘0’, the (4, 4) position of the ‘number matrix’ should also be ‘3’.
But unfortunately, after two weeks of struggling, I finally found out that the value of a position in the ‘number matrix’ cannot be decided by its three nearest positions (left, top and left-top). Here is a counterexample:
1 1 1 1 1 1 1
0 0 0 0 0 0 1
1 1 1 1 1 0 1
1 0 0 0 1 0 1
1 0 1 0 1 0 1
1 0 1 1 1 0 1
1 1 1 1 1 1 1
problem matrix A
Let’s just build the number matrix for the (6, 6) to (7, 7) area of problem matrix A:
2 2
2 ?
Don’t rush. Let’s look at another problem matrix:
1 1 1 1 1
1 0 0 0 1
1 0 1 0 1
1 0 0 0 1
1 1 1 1 1
problem matrix B
This time, we also build the number matrix for the (4, 4) to (5, 5) area of problem matrix B:
2 2
2 ?
See? They have the same values for left, top and left-top, but different final answers: problem matrix A has just 1 island, while problem matrix B has 2.
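To double-check this counterexample, below is a small sketch that counts islands with a plain flood fill (this is only a verification helper, not the dynamic-programming idea above; the function name is mine):
def count_islands(grid):
    """Count 4-connected islands of 1s with an iterative flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                count += 1
                stack = [(r, c)]
                seen[r][c] = True
                while stack:
                    cr, cc = stack.pop()
                    for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            stack.append((nr, nc))
    return count

matrix_a = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
matrix_b = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
print(count_islands(matrix_a))  # prints 1
print(count_islands(matrix_b))  # prints 2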
Two weeks of hard thinking only produced a wrong idea. But this is the charm of algorithms 🙂
ls whatever
if [ $? -eq 0 ]; then
    echo "success"
else
    echo "fail"
fi
Since there is no file named ‘whatever’ in the current directory, the return code ‘$?’ will be non-zero and the script will print:
$ sh test.sh
ls: whatever: No such file or directory
fail
But what happens when someone wants to capture the details of the failed command in the shell script, for example by writing the error log to a file while also showing it on standard output, using the ‘tee’ command?
ls whatever|tee log
if [ $? -eq 0 ]; then
    echo "success"
else
    echo "fail"
fi
Unfortunately, ‘$?’ now holds the return status of ‘tee log’, which is 0, instead of that of ‘ls whatever’. So the snippet prints “success”, which is not what we (or at least I) expect.
We can modify our shell script to use the ‘PIPESTATUS’ array instead:
ls whatever|tee log
if [ ${PIPESTATUS[0]} -eq 0 ]; then
    echo "success"
else
    echo "fail"
fi
I am ashamed to say that, even as a long-term UNIX developer, this is the first time I learned that the pipe symbol ‘|’ also changes the return status of the whole command line.
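Note that ‘PIPESTATUS’ is a bash array, not a POSIX sh feature; if /bin/sh points to a shell such as dash, the PIPESTATUS version of the script needs to be run with bash instead of sh.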
We usually use the Python code below to get the number of CPU cores:
from multiprocessing import cpu_count
print("CPU cores:", cpu_count())
But when this snippet runs inside a Docker container, it returns the number of CPU cores of the physical machine the container runs on, not the actual ‘--cpus’ value (for Docker) or CPU limit (for Kubernetes).
Then how can we get the number of CPU cores set by the Docker argument or the Kubernetes configuration for this container?
The only answer I could find is here, and the corresponding Python code I wrote is:
def get_cpu_limit():
    with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as fp:
        cfs_quota_us = int(fp.read())
    with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as fp:
        cfs_period_us = int(fp.read())
    container_cpus = cfs_quota_us // cfs_period_us
    # For physical machine, the `cfs_quota_us` could be '-1'
    cpus = cpu_count() if container_cpus < 1 else container_cpus
    return cpus
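One caveat: those paths belong to cgroup v1. On hosts that use cgroup v2, the limit is exposed in a single file, /sys/fs/cgroup/cpu.max, whose content is either “max <period>” or “<quota> <period>”. A sketch of a variant that handles both cases (the function name here is mine):
import os
from multiprocessing import cpu_count

def get_cpu_limit_any_cgroup():
    v2_path = "/sys/fs/cgroup/cpu.max"
    if os.path.exists(v2_path):
        # cgroup v2: a single file like "200000 100000" or "max 100000"
        with open(v2_path) as fp:
            quota, period = fp.read().split()
        if quota == "max":
            return cpu_count()
        return max(1, int(quota) // int(period))
    # otherwise fall back to the cgroup v1 logic above
    return get_cpu_limit()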
We were using Pandas to get the number of rows for a parquet file:
import pandas as pd
df = pd.read_parquet("my.parquet")
print(df.shape[0])
This is easy, but it costs a lot of time and memory when the parquet file is very large. For example, it may take more than 100 GB of memory just to read a 10 GB parquet file.
If we only need to get the number of rows, not the whole data, Pyarrow will be a better solution:
import pyarrow.parquet as pq
table = pq.read_table("my.parquet", columns=[])
print(table.num_rows)
This method only takes a couple of seconds and uses about 2 GB of memory for the same parquet file.
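If all we need is the row count, an even cheaper option should be to read only the parquet footer metadata, without building a table at all:
import pyarrow.parquet as pq

# Only the file footer (metadata) is read here; no column data is loaded
parquet_file = pq.ParquetFile("my.parquet")
print(parquet_file.metadata.num_rows)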
I intend to use distillation for training my model. The plan is:
1. Train model A and model B with the same code and the same dataset
2. Predict the dataset with model A and model B, and store the average of their results
3. Use the average prediction as the target of a new training process
Steps 1 and 2 were successful. But when I ran the new training process, it reported the loss as “NaN” after some steps.
To find out the reason, I started to print all the “average prediction results” for every step. At first they looked normal, but after a while I found that some inputs contained “NaN”.
Why is there “NaN” in the “average prediction results”? My guess: some samples are too rare (or too special), so the model gives an unreliable output for them. It’s quite easy to just ignore these samples:
if np.isnan(label).any() or not np.isfinite(label).all():
    # Drop the corresponding sample
    return None
Imagine we have an array of numbers [9, 8, 7, 1, 2, 3, 4, 5, 6]. What is the best way to split it into 3 partitions with the most “average” sums, meaning the partition sums have the smallest differences? (Remember that we can’t change the order of the array.)
For example, we could split the array into three partitions like this: [9, 8, 7] [1, 2, 3, 4] [5, 6]. The sums of the three partitions are then 24, 10 and 11. This solution is not the best since the sums differ a lot.
The algorithm and source code for this problem are in the book “The Algorithm Design Manual”. The idea is dynamic programming: let M[n][k] be the smallest possible value of the largest partition sum when the first n elements are split into k partitions; then M[n][k] is the minimum, over all last split points i, of max(M[i][k-1], sum of elements i+1..n). Below is the Python code of my implementation:
# Find the most average way for partitions
import numpy as np

def maximum_partition(sequence, M, nr_partitions, sum_array):
    for n in range(2, len(sequence) + 1):
        for k in range(2, nr_partitions + 1):
            array = []
            for i in range(1, n + 1):
                select = max(M[i][k - 1], sum_array[n - 1] - sum_array[i - 1])
                array.append(select)
            M[n][k] = min(array)
    return M[len(sequence)][nr_partitions]

def init_matrix(sequence, nr_partitions, M, sum_array):
    for index in range(len(sequence)):
        sum_array.append(sum(sequence[: index + 1]))
    for k in range(1, nr_partitions + 1):
        M[1][k] = sequence[0]
    for n in range(1, len(sequence) + 1):
        M[n][1] = sum(sequence[:n])

if __name__ == "__main__":
    # The sequence and the number of partitions
    sequence = [9, 8, 7, 1, 2, 3, 4, 5, 6]
    partitions = 3
    # init
    M = np.zeros((len(sequence) + 1, partitions + 1), dtype=int)
    sum_array = []
    init_matrix(sequence, partitions, M, sum_array)
    # call the main function
    range_sum_max = maximum_partition(sequence, M, partitions, sum_array)
    print("Sum of the maximum range:", range_sum_max)
    # split the sequence by using maximum sum of one range
    current_sum = 0
    for index in range(len(sequence)):
        if (current_sum + sequence[index]) > range_sum_max:
            print("| ", end="")
            current_sum = 0
        current_sum += sequence[index]
        print(sequence[index], end=" ")
    print("\r")
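For the example sequence the answer should be 17: a first partition whose sum stays within 17 can hold at most 9 and 8, and the remaining 36 cannot fit into two partitions of at most 16 each, so 17 is the minimum achievable maximum sum. The script therefore splits the array as 9 8 | 7 1 2 3 4 | 5 6, with partition sums 17, 17 and 11.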
The code is unbelievably simple (fewer than 50 lines). This is the power and charm of dynamic programming.
I have been reading the second edition of “The Algorithm Design Manual” recently. When I reached the chapter about backtracking, I realized that this method can solve many problems, even complicated ones, quickly and efficiently. So I decided to write a solution to the n-queens problem using backtracking.
# Using backtracking to resolve the n-queens problem
import numpy as np

max_columns = 8
max_rows = max_columns

def print_chess(problem):
    head = "_" * (len(problem) + 2)
    tail = "-" * (len(problem) + 2)
    print(head)
    for row in problem:
        row_str = "|"
        for item in row:
            row_str += str(item)
        row_str += "|"
        print(row_str)
    print(tail + "\n")

def remove_diagonal(problem, occupied_coordinate, max_cols, candidates):
    row, col = occupied_coordinate
    step = max_cols - col - 1
    # remove right-down
    if (row + step) < len(problem) and (row + step) in candidates:
        candidates.remove(row + step)
    # remove right-up
    if (row - step) >= 0 and (row - step) in candidates:
        candidates.remove(row - step)

def construct_candidates(problem, k) -> list:
    if k <= 0:  # for the first column
        return [0]  # return 'first row' if it is a 1x1 chessboard
    else:
        # find empty rows
        candidates = set(range(len(problem)))
        for col in range(k):
            for row in range(max_rows):
                if problem[row][col] > 0:
                    # remove queens in the same row or same column
                    if row in candidates:
                        candidates.remove(row)
                    # remove queens in the same diagonal
                    remove_diagonal(problem, (row, col), k, candidates)
        return list(candidates)

# check all rows to make sure they all have queens ('1')
def is_solution(problem, k):
    result = True
    for index in range(min(k, len(problem))):
        if sum(problem[index]) <= 0:
            result = False
    return result

def construct_solution(problem, candidate, k):
    new_problem = problem.copy()
    new_problem[candidate][k - 1] = 1
    return new_problem

def solve(problem, k):
    if is_solution(problem, k):
        print_chess(problem)
        return 1
    else:
        count = 0
        candidates = construct_candidates(problem, k)
        for candidate in candidates:
            new_problem = construct_solution(problem, candidate, k)
            count += solve(new_problem, k + 1)
        return count

if __name__ == "__main__":
    # try to resolve a max_rows x max_columns chess matrix
    initial_problem = np.zeros((max_rows, max_columns), dtype=int)
    count = solve(initial_problem, 1)
    print("Number of solutions: ", count)
I use NumPy for operations of the “chessboard” (matrix) as it’s extremely efficient.
The previous time I wrote code for the n-queens problem was in 2001, twenty years ago. The data centre of my school only gave us some old Intel 486 machines for practice, since we were not students of the computer science department (they kept the newest machines for their own students). The eight-queens problem took about half an hour to run back then. Now it takes only half a second on my laptop:
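For reference, the eight-queens problem has 92 distinct solutions, so the final count printed by the script should be 92.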