It looks like they are “2022-03-22”, a Tuesday. But after I exported them into BigQuery and selected them, they were shown as “2022-03-21 UTC” by default, which is a Monday.
The problem is definitely about the timezone of this column:
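A minimal sketch of how such a one-day shift can arise. The UTC+8 offset is only an assumption for illustration (the actual timezone of the column is not stated here): a value stored as local midnight lands on the previous day once it is converted to UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration only: the column holds local midnight in UTC+8.
LOCAL_TZ = timezone(timedelta(hours=8))

local_value = datetime(2022, 3, 22, 0, 0, tzinfo=LOCAL_TZ)
# Converting to UTC subtracts the 8-hour offset and crosses midnight.
utc_value = local_value.astimezone(timezone.utc)

print(local_value.isoformat(), "weekday index:", local_value.weekday())
print(utc_value.isoformat(), "weekday index:", utc_value.weekday())
```

In Python, `weekday()` returns 0 for Monday and 1 for Tuesday, so the two printed indexes differ by one day.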
I have recently been trying to learn Django (a Python web framework) in a Docker container. After starting the container with port forwarding:
sudo docker run -p 8000 -t -i centos/django /bin/bash
#in the docker now
cd mysite
python manage.py runserver
The output of the Django server is:
System check identified no issues (0 silenced).
September 24, 2014 - 08:52:05
Django version 1.7, using settings 'oddjobs.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Then I used the command sudo docker ps to find the port number on the host machine:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
08d01206d676 centos/oddjobs:run "/bin/bash" About an hour ago Up About an hour 0.0.0.0:49198->8000/tcp kickass_pare
But when I ran curl 127.0.0.1:49198 on the host machine, it just reported “Connection refused”.
After searching on Google, I found only one article that seemed useful, but my problem was still there after I followed its steps. With no other choice, I had to read the Docker documentation carefully and experiment step by step.
First, I ran an nc server in the container:
sudo docker run -p 8000 -t -i centos/django /bin/bash
#in the docker now
nc -l 8000
#listening on port 8000
Then I ran nc 127.0.0.1 8000 on the host. It failed too. Why couldn’t the nc client connect to the server in the container even though I had followed the Docker documentation? After running netstat in the container, I found the answer: my CentOS image is CentOS 7, and the ‘nc’ in it listens on an IPv6 address by default. To listen on an IPv4 address, type:
nc -l 8000 -4
Now the nc client could connect to the server.
But how do I run the Django server on an IPv4 address? This article showed me the way. Now everything seemed OK. I started Django again with python manage.py runserver 127.0.0.1:8000, but the nc client on the host still could not connect to it. Oh, the addresses “127.0.0.1” and “0.0.0.0” are very different: 127.0.0.1 is the loopback interface, reachable only from inside the container, while 0.0.0.0 means all interfaces. So I should run Django like this:
python manage.py runserver 0.0.0.0:8000
The browser on the host could access the Django example site now.
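To make the difference between the bind addresses concrete, here is a small sketch using only the standard `ipaddress` module. The function name `bind_scope` and its three labels are my own, not part of Django or Docker; it classifies a bind address by who can reach a server listening on it.

```python
import ipaddress

def bind_scope(host):
    """Classify a bind address (hypothetical helper, for illustration)."""
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:
        # 0.0.0.0 or :: -> every interface of that address family
        return "all-interfaces"
    if addr.is_loopback:
        # 127.0.0.1 or ::1 -> only clients inside the same container
        return "loopback-only"
    return "single-interface"

for host in ("127.0.0.1", "0.0.0.0", "::1", "::"):
    print(host, "->", bind_scope(host))
```

Traffic forwarded by Docker's port mapping does not arrive on the container's loopback interface, which is why a server bound to a "loopback-only" address cannot be reached from the host.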
admission webhook "validation.gatekeeper.sh" denied the request: [denied by autogke-no-write-mode-hostpath] hostPath volume docker-sock used in container wait uses path /var/run/docker.sock which is not allowed in Autopilot. Allowed path prefixes for hostPath volumes are: ["/var/log/"]. Requesting user: <system:serviceaccount:argo:argo> and groups: <["system:serviceaccounts", "system:serviceaccounts:argo", "system:authenticated"]>
col1
0 NaN
1 NaN
2 NaN
0 False
1 False
2 False
Name: col1, dtype: bool
0 False
1 False
2 False
Name: col1, dtype: bool
If we directly convert “None” to the “float” type, it becomes “NaN”. “NaN” can’t be compared with a real floating-point number, so it is neither “greater than or equal to” a floating-point number nor “less than” it.
Since pandas by default stores a numeric column that contains “None” as floating point, we should be very careful when processing floating-point columns.
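The same rule can be checked without pandas. This is a minimal sketch in plain Python; the helper `compare` is a name I made up for illustration. Every ordered comparison involving NaN is false, so the reliable way to detect it is an explicit `isnan` check.

```python
import math

nan = float("nan")

def compare(value, threshold=0.0):
    """Return the results of `>=` and `<` against a threshold."""
    return value >= threshold, value < threshold

print(compare(1.5))     # an ordinary float: exactly one comparison is true
print(compare(nan))     # NaN: both comparisons are false
print(math.isnan(nan))  # explicit NaN check
```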
About two weeks ago I met the “number of islands” problem on LeetCode. The first idea that jumped into my mind was ‘dynamic programming’: I could create a new matrix (let’s name it the ‘number matrix’) with the same size as the problem matrix and assign every position a value. The value is the number of islands in the area from the top-left corner to this position.
For example, take the problem matrix below:
1 1 1 1
0 0 0 0
1 0 1 1
0 0 0 0
problem matrix
We can create the ‘number matrix’:
1 1 1 1
1 1 1 1
2 2 3 3
2 2 3 3
number matrix
Why is the (4, 1) position in the ‘number matrix’ 2? Because the (1, 1) to (4, 1) area of the ‘problem matrix’ has two islands:
1
0
1
0
(1, 1) to (4, 1) area of ‘problem matrix’
Then, every time we want to calculate the value of a position in the ‘number matrix’, we can first calculate the values of its ‘left’, ‘top’ and ‘left-top’ positions.
For example, since the (3, 3), (4, 3) and (3, 4) positions in the ‘number matrix’ are all ‘3’, and the (4, 4) position in the ‘problem matrix’ is ‘0’, the (4, 4) position of the ‘number matrix’ should also be ‘3’.
But unfortunately, after two weeks of struggling, I finally found out that the value of a position in the ‘number matrix’ can’t be decided by its three nearest positions: left, top and left-top. The counterexample is below:
1 1 1 1 1 1 1
0 0 0 0 0 0 1
1 1 1 1 1 0 1
1 0 0 0 1 0 1
1 0 1 0 1 0 1
1 0 1 1 1 0 1
1 1 1 1 1 1 1
problem matrix A
Let’s just build the number matrix for the (6, 6) to (7, 7) area of problem matrix A:
2 2
2 ?
Don’t rush. Let’s look at another problem matrix:
1 1 1 1 1
1 0 0 0 1
1 0 1 0 1
1 0 0 0 1
1 1 1 1 1
problem matrix B
This time, we also build the number matrix for the (4, 4) to (5, 5) area of problem matrix B:
2 2
2 ?
See? They have the same values for left, top and left-top, but different final results (problem matrix A has just 1 island but problem matrix B has 2).
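The counterexample can be checked mechanically. The sketch below does not use the dynamic-programming idea; it swaps in a plain flood fill to count islands, and the helper names `count_islands` and `prefix_count` are mine. Positions are 1-indexed as (row, column), matching the text.

```python
def count_islands(grid):
    """Count 4-connected groups of 1s with an iterative flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            count += 1  # found an unvisited land cell: a new island
            stack = [(r, c)]
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count

def prefix_count(grid, r, c):
    """Islands in the area from (1, 1) to (r, c), 1-indexed and inclusive."""
    return count_islands([row[:c] for row in grid[:r]])

MATRIX_A = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
MATRIX_B = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

for name, grid in (("A", MATRIX_A), ("B", MATRIX_B)):
    n = len(grid)
    neighbours = [prefix_count(grid, n - 1, n - 1),  # left-top
                  prefix_count(grid, n - 1, n),      # top
                  prefix_count(grid, n, n - 1)]      # left
    print(name, "neighbours:", neighbours, "final:", prefix_count(grid, n, n))
```

Both matrices give the same three neighbour values but different final counts, which is exactly why the three neighbours alone cannot decide the result.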
Two weeks of hard thinking only led to a wrong idea. But this is the charm of algorithms 🙂
ls whatever
if [ $? -eq 0 ]; then
    echo "success"
else
    echo "fail"
fi
Since there is no file named ‘whatever’ in the current directory, the return code ‘$?’ will be non-zero and the program will print:
$ sh test.sh
ls: whatever: No such file or directory
fail
What happens when someone wants to keep the details of the failed command in the shell script? For example, to write the error log into a file and also to standard output by using the ‘tee’ command?
# 2>&1 sends the error message through the pipe as well
ls whatever 2>&1 | tee log
if [ $? -eq 0 ]; then
    echo "success"
else
    echo "fail"
fi
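Unfortunately, ‘$?’ holds the return status of ‘tee log’ instead of ‘ls whatever’, so the snippet prints “success” even though ‘ls’ failed, which is not what we (at least I) expect.
We can fix the shell script by using ‘PIPESTATUS’, a Bash array that holds the exit status of every command in the last pipeline (it is Bash-specific, so run the script with bash rather than a plain POSIX sh):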
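# 2>&1 sends the error message through the pipe as well
ls whatever 2>&1 | tee log
# PIPESTATUS[0] is the exit status of the first command in the pipeline
if [ ${PIPESTATUS[0]} -eq 0 ]; then
    echo "success"
else
    echo "fail"
fi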
I am ashamed to say that, even as a long-term UNIX developer, this is the first time I have learned that the pipe symbol ‘|’ also changes the return status of the whole command line.
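The same idea, that every stage of a pipeline has its own exit status, can be sketched in Python with `subprocess`. The two child commands are stand-ins I made up (a stage that always fails and a stage that copies its input), not the original ‘ls’ and ‘tee’.

```python
import subprocess
import sys

def run_pipeline():
    """Run `failing_stage | passthrough_stage` and return both exit codes."""
    # Stage 1 (stand-in for `ls whatever`): always exits with status 2.
    first = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(2)"],
        stdout=subprocess.PIPE,
    )
    # Stage 2 (stand-in for `tee log`): copies stdin to stdout, exits with 0.
    second = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"],
        stdin=first.stdout,
        stdout=subprocess.PIPE,
    )
    first.stdout.close()  # the parent no longer needs its copy of the pipe
    second.communicate()
    first.wait()
    return first.returncode, second.returncode

first_status, last_status = run_pipeline()
print("first stage:", first_status)  # what PIPESTATUS[0] would report
print("last stage:", last_status)    # what $? would report
```

Checking only the last stage hides the failure of the first one, just like ‘$?’ after a pipe.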
We usually use the Python code below to get the number of CPU cores:
from multiprocessing import cpu_count
print("CPU cores:", cpu_count())
But when the snippet runs inside a Docker container, it just returns the number of CPU cores of the physical machine the container runs on, not the actual --cpus value (for Docker) or CPU limit (for Kubernetes).
Then how can we get the number of CPU cores set by the Docker argument or the Kubernetes configuration for this container?
The only answer I could find is here. The corresponding Python code, written by me, is:
from multiprocessing import cpu_count

def get_cpu_limit():
    # These are the cgroup v1 paths for the CFS quota and period
    with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as fp:
        cfs_quota_us = int(fp.read())
    with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as fp:
        cfs_period_us = int(fp.read())
    # Floor division: a limit below one full core also yields 0
    container_cpus = cfs_quota_us // cfs_period_us
    # For physical machine, the `cfs_quota_us` could be '-1'
    cpus = cpu_count() if container_cpus < 1 else container_cpus
    return cpus
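The paths above belong to cgroup v1. On hosts that use cgroup v2, the limit is instead kept in a single file, /sys/fs/cgroup/cpu.max, whose content is either "<quota> <period>" or "max <period>". Below is a hedged sketch for that case; the function names are mine, and rounding up (instead of the floor division above) is my own choice so that a limit of 1.5 cores yields 2 workers.

```python
import math
from multiprocessing import cpu_count
from pathlib import Path

def parse_cpu_max(text, fallback):
    """Parse cgroup v2 `cpu.max` content: '<quota> <period>' or 'max <period>'."""
    quota, period = text.split()
    if quota == "max":
        # No limit configured: use the fallback (e.g. the host core count)
        return fallback
    # Round up so that fractional limits still give at least one core
    return max(1, math.ceil(int(quota) / int(period)))

def get_cpu_limit_v2(path="/sys/fs/cgroup/cpu.max"):
    """Return the CPU limit, or the host core count if it cannot be read."""
    fallback = cpu_count()
    try:
        return parse_cpu_max(Path(path).read_text(), fallback)
    except (OSError, ValueError):
        return fallback

print("CPU limit:", get_cpu_limit_v2())
```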
We were using Pandas to get the number of rows for a parquet file:
import pandas as pd
df = pd.read_parquet("my.parquet")
print(df.shape[0])
This is easy but costs a lot of time and memory when the parquet file is very large. For example, it may take more than 100GB of memory just to read a 10GB parquet file.
If we only need the number of rows, not the whole data, PyArrow is a better solution:
import pyarrow.parquet as pq
table = pq.read_table("my.parquet", columns=[])
print(table.num_rows)
This method takes only a couple of seconds and about 2GB of memory for the same parquet file.
I intended to use distillation to train my model. The plan was:
1. Train model A and model B with the same code and the same dataset
2. Predict the dataset with model A and model B, and store the average of their results
3. Use the average prediction as the target of a new training process
Steps 1 and 2 were successful. But when I ran the new training process, it reported the loss as “NaN” after some steps.
To find out the reason, I started to print all the “average prediction results” for every step. At first they looked normal, but after a while I found that some of them contained “NaN”.
Why was there “NaN” in the “average prediction results”? My guess is that some samples are so rare (or unusual) that the model gives an unreliable output for them. It’s quite easy to just ignore them:
if np.isnan(label).any() or not np.isfinite(label).all():
    # Drop the corresponding sample
    return None
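The same filter can also be applied to the whole set of averaged predictions at once. This is only a sketch under my own assumptions: the function name `average_targets`, the array shapes and the sample values are made up, and the real pipeline may look different.

```python
import numpy as np

def average_targets(pred_a, pred_b):
    """Average two prediction arrays and drop rows with NaN or infinity."""
    avg = (np.asarray(pred_a, dtype=float) + np.asarray(pred_b, dtype=float)) / 2.0
    # np.isfinite is False for NaN, +inf and -inf, so one check covers them all
    keep = np.isfinite(avg).all(axis=1)
    return avg[keep], keep

# Made-up predictions: the second and third samples are unusable.
pred_a = np.array([[0.2, 0.8], [np.nan, 0.5], [0.9, 0.1]])
pred_b = np.array([[0.4, 0.6], [0.3, 0.7], [np.inf, 0.1]])

targets, keep = average_targets(pred_a, pred_b)
print("kept samples:", keep)
print("targets:", targets)
```

The boolean mask `keep` can be reused to drop the matching input samples so that inputs and targets stay aligned.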