Author Archives: Robin Dong

To construct DataFrame more effectively

The old Python code looks like:

The snippet above takes 7 seconds to run on my laptop. pd.concat() is a CPU-expensive operation, so let’s replace it with a plain Python dictionary:

This snippet takes only 0.03 seconds, which is far more efficient.
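The original listings are not included in this excerpt. As a rough sketch of the comparison described above, assuming the slow version grew the DataFrame with pd.concat() one row at a time, the two approaches might look like this. The column names, row counts, and function names are hypothetical, and timings will differ from machine to machine.

```python
import time

import pandas as pd


def build_with_concat(n):
    """Grow a DataFrame one row at a time (assumed shape of the slow version)."""
    df = pd.DataFrame({"id": [0], "value": [0]})
    for i in range(1, n):
        row = pd.DataFrame({"id": [i], "value": [i * 2]})
        # Each concat allocates a new frame and copies all rows seen so far.
        df = pd.concat([df, row], ignore_index=True)
    return df


def build_from_dict(n):
    """Collect values in a plain dict of lists, then build the DataFrame once."""
    columns = {"id": [], "value": []}
    for i in range(n):
        columns["id"].append(i)
        columns["value"].append(i * 2)
    return pd.DataFrame(columns)


def time_builder(builder, n):
    """Return the wall-clock seconds taken by builder(n)."""
    start = time.perf_counter()
    builder(n)
    return time.perf_counter() - start


if __name__ == "__main__":
    n_rows = 2000  # hypothetical size, not the one used in the post
    print("concat:", time_builder(build_with_concat, n_rows))
    print("dict:  ", time_builder(build_from_dict, n_rows))
```

Both functions produce the same table; only the second avoids re-copying the accumulated rows on every iteration.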

Some problems when using GCP

After I launched a Compute Engine instance with a container, it reported an error: Feb 03 00:12:28 xx-d19b201 konlet-startup[4664]: {"errorDetail":{"message":"failed to register layer: Error processing tar file(exit status 1): write /xxx/2020-01-16/base_cmd/part-00191-2e99af0e-1615-42af-9c60-910f9a9e6a17-c000.snappy.parquet: no space left on device"},"error":"failed to register layer: Error processing tar file(exit status 1): write /xxx/2020-01-16/base_cmd/part-00191-2e99af0e-1615-42af-9c60-910f9a9e6a17-c000.snappy.parquet: no space left on device"}…

Problem about installing Kubeflow

I tried to install Kubeflow by following this guide, but when I ran

it reported

It took me some time to find the solution, so let me keep it short: download the file and look at the lines near its bottom:

Download the, untar it, and…

Directly deploy containers on GCP VM instance

We can deploy containers directly onto a Google Compute Engine VM instance instead of launching a heavyweight Kubernetes cluster. The command looks like:

To add environment variables to this container, we just need to add an argument:

To have the container run a command for us, we need to…

How to ignore illegal sample of dataset in PyTorch?

I implemented a dataset class for my image samples, but it can’t handle the case where a corrupted image is read:

The correct solution is on the PyTorch forum, so I changed my code accordingly:

But it reports:

It seems default_collate() couldn’t recognize the ‘filter’ object. Don’t worry…
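The rest of the post is cut off here, so the author's actual fix is not shown. One plausible reading of the error, offered as an assumption: in Python 3, filter() returns a lazy iterator rather than a list, and a collate function that expects a sequence will reject it. A minimal sketch of a collate wrapper that drops samples the dataset returned as None and materializes the result as a list is below. The names collate_skip_none and stand_in_collate are hypothetical; stand_in_collate only imitates the batching step so the sketch runs without PyTorch installed.

```python
from typing import Any, Callable, List, Optional, Sequence


def collate_skip_none(
    batch: Sequence[Optional[Any]],
    base_collate: Callable[[List[Any]], Any],
) -> Any:
    """Drop samples that the dataset returned as None, then collate the rest."""
    # filter() is lazy in Python 3; list() turns it into a real sequence
    # before it is handed to the underlying collate function.
    kept = list(filter(lambda sample: sample is not None, batch))
    return base_collate(kept)


def stand_in_collate(samples: List[Any]) -> Any:
    """Hypothetical stand-in for a real collate function: groups fields by position."""
    images, labels = zip(*samples)
    return list(images), list(labels)


# A batch in which the second sample was unreadable and came back as None.
batch = [("img0", 0), None, ("img2", 1)]
images, labels = collate_skip_none(batch, stand_in_collate)
print(images, labels)
```

With PyTorch, the same wrapper could be passed to a DataLoader as collate_fn, for example via functools.partial(collate_skip_none, base_collate=default_collate). Note that this sketch does not handle a batch in which every sample is None.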

Tips about pytest


1. Error: “fixture ‘mocker’ not found”

After running pytest, it reported:

The solution is to install the missing pip package (the mocker fixture is provided by pytest-mock):

2. How to make sure a function has been called without caring about its arguments?

There are two methods. The first method is to use “.called”:

The…
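For tip 2, here is a minimal sketch using unittest.mock from the standard library; the mock objects handed out by the mocker fixture behave the same way, since pytest-mock wraps unittest.mock. The notify() function and the sender object are hypothetical. Besides the “.called” attribute mentioned above, assert_called() and mock.ANY are shown as other options; they are not necessarily the second method the original post goes on to describe.

```python
from unittest import mock


def notify(sender, message):
    """Hypothetical code under test: forwards a message through a sender."""
    sender.send(message, retries=3)


sender = mock.Mock()
notify(sender, "hello")

# Option A: the .called attribute is True once the mock has been called,
# whatever the arguments were.
assert sender.send.called

# Option B: assert_called() raises AssertionError if the mock was never called.
sender.send.assert_called()

# When only some arguments matter, mock.ANY matches any value in that position.
sender.send.assert_called_once_with(mock.ANY, retries=3)
```

Option A reads well inside a plain assert statement, while the assert_* helpers produce a more descriptive failure message.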

The generating speed for random number in Python3

I just want to generate a random number in a range (float or integer, it doesn’t matter) using Python. Since my code requests only one random number at a time, the speed of each call to the generating function is critical. So let’s run an experiment:

The result is:
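Neither the benchmark nor its output is included in this excerpt. A minimal sketch of how such an experiment could be set up with the standard timeit module follows. The choice of candidate functions, the range 0 to 100, and the call count are assumptions, and no timings are quoted because they depend on the machine and the Python version.

```python
import timeit

# Candidate one-at-a-time generators from the standard library
# (an assumed selection, not necessarily the one used in the post).
STATEMENTS = [
    "random.random() * 100",
    "random.uniform(0, 100)",
    "random.randint(0, 100)",
    "random.randrange(0, 101)",
]


def benchmark(calls=100_000):
    """Return {statement: total seconds for `calls` executions}."""
    return {
        stmt: timeit.timeit(stmt, setup="import random", number=calls)
        for stmt in STATEMENTS
    }


if __name__ == "__main__":
    # Print the candidates from fastest to slowest.
    for stmt, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print(f"{stmt:28s} {seconds:.4f} s")
```

Timing many single calls in a loop, rather than one bulk request, matches the stated use case of asking for one number at a time.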


A problem about using DataFrame in Apache Spark

Here is the code for loading a CSV file (table employee) into an Apache Spark DataFrame:

But after I ran the jar in Spark, it reported:

It seems the data wasn’t loaded correctly. After carefully reviewing the documentation for the CSV format, I noticed that the quote in my CSV file…