Author Archives: Robin Dong

Problems about using DistCp on Hadoop

After installing all Hadoop environment, I used DistCp to copy large files in distributed cluster. But it report error:

Seems it can’t even find the basic MapReduce class. Then I checked CLASSPATH for Hadoop:

Pretty strange, the HADOOP_CLASSPATH contains ‘mapreduce’ directories. It supposed to be able to find… Read more »

Using XGBoost to predict large sparse data

For using XGBoost to predict, I wrote code like this:

But it reported error:

Looks csr_matrix in SciPy is not supported by XGBoost. Maybe I need to transfer sparse data to dense:

But it still reported:

The ‘test’ data is too big so it cann’t even… Read more »

Some summaries for Kaggle’s competition ‘Humpback Whale Identification’

This time, I only spent one month on competition “Humpback Whale Identification”. But still, get a little step forward than previous competitions. Here are my summaries: 1. Do review ‘kernels’ in competition page, this will teach me a lot of information and new technology. By using Siamese Network rather than… Read more »

Using ResNeXt in Keras 2.2.4

      1 Comment on Using ResNeXt in Keras 2.2.4

To use ResNeXt50, I wrote my code as the API documentation for Keras:

But it reported errors:

That’s weird. The code doesn’t work as documentation said. So I checked the code of Keras-2.2.4 (the version in my computer), and noticed that this version of code use ‘keras_applications’ instead… Read more »

Some tips about using Keras

      No Comments on Some tips about using Keras

1. How to use part of a model

The ‘img_embed’ model is part of ‘branch_model’. We should realise that ‘Model()’ is a heavy cpu-cost function so it need to be create only once and then could be used many times. 2. How to save a model when using ‘multi_gpu_model’… Read more »

Some tips about Python, Pandas, and Tensorflow

There are some useful tips for using Keras and Tensorflow to build models. 1. Using applications.inception_v3.InceptionV3(include_top = False, weights = ‘Imagenet’) to get pretrained parameters for InceptionV3 model, the console reported:

The solution is here. Just install some packages:

2. Could we use ‘add’ to merge two DataFrames… Read more »

LinearSVC versus SVC in scikit-learn

In competition ‘Quora Insincere Questions Classification’, I want to use simple TF-IDF statistics as a baseline.

The result is not bad:

But after I change LinearSVC to SVC(kernel=’linear’), the program couldn’t work out any result even after 12 hours! Am I doing anything wrong? In the page of… Read more »

Using keras.layers.Embedding instead of python dictionary

Firstly, I use a function to transform words into word-embedding:

But I noticed that it costs quite a few CPU resource while GPU usage is still low. The reason is simple: using single thread python to do search in dictionary is uneffective. We should use Embedding layer in Keras… Read more »

A few other lessons from Kaggle’s competition ‘Human Protein Atlas Image Classification’

Practice makes progress. Therefore I continued to join Kaggle’s new competition ‘Human Protein Atlas Image Classification’ after the previous one. I used think I could get a higher rating in image processing competition. But actually, I haven’t even entered the top half of rankings. After almost three month trials and… Read more »