Just one picture:
I don’t know why the place to edit the app name is under “Grow users”, but unfortunately, that’s where it is.
After you change the “App name” and click “Save” (you also need to upload a bunch of images before clicking it. Damn it)
I found a very interesting picture:
This image is about 8 MB even though it’s blurry. I then used the Python code below to resize it with different interpolation strategies:
import cv2

img = cv2.imread("/Users/robin/Downloads/ou1.png")
for inter, name in [(cv2.INTER_AREA, "area"), (cv2.INTER_CUBIC, "cubic"), (cv2.INTER_LINEAR, "linear")]:
    # pass the interpolation flag by keyword; the third positional argument of cv2.resize is "dst"
    after = cv2.resize(img, (310, 310), interpolation=inter)
    cv2.imwrite(f"{name}.bmp", after)
[Resized images: “area” | “cubic” | “linear”]
Only cv2.INTER_AREA works well.
Then let me try the Dart language:
import 'dart:io';
import 'package:image/image.dart' as img;

void main() async {
  // Load the image
  final file = File('/Users/robin/Downloads/ou1.png');
  final bytes = await file.readAsBytes();
  final image = img.decodeImage(bytes);
  if (image == null) {
    print("Failed to load image");
    return;
  }

  // Resize and save the image with different methods
  for (var method in ['area', 'cubic', 'linear', 'average']) {
    img.Image resized;
    if (method == 'area' || method == 'linear') {
      // Use `copyResize` with its default interpolation for these two
      resized = img.copyResize(image, width: 310, height: 310);
    } else if (method == 'cubic') {
      resized = img.copyResize(image, width: 310, height: 310,
          interpolation: img.Interpolation.cubic);
    } else if (method == 'average') {
      resized = img.copyResize(image, width: 310, height: 310,
          interpolation: img.Interpolation.average);
    } else {
      print("Unknown interpolation method: $method");
      continue;
    }

    // Save the resized image as BMP to match the file extension
    final output = File('$method.bmp');
    await output.writeAsBytes(img.encodeBmp(resized));
    print("Saved resized image using $method interpolation as $method.bmp");
  }
}
The results also look terrible, except for one strategy, “average”:
[Resized images: “area” | “cubic” | “linear” | “average”]
So it’s better to use “Interpolation.average” in Dart.
After I installed two versions of the JDK (17 and 21) and then uninstalled them, I saw this error when trying to launch Android Studio. This error is hard to fix: reinstalling Android Studio won’t fix it, and even after I searched Google, asked ChatGPT, and tried all the suggestions they gave, the problem persisted for two days.
The eventual solution came from a StackOverflow post (sorry, I forget the URL):
# firstly, manually uninstall Android Studio
# then
rm -rf /Library/Java/JavaVirtualMachines/*
# then
rm -rf ~/Library/Application\ Support/Google/AndroidStudio*
# then
rm -rf ~/Library/Caches/
# now install Android Studio and it should launch correctly
After training both image classification and sound classification deep learning models, I found that the image training is much slower than the sound training, although the sound dataset is much bigger than the image dataset.
The first idea that jumped into my mind is that the image has 3 channels (RGB) but the sound spectrogram has only 1 channel. Therefore, if I compress RGB into 1 channel (for example, by using a grayscale image), the training speed of image classification should become 3 times faster.
A few days ago, I started to train the image classification model with grayscale images. But the training speed was almost the same as with RGB images. Only then did I realize how stupid I had been.
Let’s look at the graph below, taken from the residual network paper. Yes, the first layer is a 7×7 convolution with 64 filters, and it maps the input to 64 feature maps no matter how many channels it has. If the image is grayscale, it maps it to 64 feature maps; if the image is RGB, it also maps it to 64 feature maps. Reducing 3 channels to 1 only cuts the cost of the first layer by a factor of 3, and compared with the total computing cost of the network, this change is quite minor.
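A back-of-the-envelope check makes this concrete (a plain-Python sketch; the 224×224 input with a stride-2 first layer and ResNet-50’s roughly 3.8 billion multiply-adds are taken from the paper, everything else is just arithmetic):

# Rough multiply-accumulate counts for the first 7x7 convolution of a ResNet,
# assuming a 224x224 input and stride 2 (so the output feature map is 112x112).
def conv_macs(in_channels, out_channels, kernel, out_h, out_w):
    return in_channels * out_channels * kernel * kernel * out_h * out_w

first_rgb = conv_macs(3, 64, 7, 112, 112)   # ~118 million MACs
first_gray = conv_macs(1, 64, 7, 112, 112)  # ~39 million MACs

total = 3.8e9  # ResNet-50 is about 3.8 billion multiply-adds in total (from the paper)
print(f"first layer, RGB:  {first_rgb / total:.1%} of the network")   # ~3.1%
print(f"first layer, gray: {first_gray / total:.1%} of the network")  # ~1.0%
print(f"overall saving from going gray: {(first_rgb - first_gray) / total:.1%}")  # ~2.1%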
That’s why nobody has mentioned this “accelerating” technique before 🙂
Imagine we have data like this:
WITH Sequences AS (
  SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS prod_type, [1.1, 1.2, 2.1, 2.3, 3.3, 3.4] AS prod_price
  UNION ALL
  SELECT 2 AS id, [2, 4, 8, 16, 32] AS prod_type, [1.3, 4.2, 2.1, 7.3, 5.3, 9.4] AS prod_price
  UNION ALL
  SELECT 3 AS id, [5, 10] AS prod_type, [1.8, 4.9, 2.0, 7.6, 5.1, 8.4] AS prod_price
)
SELECT * FROM Sequences
How could I get the total price of each “prod_type” for every “id”?
First we need to unfold “prod_type” and “prod_price” correspondingly. Note that two UNNEST calls joined with commas in the FROM clause cross join the two arrays, so we pair the elements by their offsets:
WITH Sequences AS (
  SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS prod_type, [1.1, 1.2, 2.1, 2.3, 3.3, 3.4] AS prod_price
  UNION ALL
  SELECT 2 AS id, [2, 4, 8, 16, 32] AS prod_type, [1.3, 4.2, 2.1, 7.3, 5.3, 9.4] AS prod_price
  UNION ALL
  SELECT 3 AS id, [5, 10] AS prod_type, [1.8, 4.9, 2.0, 7.6, 5.1, 8.4] AS prod_price
)
SELECT id, p_type AS prod_type, p_price AS prod_price
FROM Sequences,
  UNNEST(prod_type) AS p_type WITH OFFSET AS pos_type,
  UNNEST(prod_price) AS p_price WITH OFFSET AS pos_price
WHERE pos_type = pos_price;
and then use “group by” to calculate total price:
...
SELECT id, p_type AS prod_type, SUM(p_price) AS total_price
FROM Sequences,
  UNNEST(prod_type) AS p_type WITH OFFSET AS pos_type,
  UNNEST(prod_price) AS p_price WITH OFFSET AS pos_price
WHERE pos_type = pos_price
GROUP BY id, p_type
All the code is here.
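If you prefer to run it from Python, here is a minimal sketch using the BigQuery client library (it assumes default credentials and a project are already configured, and it only keeps the first example row to stay short):

from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials and project are configured

query = """
WITH Sequences AS (
  SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS prod_type, [1.1, 1.2, 2.1, 2.3, 3.3, 3.4] AS prod_price
)
SELECT id, p_type AS prod_type, SUM(p_price) AS total_price
FROM Sequences,
  UNNEST(prod_type) AS p_type WITH OFFSET AS pos_type,
  UNNEST(prod_price) AS p_price WITH OFFSET AS pos_price
WHERE pos_type = pos_price
GROUP BY id, p_type
"""

# Run the query and print one line per (id, prod_type) group
for row in client.query(query).result():
    print(row["id"], row["prod_type"], row["total_price"])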
The baseline for training on the balanced data of AudioSet is 0.27 mAP. Using TimeMasking and FrequencyMasking pushes it slightly, to 0.28 mAP.
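For reference, a minimal sketch of that masking with torchaudio (the mask widths below are placeholders, not the exact values I used):

import torch
import torchaudio

# fbank shaped (channel, n_mels, time); random data stands in for a real log-mel fbank
fbank = torch.randn(1, 128, 998)

freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=24)  # mask up to 24 mel bins
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=96)       # mask up to 96 frames

augmented = time_mask(freq_mask(fbank))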
I tried mixup of raw sounds as in AST, but it didn’t improve the mAP at all (the reason is still a mystery to me). However, mixup of the fbank features pushed the metric to 0.293 mAP.
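The fbank-level mixup itself is only a few lines. Here is a sketch (the Beta(10, 10) sampling follows AST, but treat the exact alpha value as an assumption):

import numpy as np

def mixup_fbank(fbank_a, fbank_b, label_a, label_b, alpha=10.0):
    # Mix two fbank feature maps and their multi-hot label vectors with the same lambda
    lam = np.random.beta(alpha, alpha)
    mixed_fbank = lam * fbank_a + (1 - lam) * fbank_b
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed_fbank, mixed_label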
Up to that point, the fbank features had been resized to (384, 384) for the deit_distilled model. After I restored the fbank size to (128, 998), it reached 0.323 mAP.
The most recent (hopefully not the last) change is copied wholly from AST: use the pretrained Conv2D parameters from deit_distilled but change the stride size, and also expand the position embeddings since the sequence length has changed. The result is 0.333 mAP.
It is worth noting that this is the first time I have felt the power of a pretrained model with my own hands. If I re-initialized the position embedding parameters instead of interpolating them bilinearly, the result would be far from 0.333 mAP. Likewise, if I used newly initialized parameters for the Conv2D (the first layer of the Vision Transformer), the result was as bad as before.
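The position-embedding expansion is essentially a 2-D bilinear resize of the patch part of the embedding table. A sketch of the idea (it assumes a DeiT-style pos_embed of shape (1, num_prefix_tokens + H*W, dim), with two prefix tokens for the class and distillation tokens of deit_distilled):

import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed, old_grid, new_grid, num_prefix_tokens=2):
    # Keep the class/distillation token embeddings untouched; they carry no spatial position
    prefix, patches = pos_embed[:, :num_prefix_tokens], pos_embed[:, num_prefix_tokens:]
    dim = patches.shape[-1]
    # Back to a 2-D grid of shape (1, dim, old_h, old_w), resize bilinearly, then flatten again
    patches = patches.reshape(1, old_grid[0], old_grid[1], dim).permute(0, 3, 1, 2)
    patches = F.interpolate(patches, size=new_grid, mode="bilinear", align_corners=False)
    patches = patches.permute(0, 2, 3, 1).reshape(1, new_grid[0] * new_grid[1], dim)
    return torch.cat([prefix, patches], dim=1)

# e.g. interpolate_pos_embed(pos_embed, old_grid=(24, 24), new_grid=(new_h, new_w)),
# where (new_h, new_w) is whatever patch grid the new stride produces on the fbank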
Next, I will check whether the pretrained model also works well on the unbalanced data of AudioSet.
I was trying to implement ALBEF by myself for practice. After finishing all the parts (the vision part and the BERT part, including the masked language model), I trained the model on the COCO-Captions/SBU-Captions/CC3M/CC12M datasets (actually more data than the original paper). But the results were quite weird: an old steam train was recognised as a building, and a few fish were recognised as statues.
To solve these weird mistakes, I reviewed the code many times and finally noticed a sentence in the paper:
Although it’s just an ordinary sentence in the paper, this augmentation improves the ALBEF model significantly. After randomly cropping the 256×256 raw images to 224×224 and also using RandAugment, I finally got a more stable and sensible model. Let’s see some examples:
Previously, the fish had been recognised as “shoes”, and the bedroom as “city”. They are all recognised correctly after augmentation.
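For reference, a minimal version of that augmentation pipeline (a sketch with torchvision; the normalization statistics and RandAugment defaults here are assumptions, not ALBEF’s exact settings):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(224),    # randomly crop the 256x256 input down to 224x224
    transforms.RandAugment(),      # torchvision's built-in RandAugment
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])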
But there are still some interesting bad cases:
Adding a prefix of “A picture of” could help the ALBEF model improve its recognition capability, probably because there is a lot of text like “A picture of XXX” in the CC3M and CC12M datasets.
Anyhow, I finally implemented and trained a working ALBEF model by myself, with my RTX 3080 Ti card.
To create a pipeline schedule in Vertex AI, we can use the snippet below:
from google.cloud import aiplatform

pipeline_job = aiplatform.PipelineJob(
    template_path="COMPILED_PIPELINE_PATH",
    pipeline_root="PIPELINE_ROOT_PATH",
    display_name="DISPLAY_NAME",
)

pipeline_job_schedule = pipeline_job.create_schedule(
    display_name="SCHEDULE_NAME",
    cron="TZ=CRON",
    max_concurrent_run_count=MAX_CONCURRENT_RUN_COUNT,
    max_run_count=MAX_RUN_COUNT,
    service_account="XYZ",
)
This Python code runs as service account “XYZ”, and we also want the schedule to run as service account “XYZ”. Makes sense, right? But the execution throws errors:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
  status = StatusCode.INVALID_ARGUMENT
  details = "You do not have permission to act as service_account: vertex-runner@pers-decision-engine-dev.iam.gserviceaccount.com. (or it may not exist)."
  debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.201.95:443 {created_time:"2024-06-06T01:51:02.837225888+00:00", grpc_status:3, grpc_message:"You do not have permission to act as service_account: vertex-runner@pers-decision-engine-dev.iam.gserviceaccount.com. (or it may not exist)."}"
Why does the Vertex AI Python client need to “act as” service account “XYZ” when it is already using the default service account “XYZ”? I can’t answer that. Fortunately, the solution is to add the “Service Account User” role to the service account “XYZ” (as this shows).
It seems Google Cloud still needs to do some work to make Vertex AI run smoothly.
When I started junior middle school in a small town in southwest China in 1993, I met my first English book. Yes, it looked exactly like this:
Then the terrible six years of Chinese-style English learning started. For ordinary kids in the poor regions of China, the only way to learn a new language was to MEMORIZE IT. For the vocabulary, I memorized all of it by writing the words on scratch paper again and again. For the phonetic symbols, I memorized all of them by writing them on scratch paper again and again. For the grammar, I memorized all of it by... wait a minute. Why did I need to memorize the grammar? Because the English examinations would test it. That was the only purpose of learning English: not to read foreign stories or learn about the world, but to get a higher examination score.
For the six years of middle school, I spent more than 60 per cent of my study time on English (painstakingly memorizing phonetic symbols and grammar) and still got poor results: I couldn’t read a long English story, couldn’t recognize many common English words, and couldn’t even write a decent article. All I had learned was some basic English words and some useless grammar.
Things only changed when I went to university and started to read a 300-page English reading-comprehension book. Yes, it was still for an examination, but at least there was no need to recite that stupid grammar or those phonetic symbols. I finally noticed that I could improve my English just by reading books.
Time zipped by. I got my Kindle (yes, that electronic-paper device) in August 2011 (when I was 31 years old) and finished my long-desired first full-length English story, “Jurassic Park”. Since then, I have read a lot of English books: “The Lost World”, “The Wild Wheel”, “The Swiss Family Robinson”, and, best of all, the entire “A Song of Ice and Fire”. I feel happy when reading English books, and my English skills improve as well. Happy learning: that’s what reading books brings.
So my conclusion is: if you want to learn English well, don’t try to memorize that boring grammar; just read books, a lot of books.
Doesn’t this sound familiar? Yes, it sounds just like “The Bitter Lesson”, or the scaling law for machine learning. A large language model doesn’t need to learn grammar or go to school. It only needs to read a lot of books and articles (that is, to be trained on a large corpus).
The LLM learns like a human, and I think I can also learn from it: reading a lot is already enough for learning.
I just wrote my own implementation of ALBEF. But when I evaluated it on some masked sentences, it failed.
I am using this image:
When I asked “This is a chocolate <|mask|>”, it generated “This is a chocolate urn”. Quite strange.
Then I asked “This is a <|mask|> cake”, and it generated “This is a iph cake”. Totally wrong.
After checking my dataset implementation and training on a small part of CC3M, a week passed, and today I finally found the reason: tiktoken is a BPE tokenizer that uses sub-words as tokens, and these sub-words severely hurt the model. For example, the sub-words “urn” and “iph” appear so many times that the model uses them to fill the masked word in its predictions.
By replacing tiktoken with BertTokenizerFast (from the “transformers” package), the model correctly generates “This is a chocolate cake”.
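To see the difference, here is a quick comparison sketch (it assumes the tiktoken and transformers packages are installed; the cl100k_base encoding is just an example choice):

import tiktoken
from transformers import BertTokenizerFast

text = "This is a chocolate cake"

# BPE: tokens are byte-pair sub-word pieces, and there is no dedicated [MASK] token
bpe = tiktoken.get_encoding("cl100k_base")
print([bpe.decode([token]) for token in bpe.encode(text)])

# WordPiece: whole tokens for common words, plus a dedicated [MASK] token for masked language modeling
wordpiece = BertTokenizerFast.from_pretrained("bert-base-uncased")
print(wordpiece.tokenize("This is a chocolate [MASK]"))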