The Transformer neural network was first introduced in 2017, in the paper Attention Is All You Need. One year later, BERT appeared. Last year I gave a short presentation about the Transformer and BERT at my previous company, as shown below:
A few days ago I started reviewing the Transformer paper, and I found myself recommending the article The Illustrated Transformer once again. That article really helped me understand many of the details in the Transformer.
But one question still jumped out at me: what is the decoder in the Transformer actually for? How does information flow from the encoder to the decoder? After thinking about it for quite a while, I figured it out: the Transformer was originally built for machine translation. The encoder "transforms" the source-language sentence into a set of Keys and Values; the decoder "transforms" a target-language word into a Query. Using that Query together with the Keys and Values, the model produces a vector, which is effectively the embedding of the next word in the target language.
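Here is a minimal sketch of that Query/Key/Value interaction, assuming a single attention head and plain NumPy vectors (the real Transformer additionally learns projection matrices for Q, K, and V, uses multiple heads, and stacks several layers):

```python
import numpy as np

def cross_attention(query, keys, values):
    """Scaled dot-product attention: one decoder Query attends
    over the encoder's Keys and Values (single head, no masking)."""
    d_k = keys.shape[-1]
    # Similarity between the Query and every encoder Key.
    scores = query @ keys.T / np.sqrt(d_k)   # shape: (num_source_words,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over source positions
    # The output is a weighted mix of the encoder Values.
    return weights @ values                  # shape: (d_v,)
```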
Here is a diagram I drew; I hope it clears up my own confusion.
“Ich bin ein guter Kerl” is German for “I am a good guy”. By encoding all the German words into Keys and Values, and decoding “good” into a Query, the Transformer can finally output the embedding vector of “guy”.