https://www.youtube.com/watch?v=H39Z_720T5s&list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o&index=5
Encoders, decoders, encoder-decoders
The encoder converts text into numerical representations (feature vectors); the decoder generates text from such representations.
The combination of the two parts is known as an encoder-decoder, or sequence-to-sequence (seq2seq) transformer.
https://www.youtube.com/watch?v=MUqNwgPjJvQ&list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o&index=6
BERT is a popular encoder.
Welcome to NYC --> Encoder --> [0.1, 0.2, ...], [0.3, 0.1, ...], [0.2, 0.3, ...]
Welcome -> one vector
to -> one vector
NYC -> one vector
"to" includes the around context vectors.
Each word in the initial sequence affects every word's representation.
"self-attention mechanism"
Bi-directional : context from the left, and the right
Good at extracting meaningful information
Sequence classification, question answering, masked language modeling
Encoders : BERT, RoBERTa, ALBERT
1. Masked Language Modeling
2. Sentiment analysis
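A sketch of these two example tasks with the `pipeline` API; the checkpoints are just common choices, not necessarily the ones used in the course:

```python
from transformers import pipeline

# 1. Masked language modeling: predict the hidden word using context
#    from both the left and the right.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Welcome to [MASK]."))

# 2. Sentiment analysis: a sequence-classification head on top of the encoder.
classifier = pipeline("sentiment-analysis")
print(classifier("Welcome to NYC, what a great city!"))
```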
https://www.youtube.com/watch?v=d_ixlCubqQw&list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o&index=7
Decoder: Like the encoder, the decoder produces feature vectors from text.
The only difference from the encoder is that each word can only see the words on its left side; the right side is hidden (masked).
1. Unidirectional: access only to the left context
2. Great at causal tasks; generating sequences
3. Decoders: GPT-2, GPT-Neo
My ----> Decoder ----> name
My name ----> is (auto-regressive): the past output is used as input for the next step.
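A sketch of this auto-regressive loop with GPT-2, assuming the Hugging Face `transformers` library; `generate` feeds each new token back in as input:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("My name", return_tensors="pt")
# Each generated token is appended to the input and fed back in
# for the next step (auto-regressive / causal generation).
output_ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output_ids[0]))
```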
https://www.youtube.com/watch?v=0_4KEb08xrE&list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o&index=8
Encoder - Decoder model.
The decoder accepts the output of the encoder plus a start-of-sequence word and generates Word1; thanks to the decoder's auto-regressive behavior, Word1 then becomes an input for the next step, and Word2 and Word3 are generated in turn.
Translation scenario:
"Welcome to NYC" is translated into French; the output is generated as: start-of-sequence word -> Bienvenue -> à -> ...
The encoder understands the sequence (text to vectors); the decoder receives the vectors, decodes the numerical representation, and generates a sequence according to the encoder's understanding.
Sequence-to-sequence tasks; many-to-many: translation, summarization
Weights are not necessarily shared across the encoder and decoder
The input distribution is different from the output distribution
Transformers are powerful --> Les Transformers sont puissants
a three-word sequence -> a four-word sequence
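A translation sketch with an encoder-decoder model via the `pipeline` API; `t5-small` is just one small checkpoint that supports English-to-French:

```python
from transformers import pipeline

# T5 is an encoder-decoder model: the encoder reads the English sentence,
# the decoder generates the French one token by token.
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Transformers are powerful"))
# The output should be close to "Les Transformers sont puissants".
```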
Long sequence (paragraph) -> short summarized sequence.
Seq to Seq models --> BART, T5
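And a summarization sketch with BART; `facebook/bart-large-cnn` is one commonly used checkpoint, and the input text is just made-up example prose:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
long_text = (
    "Transformers are a family of neural network architectures built around "
    "the attention mechanism. Encoder-decoder variants such as BART and T5 are "
    "trained to map an input sequence to a new output sequence, which makes "
    "them a natural fit for tasks like translation and summarization."
)
# Long sequence in, short sequence out.
print(summarizer(long_text, max_length=30, min_length=5))
```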