https://www.youtube.com/watch?v=H39Z_720T5s&list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o&index=5
Encoders, decoders, encoder-decoders
The encoder converts text into numerical representations (feature vectors); the decoder generates text from such representations.
The combination of the two parts is known as an encoder-decoder, or sequence-to-sequence (seq2seq) transformer.
https://www.youtube.com/watch?v=MUqNwgPjJvQ&list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o&index=6
BERT is a popular encoder.
Welcome to NYC --> Encoder --> [0.1, 0.2, ...], [0.3, 0.1, ...], [0.2, 0.3, ...]
Welcome -> one vector
to -> one vector
NYC -> one vector
"to" includes the around context vectors.
Each word in the initial sequence affects every word's representation.
"self-attention mechanism"
Bi-directional : context from the left, and the right
Good at extracting meaningful information
Sequence classification, question answering, masked language modeling
Encoders : BERT, RoBERTa, ALBERT
1. Masked Language Modeling
2. Sentiment analysis
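A sketch of these two example tasks with the `pipeline` API; the checkpoints are just common choices, not necessarily the ones used in the course:

```python
from transformers import pipeline

# 1. Masked language modeling: predict the hidden word using context
#    from both the left and the right.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Welcome to [MASK]."))

# 2. Sentiment analysis: a sequence-classification head on top of the encoder.
classifier = pipeline("sentiment-analysis")
print(classifier("Welcome to NYC, what a great city!"))
```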
https://www.youtube.com/watch?v=d_ixlCubqQw&list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o&index=7
Decoder: Like the encoder, the decoder produces feature vectors from text.
The only difference from the encoder is that each word can only see the words on its left side; the right side is hidden (masked).
1. Unidirectional: access only to the left context
2. Great at causal tasks; generating sequences
3. Decoders: GPT-2, GPT-Neo
My ----> Decoder ----> name
My name ----> is (auto-regressive): the past output is used as input for the next step.
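A sketch of this auto-regressive loop with GPT-2, assuming the Hugging Face `transformers` library; `generate` feeds each new token back in as input:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("My name", return_tensors="pt")
# Each generated token is appended to the input and fed back in
# for the next step (auto-regressive / causal generation).
output_ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output_ids[0]))
```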
https://www.youtube.com/watch?v=0_4KEb08xrE&list=PLo2EIpI_JMQvWfQndUesu0nPBAtZ9gP1o&index=8
Encoder - Decoder model.
The decoder accepts the output of the encoder plus a start-of-sequence word and generates Word1; thanks to the decoder's auto-regressive behavior, Word1 then becomes an input for the next step, and Word2 and Word3 are generated in turn.
Translation scenario:
"Welcome to NYC" is translated into French; the output is generated as: start-of-sequence word -> Bienvenue -> à -> ...
The encoder understands the sequence (text to vectors); the decoder receives the vectors, decodes the numerical representation, and generates a sequence according to the encoder's understanding.
Sequence-to-sequence tasks; many-to-many: translation, summarization
Weights are not necessarily shared across the encoder and decoder
The input distribution is different from the output distribution
Transformers are powerful --> Les Transformers sont puissants
a three-word sequence -> a four-word sequence
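A translation sketch with an encoder-decoder model via the `pipeline` API; `t5-small` is just one small checkpoint that supports English-to-French:

```python
from transformers import pipeline

# T5 is an encoder-decoder model: the encoder reads the English sentence,
# the decoder generates the French one token by token.
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Transformers are powerful"))
# The output should be close to "Les Transformers sont puissants".
```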
Long sequence (paragraph) -> short summarized sequence.
Seq to Seq models --> BART, T5
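And a summarization sketch with BART; `facebook/bart-large-cnn` is one commonly used checkpoint, and the input text is just made-up example prose:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
long_text = (
    "Transformers are a family of neural network architectures built around "
    "the attention mechanism. Encoder-decoder variants such as BART and T5 are "
    "trained to map an input sequence to a new output sequence, which makes "
    "them a natural fit for tasks like translation and summarization."
)
# Long sequence in, short sequence out.
print(summarizer(long_text, max_length=30, min_length=5))
```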