Week 3 - LLM - PreTraining

Pre-Training, Fine-Tuning & In-Context Learning


Pre-training is like a child learning to read and write his/her mother tongue.

Fine-tuning is like a student learning to use language to perform complex tasks in high school and college.

In-context learning is like a working professional trying to figure out his/her manager's instructions.
Zero-Shot vs. Few-Shot
In-Context Learning (Few-Shot Learning)
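To make the distinction concrete, here is a small made-up illustration (not from the slides): the same sentiment-classification query posed zero-shot and few-shot. The reviews and labels are invented purely for illustration.

# Zero-shot: the model gets only an instruction and the query.
zero_shot_prompt = (
    "Classify the sentiment of the review as Positive or Negative.\n"
    "Review: The battery dies within an hour.\n"
    "Sentiment:"
)

# Few-shot (in-context learning): the same instruction plus a handful of
# solved examples in the prompt. No weights are updated; the model only
# conditions on these demonstrations.
few_shot_prompt = (
    "Classify the sentiment of the review as Positive or Negative.\n"
    "Review: The screen is gorgeous and bright.\n"
    "Sentiment: Positive\n"
    "Review: Shipping took a month and the box was crushed.\n"
    "Sentiment: Negative\n"
    "Review: The battery dies within an hour.\n"
    "Sentiment:"
)

print(zero_shot_prompt)
print(few_shot_prompt)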
The LLM Landscape

                  BERT                        GPT-3              Llama 2
Year              2018                        2020               2023
Developer         Google                      OpenAI             Meta
Parameters        110 M, 340 M                175 B              7 B, 13 B, 70 B
Architecture      Encoder only                Decoder only       Decoder only
Embedding Size    768                         12288              4096 (7 B)
Context Length    512                         2048               4096
Tokenization      WordPiece                   BPE                SentencePiece
Use Case          Classification, NER, Q&A    Text Generation    Text Generation
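The tokenization row can be explored directly. The sketch below is a minimal illustration assuming the Hugging Face transformers library and the public bert-base-uncased and gpt2 checkpoints; Llama 2's SentencePiece tokenizer works the same way, but its checkpoint is gated and needs access approval.

# Compare WordPiece (BERT) and byte-level BPE (GPT-2) on the same sentence.
from transformers import AutoTokenizer

text = "Pre-training teaches the model its mother tongue."

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # WordPiece
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")                # byte-level BPE

print(bert_tok.tokenize(text))   # continuation subwords are marked with '##'
print(gpt2_tok.tokenize(text))   # leading spaces are encoded as 'Ġ'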
The GPT Models

                     GPT-1          GPT-2          GPT-3
Parameters           117 Million    1.5 Billion    175 Billion
Decoder Layers       12             48             96
Context Token Size   512            1024           2048
Hidden Layer Size    768            1600           12288
Batch Size           64             512            3.2 M
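A common back-of-the-envelope check on these numbers: the non-embedding parameter count of a decoder-only transformer is roughly 12 x (number of layers) x (hidden size)^2. The short sketch below plugs in the table's values; it lands close to the headline sizes for GPT-2 and GPT-3, while GPT-1's published total is pushed higher by its embedding matrices, which matter more at small scale.

# Rough non-embedding parameter count for a decoder-only transformer:
# each layer has ~4*d^2 attention weights and ~8*d^2 MLP weights.
def approx_params(n_layers, d_model):
    return 12 * n_layers * d_model ** 2

for name, layers, hidden in [("GPT-1", 12, 768),
                             ("GPT-2", 48, 1600),
                             ("GPT-3", 96, 12288)]:
    print(f"{name}: ~{approx_params(layers, hidden) / 1e9:.2f} B parameters")
# Output: GPT-1 ~0.08 B, GPT-2 ~1.47 B, GPT-3 ~173.95 B
# (versus the quoted 117 M, 1.5 B and 175 B)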
LLM Benchmarks

Benchmark    What does it measure?
GLUE         Natural Language Understanding
SQuAD        Reading Comprehension
HellaSwag    Common-Sense Inference
ROUGE        Text Summarization
RACE         Reading Comprehension
BLEU         Machine Translation
Perplexity   How well the model's probability distribution predicts held-out text
METEOR       Machine Translation
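Of these, perplexity is the only metric computed directly from the model's probability distribution: it is the exponential of the average per-token negative log-likelihood on held-out text. A toy calculation with made-up token probabilities (not from any real model):

# Perplexity = exp(average negative log-likelihood per token).
import math

token_probs = [0.25, 0.10, 0.50, 0.05]           # p(token_i | previous tokens), invented
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)
print(perplexity)                                # ~6.3: roughly as uncertain as a uniform
                                                 # choice among ~6 tokens at each step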
BERT Pre-Training
(ENCODER ONLY)

Positional encodings:
● Sinusoidal functions (original Transformer)
● Learnt from data (BERT)
● Rotary Positional Embeddings [RoPE] (LLaMA)
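For reference, the sinusoidal scheme from the original Transformer paper can be written out in a few lines. The sketch below is my own illustration with BERT-sized shapes assumed; learnt embeddings replace this fixed table with trained vectors, and RoPE instead applies position-dependent rotations to the query/key vectors inside attention.

# Sinusoidal positional encoding from "Attention Is All You Need":
# PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
# PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even dimensions
    pe[:, 1::2] = np.cos(angles)                     # odd dimensions
    return pe

pe = sinusoidal_positions(seq_len=512, d_model=768)  # BERT-sized table
print(pe.shape)                                      # (512, 768)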
LLaMA Model

Dollars??

Reinforcement Learning from Human Feedback [RLHF]
NON-GRADED TASKS
Write a 400-word blog on the BERT model that a non-CS person can understand.

Teach the concept of word embeddings and sentence similarity to at least 3 first-year students (without getting into details of the transformer model).


Compute the BERT embedding vectors for the SU chatbot data (a sketch of one possible workflow follows this list) and:

- Find their PCA components (n=2) and see if they form any clusters.
- Do K-Means clustering of the full embedding vectors.
- Compare the results from [CLS] and pooler_output.
- Instead of the final layer, use embeddings from intermediate layers.
- Make random changes in the model parameters and observe the effect.

Repeat the above with SBERT (try different models).
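A minimal sketch of one way to start on the embedding tasks, assuming the SU chatbot data can be loaded as a Python list of strings (a placeholder list stands in for it here) and that the transformers, scikit-learn and sentence-transformers packages are installed; the model names and the choice of intermediate layer are illustrative, not prescribed by the assignment.

# Extract BERT embeddings, view them in 2-D with PCA, cluster with K-Means,
# and compare [CLS], pooler_output and an intermediate layer.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

sentences = ["How do I reset my password?",        # placeholder for the
             "Where is the admissions office?",    # SU chatbot dataset
             "What are the library hours?"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch)

cls_emb = out.last_hidden_state[:, 0, :]      # [CLS] token from the final layer
pooled_emb = out.pooler_output                # tanh-projected [CLS] used by BERT heads
mid_emb = out.hidden_states[6][:, 0, :]       # [CLS] from an intermediate layer (layer 6)

# 2-D PCA view of the [CLS] embeddings, then K-Means on the full vectors.
coords = PCA(n_components=2).fit_transform(cls_emb.numpy())
labels = KMeans(n_clusters=2, n_init=10).fit_predict(cls_emb.numpy())
print(coords)
print(labels)

# SBERT variant: sentence-transformers returns pooled sentence vectors directly.
from sentence_transformers import SentenceTransformer
sbert = SentenceTransformer("all-MiniLM-L6-v2")
sbert_emb = sbert.encode(sentences)           # shape: (n_sentences, 384)
print(sbert_emb.shape)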
