Week 3 - LLM - PreTraining

Pre-Training, Fine-Tuning & In-Context Learning


Pre-training is like a child learning to read and write his/her mother tongue.

Fine-tuning is like a student learning to use language to perform complex tasks in high school and college.

In-context learning is like a working professional trying to figure out his/her manager's instructions.
Zero-Shot vs. Few-Shot
In-Context Learning (Few-Shot Learning)
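To make the distinction concrete, here is a small made-up illustration (not from the slides): the same sentiment-classification query posed zero-shot and few-shot. The reviews and labels are invented purely for illustration.

# Zero-shot: the model gets only an instruction and the query.
zero_shot_prompt = (
    "Classify the sentiment of the review as Positive or Negative.\n"
    "Review: The battery dies within an hour.\n"
    "Sentiment:"
)

# Few-shot (in-context learning): the same instruction plus a handful of
# solved examples in the prompt. No weights are updated; the model only
# conditions on these demonstrations.
few_shot_prompt = (
    "Classify the sentiment of the review as Positive or Negative.\n"
    "Review: The screen is gorgeous and bright.\n"
    "Sentiment: Positive\n"
    "Review: Shipping took a month and the box was crushed.\n"
    "Sentiment: Negative\n"
    "Review: The battery dies within an hour.\n"
    "Sentiment:"
)

print(zero_shot_prompt)
print(few_shot_prompt)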
The LLM Landscape

                  BERT                        GPT-3              Llama 2
Year              2018                        2020               2023
Developer         Google                      OpenAI             Meta
Parameters        110 M, 340 M                175 B              7 B, 13 B, 70 B
Architecture      Encoder only                Decoder only       Decoder only
Embedding Size    768                         12288              4096 (7 B)
Context Length    512                         2048               4096
Tokenization      WordPiece                   BPE                SentencePiece
Use Case          Classification, NER, Q&A    Text Generation    Text Generation
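The tokenization row can be explored directly. The sketch below is a minimal illustration assuming the Hugging Face transformers library and the public bert-base-uncased and gpt2 checkpoints; Llama 2's SentencePiece tokenizer works the same way, but its checkpoint is gated and needs access approval.

# Compare WordPiece (BERT) and byte-level BPE (GPT-2) on the same sentence.
from transformers import AutoTokenizer

text = "Pre-training teaches the model its mother tongue."

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # WordPiece
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")                # byte-level BPE

print(bert_tok.tokenize(text))   # continuation subwords are marked with '##'
print(gpt2_tok.tokenize(text))   # leading spaces are encoded as 'Ġ'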
The GPT Models

                     GPT-1          GPT-2          GPT-3
Parameters           117 Million    1.5 Billion    175 Billion
Decoder Layers       12             48             96
Context Token Size   512            1024           2048
Hidden Layer Size    768            1600           12288
Batch Size           64             512            3.2 M
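A common back-of-the-envelope check on these numbers: the non-embedding parameter count of a decoder-only transformer is roughly 12 x (number of layers) x (hidden size)^2. The short sketch below plugs in the table's values; it lands close to the headline sizes for GPT-2 and GPT-3, while GPT-1's published total is pushed higher by its embedding matrices, which matter more at small scale.

# Rough non-embedding parameter count for a decoder-only transformer:
# each layer has ~4*d^2 attention weights and ~8*d^2 MLP weights.
def approx_params(n_layers, d_model):
    return 12 * n_layers * d_model ** 2

for name, layers, hidden in [("GPT-1", 12, 768),
                             ("GPT-2", 48, 1600),
                             ("GPT-3", 96, 12288)]:
    print(f"{name}: ~{approx_params(layers, hidden) / 1e9:.2f} B parameters")
# Output: GPT-1 ~0.08 B, GPT-2 ~1.47 B, GPT-3 ~173.95 B
# (versus the quoted 117 M, 1.5 B and 175 B)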
LLM Benchmarks

Benchmark    What does it measure?
GLUE         Natural Language Understanding
SQuAD        Reading Comprehension
HellaSwag    Common-Sense Inference
ROUGE        Text Summarization
RACE         Reading Comprehension
BLEU         Machine Translation
Perplexity   How well the model's probability distribution predicts held-out text
METEOR       Machine Translation
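Of these, perplexity is the only metric computed directly from the model's probability distribution: it is the exponential of the average per-token negative log-likelihood on held-out text. A toy calculation with made-up token probabilities (not from any real model):

# Perplexity = exp(average negative log-likelihood per token).
import math

token_probs = [0.25, 0.10, 0.50, 0.05]           # p(token_i | previous tokens), invented
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)
print(perplexity)                                # ~6.3: roughly as uncertain as a uniform
                                                 # choice among ~6 tokens at each step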
BERT Pre-Training
(ENCODER ONLY)

Positional encodings:
● Sinusoidal functions (original Transformer)
● Learnt from data (BERT)
● Rotary Positional Embeddings [RoPE] (LLaMA)
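For reference, the sinusoidal scheme from the original Transformer paper can be written out in a few lines. The sketch below is my own illustration with BERT-sized shapes assumed; learnt embeddings replace this fixed table with trained vectors, and RoPE instead applies position-dependent rotations to the query/key vectors inside attention.

# Sinusoidal positional encoding from "Attention Is All You Need":
# PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
# PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even dimensions
    pe[:, 1::2] = np.cos(angles)                     # odd dimensions
    return pe

pe = sinusoidal_positions(seq_len=512, d_model=768)  # BERT-sized table
print(pe.shape)                                      # (512, 768)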
LLaMA Model

Dollars??

Reinforcement Learning from Human Feedback [RLHF]
NON-GRADED TASKS
Write a 400-word blog on the BERT model that a non-CS person can understand.

Teach the concept of word embeddings and sentence similarity to at least 3 first-year students (without getting into details of the transformer model).


Compute the BERT embedding vectors for the SU chatbot data (a sketch of one possible workflow follows this list) and:

- Find their PCA components (n=2) and see if they form any clusters.
- Do K-Means clustering of the full embedding vectors.
- Compare the results from [CLS] and pooler_output.
- Instead of the final layer, use embeddings from intermediate layers.
- Make random changes in the model parameters and observe the effect.

Repeat the above with SBERT (try different models).
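A minimal sketch of one way to start on the embedding tasks, assuming the SU chatbot data can be loaded as a Python list of strings (a placeholder list stands in for it here) and that the transformers, scikit-learn and sentence-transformers packages are installed; the model names and the choice of intermediate layer are illustrative, not prescribed by the assignment.

# Extract BERT embeddings, view them in 2-D with PCA, cluster with K-Means,
# and compare [CLS], pooler_output and an intermediate layer.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

sentences = ["How do I reset my password?",        # placeholder for the
             "Where is the admissions office?",    # SU chatbot dataset
             "What are the library hours?"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch)

cls_emb = out.last_hidden_state[:, 0, :]      # [CLS] token from the final layer
pooled_emb = out.pooler_output                # tanh-projected [CLS] used by BERT heads
mid_emb = out.hidden_states[6][:, 0, :]       # [CLS] from an intermediate layer (layer 6)

# 2-D PCA view of the [CLS] embeddings, then K-Means on the full vectors.
coords = PCA(n_components=2).fit_transform(cls_emb.numpy())
labels = KMeans(n_clusters=2, n_init=10).fit_predict(cls_emb.numpy())
print(coords)
print(labels)

# SBERT variant: sentence-transformers returns pooled sentence vectors directly.
from sentence_transformers import SentenceTransformer
sbert = SentenceTransformer("all-MiniLM-L6-v2")
sbert_emb = sbert.encode(sentences)           # shape: (n_sentences, 384)
print(sbert_emb.shape)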
