
1. Deep Learning


Introduction to Deep Learning
Agenda

❑ What is AI, ML, DL?


❑ Real-World DL
❑ Types of Artificial Intelligence
❑ DL Basics
❑ DL Challenges
❑ NLP
❑ Computer Vision
Artificial Intelligence
What is AI?

Artificial intelligence (AI) is the ability of a digital computer or
computer-controlled robot to perform tasks commonly associated with
intelligent beings.
“AI began with an ancient wish to forge the gods.”
- Pamela McCorduck, Machines Who Think, 1979
Frankenstein (1818)

Ex Machina (2015)

Visualized here are 3% of the neurons and 0.0001% of the synapses in the brain.
Thalamocortical system visualization via DigiCortex Engine.
https://deeplearning.mit.edu 2019
For the full list of references visit:
https://hcai.mit.edu/references
[286]
History of Deep Learning Ideas and Milestones*
• 1943: Neural networks
• 1957: Perceptron
• 1974-86: Backpropagation, RBM, RNN
• 1989-98: CNN, MNIST, LSTM, Bidirectional RNN
• 2006: “Deep Learning”, DBN
• 2009: ImageNet
• 2012: AlexNet, Dropout
• 2014: GANs
• 2014: DeepFace
• 2016: AlphaGo
• 2017: AlphaZero, Capsule Networks
• 2018: BERT
* Dates are for perspective and not as definitive historical record of invention or credit

Perspective:
• Universe created 13.8 billion years ago
• Earth created 4.54 billion years ago
• Modern humans 300,000 years ago
• Civilization 12,000 years ago
• Written record 5,000 years ago
• We are here

History of DL Tools*
• Mark 1 Perceptron – 1960
• Torch – 2002
• CUDA – 2007
• Theano – 2008
• DistBelief – 2011
• Caffe – 2014
• TensorFlow 0.1 – 2015
• PyTorch 0.1 – 2017
• TensorFlow 1.0 – 2017
• PyTorch 1.0 – 2018
• TensorFlow 2.0 – 2019

* Truncated for clarity over completeness


Neuron: Biological Inspiration for Computation
(Artificial) Neuron: computational building block for the “neural network”
Neuron: computational building block for the brain

For the full updated list of references visit:


https://selfdrivingcars.mit.edu/references
[18, 143] https://deeplearning.mit.edu 2019
Biological and Artificial Neural Networks

Human Brain
• Thalamocortical system:
3 million neurons
476 million synapses
• Full brain:
100 billion neurons
1,000 trillion synapses

Artificial Neural Network


• ResNet-152:
60 million synapses

Human brains have ~10,000,000 times more synapses
than artificial neural networks.
[286]
Neuron: Biological Inspiration for Computation
• Neuron: computational building block for the brain
• (Artificial) Neuron: computational building block for the “neural network”
Key Difference:
• Parameters: Human brains have ~10,000,000 times more synapses than artificial neural networks.
• Topology: Human brains have no “layers”.
• Async: The human brain works asynchronously, ANNs work synchronously.
• Learning algorithm: ANNs use gradient descent for learning. We don’t know what human brains use.
• Power consumption: Biological neural networks use very little power compared to artificial networks.
• Stages: Biological networks usually never stop learning. ANNs first train then test.
[18, 143]
Neuron: Forward Pass
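
The figure for this slide was not preserved. As a minimal sketch, assuming the common formulation of an artificial neuron (a weighted sum of the inputs plus a bias, passed through a sigmoid activation), the forward pass could look like this. The input, weight, and bias values are hypothetical.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values, chosen only for illustration.
output = neuron_forward(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)
```

Here the weighted sum is exactly zero, so the sigmoid returns its midpoint value.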

[78]
Combining Neurons in Hidden Layers:
The “Emergent” Power to Approximate

Universality: For any continuous function f(x) on a bounded input range, there exists a neural
network that closely approximates it for every input x in that range.
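
One way to make the universality idea concrete is to build a one-hidden-layer ReLU network by hand. The sketch below is an illustration under assumptions, not the construction used in the cited reference: the weights are computed directly from the target function instead of being learned, and `build_relu_approximator` is a hypothetical helper name.

```python
def relu(z):
    return max(0.0, z)

def build_relu_approximator(f, n_hidden, lo=0.0, hi=1.0):
    """Return a one-hidden-layer ReLU network that interpolates f on [lo, hi].

    Each hidden neuron computes relu(x - knot). The output weights are the
    changes in slope between neighbouring segments, so the network is a
    piecewise-linear curve that matches f at every knot.
    """
    step = (hi - lo) / n_hidden
    knots = [lo + i * step for i in range(n_hidden)]
    slopes = [(f(k + step) - f(k)) / step for k in knots]
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    bias = f(lo)

    def network(x):
        hidden = [relu(x - k) for k in knots]
        return bias + sum(w * h for w, h in zip(out_weights, hidden))

    return network

def max_error(network, f, samples=1000):
    """Largest absolute gap between network and f on an even grid over [0, 1]."""
    return max(abs(network(i / samples) - f(i / samples)) for i in range(samples + 1))

def square(x):
    return x * x

error_10 = max_error(build_relu_approximator(square, 10), square)
error_100 = max_error(build_relu_approximator(square, 100), square)
print(error_10, error_100)
```

Adding hidden neurons shrinks the worst-case gap, which is the sense in which the approximation can be made as close as desired.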
[62]
Neural Networks are Parallelizable
[Figure: five panels, Step 1 through Step 5, plus an animated version]

[273]
Compute Hardware

• CPU – serial, general purpose, everyone has one


• GPU – parallelizable, still general purpose
• TPU – custom ASIC (Application-Specific Integrated Circuit) by
Google, specialized for machine learning, low precision

[273]
Real-World Deep Learning
Autonomous and semi-autonomous cars
Real-World Deep Learning
ChatGPT
Neural Networks
Origins: Algorithms that try to mimic the brain.
Neural networks were very widely used in the 80s and early 90s; their popularity
diminished in the late 90s.
Recent resurgence: State-of-the-art technique for many
applications
Neuron in the brain
“input wires”

“output wires”
Neurons in the brain

[Credit: US National Institutes of Health, National Institute on Aging]


DEEP LEARNING

Deep learning is a specific subfield of machine learning: a new take on
learning representations from data that puts an emphasis on learning
successive layers of increasingly meaningful representations. The deep
in deep learning isn’t a reference to any kind of deeper understanding
achieved by the approach; rather, it stands for this idea of successive layers
of representations. How many layers contribute to a model of the data is
called the depth of the model. Other appropriate names for the field could
have been layered representations learning and hierarchical
representations learning. Modern deep learning often involves tens or even
hundreds of successive layers of representations—and they’re all learned
automatically from exposure to training data. Meanwhile, other approaches
to machine learning tend to focus on learning only one or two layers of
representations of the data; hence, they’re sometimes called shallow
learning. In deep learning, these layered representations are (almost always)
learned via models called neural networks, structured in literal layers
stacked on top of each other.
AI-ML-DL
RELATIONSHIP
GEOMETRIC INTERPRETATION
OF DL
Neural networks consist entirely of chains of tensor
(generalized matrix) operations, and all of these tensor
operations are just geometric transformations of the
input data.
It follows that you can interpret a neural network as a very
complex geometric transformation in a high-dimensional
space, implemented via a long series of simple steps.
In 3D, the following mental image may prove useful. Imagine
two sheets of colored paper: one red and one blue. Put one on
top of the other. Now crumple them together into a small ball.
That crumpled paper ball is your input data, and each sheet of
paper is a class of data in a classification problem. What a
neural network (or any other machine-learning model) is meant
to do is figure out a transformation of the paper ball that would
uncrumple it, so as to make the two classes cleanly separable
again. With deep learning, this would be implemented as a
series of simple transformations of the 3D space, such as
those you could apply on the paper ball with your fingers, one
movement at a time.
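
A tiny two-dimensional version of the "uncrumpling" picture can be sketched in code. The example below assumes an XOR-style dataset (an illustration that is not from the slides) in which no straight line separates the two classes in the input space. The weights are hand-picked for the example and are not learned.

```python
def relu(z):
    return max(0.0, z)

def hidden_layer(point):
    """One simple geometric step: an affine map followed by ReLU."""
    x1, x2 = point
    return (relu(x1 + x2), relu(x1 + x2 - 1.0))

def output_layer(hidden):
    """In the transformed space a single straight line separates the classes."""
    h1, h2 = hidden
    return 1 if h1 - 2.0 * h2 > 0.5 else 0

def classify(point):
    return output_layer(hidden_layer(point))

# Hypothetical data: class 1 when exactly one coordinate is 1.
points = {(0.0, 0.0): 0, (0.0, 1.0): 1, (1.0, 0.0): 1, (1.0, 1.0): 0}
transformed = {p: hidden_layer(p) for p in points}
predictions = {p: classify(p) for p in points}
print(transformed)
print(predictions)
```

After the hidden layer, both class-1 points land on the same location, and the two class-0 points fall on the other side of the separating line.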
Deep Learning is Representation Learning
(aka Feature Learning)

Deep Learning ⊂ Representation Learning ⊂ Machine Learning ⊂ Artificial Intelligence

[20]
Representation Matters

Task: Draw a line to separate the green triangles and blue circles.

[20]
Deep Learning is Representation Learning
(aka Feature Learning)

Task: Draw a line to separate the blue curve and red curve

[146]
Representation Matters

Sun-Centered Model (formalized by Copernicus in the 16th century) vs. Earth-Centered Model

“History of science is the history of compression progress.”
- Jürgen Schmidhuber
[20]
Why Deep Learning? Scalable Machine Learning

[283, 284]
Gartner Hype Cycle

Deep Learning
Self-Driving Cars

© deeplearning.ai
Andrew Ng
Intuition about deep representation
Why Deep Learning and Why
Now?
https://www.prowesscorp.com/whats-the-difference-between-artificial-intelligence-ai-machine-learning-and-deep-learning/
Machine Learning definition

Arthur Samuel (1959). Machine Learning: Field of study that gives
computers the ability to learn without being explicitly programmed.
Machine Learning definition

Herbert Simon.
Learning is any process by which a system improves performance
from experience.
Machine Learning is concerned with computer
programs that automatically improve their
performance through experience.
Machine Learning definition

Tom Mitchell (1998). A computer program is said to learn from
experience E with respect to some task T and some performance
measure P, if its performance on T, as measured by P, improves
with experience E.
Example: Spam Filtering
Question.
Learning to detect credit card fraud.
What are T, P and E?

Task T: Assign a label of fraud or not fraud to a credit card transaction
Performance measure P: Accuracy of the fraud classifier
Training experience E: Historical credit card transactions labeled as fraud or not
Question.
Suppose we feed a learning algorithm a lot of historical weather
data, and have it learn to predict weather. What are E, P and T?
Key ML Terminology

Labels
A label is the thing we're predicting.
Features
A feature is an input variable
Examples or samples
An example is a particular instance of data, x.
Labeled examples: A labeled example includes both feature(s) and the label.
Unlabeled examples: An unlabeled example contains features but not the label.
Models
A model defines the relationship between features and label.
Training means creating or learning the model.
Inference means applying the trained model to unlabeled examples.
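
The terminology above can be tied together in a short sketch. It assumes a very simple model (a nearest-mean classifier, chosen for brevity and not mentioned in the slides) and hypothetical fraud data with two features, transaction amount and hour of day.

```python
def train(labeled_examples):
    """Training: learn one mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in labeled_examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def infer(model, features):
    """Inference: predict the label whose mean is closest to the features."""
    def squared_distance(mean):
        return sum((f - m) ** 2 for f, m in zip(features, mean))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical labeled examples: (features, label).
labeled_examples = [
    ([12.0, 14.0], "not fraud"),
    ([25.0, 10.0], "not fraud"),
    ([950.0, 3.0], "fraud"),
    ([870.0, 4.0], "fraud"),
]
model = train(labeled_examples)
prediction = infer(model, [900.0, 2.0])  # an unlabeled example
print(prediction)
```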
Key types of Machine Learning problems

Supervised machine learning: Learn to predict target values from labeled data
• Classification (target values are discrete classes)
• Regression (target values are continuous values)

Unsupervised machine learning: Find structure in unlabeled data
• Find groups of similar instances in the data (clustering)
• Find unusual patterns (outlier detection)
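Supervised learning: the training set consists of labeled examples (features and label).
Unsupervised learning: the training set consists of unlabeled examples (features only).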


Applications of clustering
• Market segmentation
• Social network analysis
• Organize computing clusters
• Astronomical data analysis
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Supervised or Unsupervised?

• Examine the statistics of two football teams and predict which team will
win tomorrow's match (given historical data of the teams' wins/losses to learn
from).

• This can be addressed using supervised learning, in which we learn from
historical records to make win/loss predictions.
Supervised or Unsupervised?

Take a collection of 1000 customers of ASAN Pay, and find a way to
automatically group these customers into a small number of groups of
customers that are somehow "similar" or "related".

This is an unsupervised learning/clustering problem.


Housing price prediction.
[Plot: Price ($) in 1000’s (0 to 400) against Size in feet² (0 to 2500)]

Supervised Learning: “right answers” given
Regression: Predict continuous valued output (price)
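
A minimal regression sketch for the housing example, assuming a straight-line model fitted by ordinary least squares. The sizes and prices below are made up for illustration and are not read from the plot.

```python
def fit_line(sizes, prices):
    """Least-squares fit of price = slope * size + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    variance = sum((x - mean_x) ** 2 for x in sizes)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical "right answers": size in square feet, price in $1000s.
sizes = [500.0, 1000.0, 1500.0, 2000.0]
prices = [100.0, 200.0, 300.0, 400.0]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 1250.0 + intercept  # continuous-valued output
print(predicted_price)
```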
Classification or Regression?

The amount of rain that falls in a day is usually measured
in either millimeters (mm) or inches. Suppose you use a
learning algorithm to predict how much rain will fall
tomorrow. Would you treat this as a classification or a
regression problem?
Regression is appropriate when we are trying to predict a
continuous-valued output, such as the amount of rainfall
measured in inches or mm.
Classification or Regression?

Suppose you are working on weather prediction, and your
weather station makes one of three predictions for each
day's weather: Sunny, Cloudy or Rainy. You'd like to use a
learning algorithm to predict tomorrow's weather. Would
you treat this as a classification or a regression problem?
Classification is appropriate when we are trying to predict
one of a small number of discrete-valued outputs, such as
whether it is Sunny (which we might designate as class 0),
Cloudy (say class 1) or Rainy (class 2).
Regression vs Classification

[288]
Multi-Class vs Multi-Label

[288]
Types of Artificial Intelligence

Types of Artificial Intelligence:


➢ Narrow AI
➢ General AI
➢ Super AI
Types of Artificial Intelligence
Narrow AI:
➢ Also known as weak AI
➢ Can perform only narrowly defined, specialized tasks
➢ The machine has no thinking ability
➢ It performs a set of predetermined functions.
Types of Artificial Intelligence
General AI
• General AI is a type of intelligence that could perform
any intellectual task as efficiently as a human.
• Elon Musk and a group of artificial intelligence experts and
industry executives are calling for a six-month pause in
developing systems more powerful than OpenAI's newly
launched GPT-4, in an open letter citing potential risks to
society
General AI
➢ Hawking cautioned against an extreme form of AI
➢ Thinking machines would “take-off” on their own,
modifying themselves and independently designing
and building ever more capable systems.
➢ Humans, bound by the slow pace of biological
evolution, would be tragically outwitted.
Super AI
➢ Super AI is a level of system intelligence
at which machines could surpass human
intelligence and perform any cognitive task
better than a human
➢ Currently, super AI does not exist
Artificial Intelligence Quiz

Check: Which of the following can AI do now?


Can play a millionaire game?
Can beat anyone at chess?
Can beat any person at Go?
Can play table tennis?
Can take the glass and put it in the closet?
Can fully replace a person in housework?
Can drive safely on the highway?
Can drive in 20 Yanvar?
Can do weekly shopping?
Can fully translate from one language to another?
Can discover mathematical theory?
Can perform heart surgery?
Can read a person’s brain?
Can determine whether the given feedback is negative or positive?
Can write a funny story?
NLP and Its Applications
Speech to Text and Text to Speech
➢ Speech recognition
➢ Text-to-speech synthesis (TTS)

Natural Language Processing


➢ Question Answering Systems
➢ Chatbots
➢ Machine Translation
➢ Surfing on the Web
➢ Text classification
➢ Content categorization.

General Purpose NLP:


➢ GPT-3 OpenAI:
https://www.youtube.com/watch?v=r2dQgdktUJg
➢ Jukebox by OpenAI:
https://openai.com/blog/jukebox/
NLP and Its role in Digital Inclusion
Personal Assistants for Digital Inclusion
Computer Vision and Its Applications

Image Segmentation

https://phenaki.github.io/
Image to Text Image Generation
VisionEye
With object detection and voice delivery systems,
the project aims to help individuals with visual
impairments navigate easily and safely. Our
students promote an inclusive society where
everyone has access to the resources that they
need.
Computer Vision and Its role in Digital Inclusion
Autonomous Car Driving
• Path Planning
• Laser Terrain Mapping
• Learning from Human Drivers
• Adaptive Vision

Sebastian

Stanley
Face Recognition

pixels → edges → object parts (combination of edges) → object models
Image Segmentation
Deep Learning in One Slide

• What is it:
Extract useful patterns from data.
• How:
Neural network + optimization
• How (Practical):
Python + TensorFlow & friends
• Hard Part:
Good Questions + Good Data
• Why now:
Data, hardware, community, tools, investment
• Where do we stand?
Most big questions of intelligence have not been answered nor properly formulated

Exciting progress:
• Face recognition
• Image classification
• Speech recognition
• Text-to-speech generation
• Handwriting transcription
• Machine translation
• Medical diagnosis
• Cars: drivable area, lane keeping
• Digital assistants
• Ads, search, social recommendations
• Game playing with deep RL
First Steps: Start Simple
Input Image: [image not preserved]
TensorFlow Model: Neural Network
Output: 5 (with 87% confidence)

Why Deep Learning? Real World Applications

Why Not Deep Learning? Unintended
Consequences
Human AI (Deep RL Agent)

Player gets reward based on:


1. Finishing time
2. Finishing position
3. Picking up “turbos”

[285]
The Challenge of Deep Learning
• Ask the right question and know what the answer means:
image classification ≠ scene understanding
• Select, collect, and organize the right data to train on:
photos ≠ synthetic ≠ real-world video frames

Pure Perception is Hard

[66]
Visual Understanding is Harder

Examples of what we can’t do well:


• Mirrors
• Sparse information
• 3D Structure
• Physics
• What’s on people’s minds?
• What happens next?
• Humor

[211]
Deep Learning:
Our intuition about what’s “hard” is flawed (in complicated ways)

Visual perception: 540,000,000 years of data


Bipedal movement: 230,000,000 years of data
Abstract thought: 100,000 years of data

Prediction: Dog + Distortion Prediction: Ostrich

“Encoded in the large, highly evolved sensory and motor portions of the human brain is a billion
years of experience about the nature of the world and how to survive in it.… Abstract thought,
though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It
is not all that intrinsically difficult; it just seems so when we do it.”
- Hans Moravec, Mind Children (1988)
[6, 7, 11, 68]
Measuring Progress: Einstein vs Savant

Max Tegmark’s rising sea visualization of
Hans Moravec’s landscape of human competence
[281]
Special Purpose Intelligence:
Estimating Apartment Cost

[65]
(Toward) General Purpose Intelligence:
Pong from Pixels
Policy Network:
• 80x80 image (difference image)
• 2 actions: up or down
• 200,000 Pong games

This is a step towards general purpose artificial intelligence!
Andrej Karpathy. “Deep Reinforcement Learning: Pong from Pixels.” 2016.

[63]
Deep Learning from Human and Machine
“Teachers”          “Students”
Human               Supervised Learning
Human + Machine     Augmented Supervised Learning
Human + Machine     Semi-Supervised Learning
Human + Machine     Reinforcement Learning
Machine             Unsupervised Learning

Data Augmentation
• Crop
• Flip
• Scale
• Rotate
• Noise
• Translation
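
Several of these augmentations can be sketched on a tiny image stored as a list of rows. This is a plain-Python illustration under assumptions: the function names are hypothetical, scaling is left out because it needs interpolation, and real pipelines normally rely on an image library.

```python
import random

def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Reverse the row order, then transpose."""
    return [list(row) for row in zip(*image[::-1])]

def crop(image, top, left, height, width):
    """Keep a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def translate_right(image, shift, fill=0):
    """Shift pixels to the right, padding the left edge with a fill value."""
    return [[fill] * shift + row[:len(row) - shift] for row in image]

def add_noise(image, scale, rng):
    """Add uniform noise in [-scale, scale] to every pixel."""
    return [[pixel + rng.uniform(-scale, scale) for pixel in row] for row in image]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

augmented = [
    flip_horizontal(image),
    rotate_90_clockwise(image),
    crop(image, top=0, left=1, height=2, width=2),
    translate_right(image, shift=1),
    add_noise(image, scale=0.1, rng=random.Random(0)),
]
```

Each transformed copy keeps the original label, so one labeled image yields several training examples.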

[294]
The Challenge of Deep Learning:
Efficient Teaching + Efficient Learning
• Humans can learn from very few examples
• Machines (in most cases) need thousands/millions of examples

[291]
Deep Learning: Training and Testing

Training Stage:
Input Data → Learning System → Correct Output (aka “Ground Truth”)

Testing Stage:
New Input Data → Learning System → Best Guess

How Neural Networks Learn: Backpropagation

Forward Pass:
Input Data → Neural Network → Prediction

Backward Pass (aka Backpropagation):
Neural Network ← Measure of Error
Adjust to Reduce Error

What can we do with Deep Learning?

Input Data → Learning System → Correct Output

Both the Input Data and the Correct Output can be a:
• Number
• Vector of numbers
• Sequence of numbers
• Sequence of vectors of numbers

Key Concepts:
Activation Functions
Sigmoid
• Vanishing gradients
• Not zero centered

Tanh
• Vanishing gradients

ReLU
• Not zero centered
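
The properties listed above can be checked numerically. The sketch below uses the standard definitions of the three activations and their derivatives; the sample input values are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

# Vanishing gradients: for large |z| the sigmoid and tanh slopes are nearly zero,
# while the ReLU slope stays at 1 for any positive input.
for z in (0.0, 5.0, 10.0):
    print(z, sigmoid_grad(z), tanh_grad(z), relu_grad(z))

# Not zero centered: sigmoid and ReLU outputs are never negative,
# whereas tanh is symmetric around zero.
print(sigmoid(-3.0), relu(-3.0), math.tanh(-3.0))
```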

[148]
Loss Functions

• Loss function quantifies the gap between prediction and ground truth
• For regression:
• Mean Squared Error (MSE)
• For classification:
• Cross Entropy Loss

Mean Squared Error: measures the gap between Prediction and Ground Truth.
Cross Entropy Loss: summed over Classes, compares Prediction with Ground Truth {0,1}.
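
The equations on this slide were not preserved. The sketch below assumes the standard textbook definitions: MSE averages the squared differences, and cross entropy sums the negative log of the predicted probability weighted by a one-hot ground truth. The small `eps` term is an added safeguard against log(0), and the sample numbers are hypothetical.

```python
import math

def mean_squared_error(predictions, targets):
    """Average squared gap between prediction and ground truth (regression)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(predicted_probs, one_hot_target, eps=1e-12):
    """Negative log-probability assigned to the true class (classification)."""
    return -sum(t * math.log(p + eps) for p, t in zip(predicted_probs, one_hot_target))

mse = mean_squared_error([2.5, 0.0], [3.0, -1.0])
confident_right = cross_entropy([0.7, 0.2, 0.1], [1, 0, 0])
confident_wrong = cross_entropy([0.1, 0.2, 0.7], [1, 0, 0])
print(mse, confident_right, confident_wrong)
```

A confident wrong prediction is penalized much more heavily than a confident right one.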

Backpropagation

Task: Update the weights and biases to decrease loss function

Subtasks:
1. Forward pass to compute network output and “error”
2. Backward pass to compute gradients
3. A fraction of the weight’s gradient is subtracted from the weight.

Learning Rate: the fraction used in step 3.
Numerical Method: Automatic Differentiation
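
The three subtasks can be traced on the smallest possible model. This sketch assumes a single linear neuron with a squared-error loss, and derives the gradients by hand with the chain rule (frameworks would obtain them through automatic differentiation). The data point and learning rate are hypothetical.

```python
def training_step(w, b, x, target, learning_rate):
    # 1. Forward pass: compute the network output and the "error".
    prediction = w * x + b
    error = prediction - target
    loss = error ** 2

    # 2. Backward pass: gradients of the loss with respect to w and b.
    grad_w = 2.0 * error * x
    grad_b = 2.0 * error

    # 3. Update: subtract a fraction (the learning rate) of each gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    return w, b, loss

w, b = 0.0, 0.0
losses = []
for _ in range(50):
    w, b, loss = training_step(w, b, x=2.0, target=4.0, learning_rate=0.05)
    losses.append(loss)

print(losses[0], losses[-1])
```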


[63, 80, 100]
Learning is an Optimization Problem

Task: Update the weights and biases to decrease loss function

SGD: Stochastic Gradient Descent

[103]


Dying ReLUs
• If a neuron is initialized poorly, it might not fire for the entire training dataset.
• Large parts of your network could be dead ReLUs!

Vanishing Gradients:
• Partial derivatives are small = Learning is slow

Hard to break symmetry
Vanilla SGD gets you there, but can be slow

[102, 104]


Mini-Batch Size

Mini-Batch size: Number of training instances the network
evaluates per weight update step.
• Larger batch size = more computational speed
• Smaller batch size = (empirically) better generalization

“Training with large minibatches is bad for your health. More importantly, it's
bad for your test error. Friends don’t let friends use minibatches larger than 32.”
- Yann LeCun
Revisiting Small Batch Training for Deep Neural Networks (2018)
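
The sketch below shows only the mechanics of the mini-batch size: how many examples feed each weight update, and therefore how many updates one pass over the data produces. It assumes a one-parameter model y = w * x on made-up data, and it does not demonstrate the generalization claim above.

```python
import random

def sgd(examples, batch_size, learning_rate, epochs, seed=0):
    """Mini-batch stochastic gradient descent for the model y = w * x."""
    rng = random.Random(seed)
    data = list(examples)
    w = 0.0
    updates = 0
    for _ in range(epochs):
        rng.shuffle(data)  # the "stochastic" part: random batch composition
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only.
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad
            updates += 1
    return w, updates

# Hypothetical data generated from y = 3x.
examples = [(float(x), 3.0 * x) for x in range(1, 9)]

w_small, updates_small = sgd(examples, batch_size=2, learning_rate=0.01, epochs=20)
w_full, updates_full = sgd(examples, batch_size=8, learning_rate=0.01, epochs=20)
print(w_small, updates_small, w_full, updates_full)
```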

[329]
Overfitting and Regularization

• Help the network generalize to data it hasn’t seen.


• Big problem for small datasets.
• Overfitting example (a sine curve vs 9-degree polynomial):

[24, 20, 140]
Overfitting and Regularization

• Overfitting: The error decreases in the training set but
increases in the test set.

[24, 20, 140]
Regularization: Early Stoppage

• Create “validation” set (subset of the training set).
• Validation set is assumed to be representative of the testing set.
• Early stoppage: Stop training (or at least save a checkpoint)
when performance on the validation set decreases
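
A minimal sketch of the stopping rule, assuming performance is tracked as a validation loss (lower is better) and that training tolerates a few non-improving epochs before stopping. The `patience` parameter and the loss values are assumptions added for illustration.

```python
def early_stopping_epoch(validation_losses, patience):
    """Return (stop_epoch, best_epoch).

    stop_epoch is where training halts; best_epoch is the checkpoint to keep.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # save a checkpoint here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(validation_losses) - 1, best_epoch

# Hypothetical validation losses, one per epoch.
validation_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch, best_epoch = early_stopping_epoch(validation_losses, patience=2)
print(stop_epoch, best_epoch)
```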

[20, 140]
Regularization: Dropout

• Dropout: Randomly remove some nodes in the network (along
with incoming and outgoing edges)
• Notes:
• Usually p >= 0.5 (p is probability of keeping node)
• Input layers p should be much higher (and use noise instead of dropout)
• Most deep learning frameworks come with a dropout layer
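
A minimal sketch of a dropout layer, assuming the "inverted dropout" variant, in which surviving activations are scaled by 1/p during training so that nothing needs to change at inference time. The slide does not specify the scaling, so that part is an assumption.

```python
import random

def dropout(activations, keep_prob, training, rng):
    """Keep each node with probability keep_prob; drop it (set to 0) otherwise."""
    if not training:
        return list(activations)  # inference: every node stays active
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
activations = [1.0] * 10000

train_output = dropout(activations, keep_prob=0.5, training=True, rng=rng)
test_output = dropout(activations, keep_prob=0.5, training=False, rng=rng)

kept_fraction = sum(1 for a in train_output if a != 0.0) / len(train_output)
mean_activation = sum(train_output) / len(train_output)
print(kept_fraction, mean_activation)
```

About half of the nodes are dropped, yet the average activation stays close to its original value because of the rescaling.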

[20, 140]
Regularization: Weight Penalty (aka Weight Decay)

• L2 Penalty: Penalize squared weights. Result:
• Keeps weights small unless the error derivative is very large.
• Prevents fitting sampling error.
• Smoother model (output changes more slowly as the input changes).
• If the network has two similar inputs, it prefers to put half the weight on each rather than all the weight on one.

• L1 Penalty: Penalize absolute weights. Result:
• Allows a few weights to remain large.
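
The difference between the two penalties shows up in a two-weight example. The sketch assumes the penalty is added to the loss as strength times the sum of squared (L2) or absolute (L1) weights; the weight values are hypothetical.

```python
def l2_penalty(weights, strength):
    """Sum of squared weights, scaled by the regularization strength."""
    return strength * sum(w * w for w in weights)

def l1_penalty(weights, strength):
    """Sum of absolute weights, scaled by the regularization strength."""
    return strength * sum(abs(w) for w in weights)

# Two similar inputs: split the weight evenly, or put it all on one input.
split_evenly = [0.5, 0.5]
all_on_one = [1.0, 0.0]

print(l2_penalty(split_evenly, 1.0), l2_penalty(all_on_one, 1.0))
print(l1_penalty(split_evenly, 1.0), l1_penalty(all_on_one, 1.0))
```

L2 charges the even split less, so it favors spreading weight across similar inputs. L1 charges both options the same, which leaves room for a few large weights.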

[20, 140, 147]
Normalization

• Network Input Normalization


• Example: Pixel to [0, 1] or [-1, 1] or according to mean and std.

• Batch Normalization (BatchNorm, BN)


• Normalize hidden layer inputs to mini-batch mean & variance
• Reduces impact of earlier layers on later layers

• Batch Renormalization (BatchRenorm, BR)


• Fixes difference b/w training and inference by keeping a moving
average asymptotically approaching a global normalization.

• Other options:
• Layer normalization (LN) – conceived for RNNs
• Instance normalization (IN) – conceived for Style Transfer
• Group normalization (GN) – conceived for CNNs
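
The first two items can be sketched for a single feature. The example assumes 8-bit pixel values for input normalization, and shows only the normalization step of batch normalization; the learnable scale and shift and the running statistics used at inference are left out.

```python
import math

def scale_pixels(pixels):
    """Network input normalization: map 8-bit pixel values to [0, 1]."""
    return [p / 255.0 for p in pixels]

def batch_norm(batch, eps=1e-5):
    """Normalize one feature to the mini-batch mean and variance."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(variance + eps) for x in batch]

scaled = scale_pixels([0, 51, 255])
normalized = batch_norm([2.0, 4.0, 6.0, 8.0])

new_mean = sum(normalized) / len(normalized)
new_variance = sum((x - new_mean) ** 2 for x in normalized) / len(normalized)
print(scaled, normalized, new_mean, new_variance)
```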

[289, 290]
Neural Network Playground
http://playground.tensorflow.org

[154]
Convolutional Neural Networks: Image
Classification

• Convolutional filters:
take advantage of
spatial invariance
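
A minimal sketch of a convolutional filter, assuming the "valid" sliding-window form without kernel flipping that deep learning libraries commonly call convolution. The filter and the two images are hypothetical. The same small set of weights is applied at every position, so a feature is detected wherever it appears.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# A filter that responds to a dark-to-bright vertical edge.
vertical_edge = [[-1, 1],
                 [-1, 1]]

edge_in_middle = [[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]]
edge_on_left = [[0, 1, 1, 1],
                [0, 1, 1, 1],
                [0, 1, 1, 1]]

print(convolve2d(edge_in_middle, vertical_edge))
print(convolve2d(edge_on_left, vertical_edge))
```

The response peak moves with the edge, while the filter weights stay the same.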

[293]
• AlexNet (2012): First CNN to win ImageNet (15.4% top-5 error)
• 8 layers
• 61 million parameters
• ZFNet (2013): 15.4% to 11.2%
• 8 layers
• More filters. Denser stride.
• VGGNet (2014): 11.2% to 7.3%
• Beautifully uniform:
3x3 conv, stride 1, pad 1, 2x2 max pool
• 16 layers
• 138 million parameters
• GoogLeNet (2014): 11.2% to 6.7%
• Inception modules
• 22 layers
• 5 million parameters
(throw away fully connected layers)
• ResNet (2015): 6.7% to 3.57%
• More layers = better performance
• 152 layers
• Human error (5.1%) surpassed in 2015
• CUImage (2016): 3.57% to 2.99%
• Ensemble of 6 models
• SENet (2017): 2.99% to 2.251%
• Squeeze and excitation block: network
is allowed to adaptively adjust the
weighting of each feature map in the
convolutional block.

[90]


Object Detection / Localization
Region-Based Methods | Shown: Faster R-CNN

[299]
Object Detection / Localization
Single-Shot Methods | Shown: SSD

[299]
Semantic Segmentation

[175]
Transfer Learning

• Fine-tune a pre-trained model
• Effective in many applications: computer vision, audio, speech,
natural language processing

Autoencoders

• Unsupervised learning
• Gives embedding
• Typically better embeddings
come from discriminative task

http://projector.tensorflow.org/
[298]
Generative Adversarial Networks (GANs)

[302, 303, 304]
Word Embeddings (Word2Vec)

Skip Gram Model:

Word Vector

[297]
Recurrent Neural Networks

• Applications
• Sequence Data
• Text
• Speech
• Audio
• Video
• Generation

[299]
Long-Term Dependency

• Short-term dependence:
Bob is eating an apple.
• Long-term dependence:
Bob likes apples. He is hungry and decided to have a snack. So now he is eating an apple.

In theory, vanilla RNNs can handle arbitrarily long-term dependence.
In practice, it’s difficult.

[109]
Long Short-Term Memory (LSTM) Networks: Pick
What to Forget and What To Remember

Conveyor belt for previous state and new data:
1. Decide what to forget (state)
2. Decide what to remember (state)
3. Decide what to output (if anything)
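
The three decisions map onto the gates of an LSTM cell. The sketch below assumes the standard LSTM equations, reduced to scalar state for readability. The `make_params` helper and all parameter values are hypothetical, with weights set to zero so that the biases alone control the gates.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell."""
    # 1. Decide what to forget from the previous cell state.
    forget_gate = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])
    # 2. Decide what new information to remember.
    input_gate = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])
    candidate = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])
    c = forget_gate * c_prev + input_gate * candidate
    # 3. Decide what to output.
    output_gate = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])
    h = output_gate * math.tanh(c)
    return h, c

def make_params(forget_bias, input_bias, output_bias):
    """Hypothetical parameters: zero weights, so only the biases set the gates."""
    keys = ["wf", "uf", "wi", "ui", "wc", "uc", "bc", "wo", "uo"]
    params = {key: 0.0 for key in keys}
    params.update(bf=forget_bias, bi=input_bias, bo=output_bias)
    return params

remember_everything = make_params(forget_bias=20.0, input_bias=-20.0, output_bias=20.0)
forget_everything = make_params(forget_bias=-20.0, input_bias=-20.0, output_bias=20.0)

_, c_kept = lstm_step(x=1.0, h_prev=0.0, c_prev=5.0, p=remember_everything)
_, c_erased = lstm_step(x=1.0, h_prev=0.0, c_prev=5.0, p=forget_everything)
print(c_kept, c_erased)
```

With the forget gate open, the cell state rides the conveyor belt almost unchanged; with it closed, the state is wiped.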

[109]
Bidirectional RNN

• Learn representations from both previous time
steps and future time steps

[109]
Encoder-Decoder Architecture

The encoder RNN encodes the input sequence into a fixed-size vector,
and that vector is then passed repeatedly to the decoder RNN.

Attention

Attention mechanism allows the network to refer back to the
input sequence, instead of forcing it to encode all information
into one fixed-length vector.
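
A minimal sketch of one common formulation, dot-product attention, which is an assumption here because the slide does not say how positions are scored. The decoder state is compared with every encoder state, the scores become weights through a softmax, and the context is the weighted sum. All vectors are hypothetical.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step."""
    scores = [sum(d * e for d, e in zip(decoder_state, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context

# One encoder state per input position.
encoder_states = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
decoder_state = [0.0, 1.0]

weights, context = attend(decoder_state, encoder_states)
print(weights, context)
```

The decoder state lines up with the second input position, so that position receives most of the weight and dominates the context vector.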
AutoML and Neural Architecture Search (NASNet)

[300, 301]
Deep Reinforcement Learning

[306, 307]
Toward Artificial General Intelligence
• Transfer Learning
• Hyperparameter Optimization
• Architecture Search
• Meta Learning

[286, 291]
Reading material:

Chapter 1 - Zhang, Aston & Lipton, Zachary & Li, Mu & Smola,
Alexander. (2023). Dive into Deep Learning, Cambridge University
Press.
https://d2l.ai/chapter_introduction/index.html

Chapter 1 – Deep Learning for Coders with Fastai and PyTorch: AI
Applications Without a PhD, 1st edition, 2020
https://colab.research.google.com/github/fastai/fastbook/blob/master/01_intro.ipynb
