
DENSELY CONNECTED CONVOLUTIONAL NETWORKS

Presentation by:

Maria Waheed (l1f18bscs0460)

Farrukh Alam Virk (l1f18bscs0424)


WHAT IS COVERED IN THIS PRESENTATION

• Dense Block
• DenseNet Architecture
• Advantages of DenseNet
• CIFAR & SVHN Small-Scale Dataset Results
• ImageNet Large-Scale Dataset Results
• Further Analysis on Feature Reuse
STANDARD CONNECTIVITY

Dense Block:
A dense block is a module used in convolutional neural networks that connects all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.

In a standard ConvNet, the input image goes through multiple convolutions to obtain high-level features.
RESNET CONNECTIVITY

Identity mappings promote gradient propagation.

+ : Element-wise addition

In ResNet, identity mapping is proposed to promote gradient propagation, and element-wise addition is used. It can be viewed as an algorithm with a state passed from one ResNet module to the next.
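In equation form, consistent with the notation used later in these slides, ResNet's fifth layer would compute x5 = h5(x4) + x4: the new features are added element-wise to the previous layer's output.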
DENSE ARCHITECTURE

DENSE CONNECTIVITY

C : Channel-wise concatenation

In DenseNet, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. Concatenation is used, so each layer receives "collective knowledge" from all preceding layers.
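In the same notation, a DenseNet layer receives the channel-wise concatenation of all preceding feature maps, e.g. x5 = h5([x0, x1, x2, x3, x4]).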
DENSE AND SLIM

k : Growth rate

Since each layer receives feature maps from all preceding layers, the network can be thinner and more compact, i.e. the number of channels per layer can be smaller. The growth rate k is the number of additional channels each layer contributes.
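A quick worked example (the numbers are illustrative): if the block input has k0 channels and the growth rate is k, the l-th layer inside a dense block receives k0 + k×(l−1) input channels and contributes only k new ones, so with k0 = 16 and k = 12 the fifth layer sees 16 + 12×4 = 64 channels.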
FORWARD PROPAGATION

[Figure: feature maps x0 … x4 being concatenated as they pass through layers h1 … h4.]

So DenseNet has higher computational efficiency and memory efficiency. The figure illustrates how feature maps are concatenated during forward propagation.
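A minimal PyTorch-style sketch of this concatenation (the shapes, growth rate, and the plain Conv2d stand-ins for the composition layers are illustrative assumptions, not from the slides):

import torch
import torch.nn as nn

k = 12                                            # growth rate (illustrative)
x0 = torch.randn(1, k, 32, 32)                    # block input with k channels
layers = [nn.Conv2d(k * (i + 1), k, kernel_size=3, padding=1) for i in range(4)]

features = [x0]
for h in layers:
    new_features = h(torch.cat(features, dim=1))  # each layer sees all preceding feature maps
    features.append(new_features)                 # and passes its own output to later layers
out = torch.cat(features, dim=1)                  # 5 * k channels leave the dense block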
DenseNet Architecture:

Basic DenseNet Composition Layer:
In each composition layer, pre-activation Batch Norm (BN) and ReLU are applied, followed by a 3×3 convolution, producing an output feature map of k channels; for example, transforming [x0, …, x4] into x5. This idea comes from Pre-Activation ResNet.

x5 = h5([x0, …, x4])
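A minimal PyTorch sketch of one composition layer (the function and parameter names are illustrative; the slides specify only the BN → ReLU → 3×3 Conv ordering and the k output channels):

import torch.nn as nn

def composition_layer(in_channels: int, growth_rate: int) -> nn.Sequential:
    # Pre-activation ordering: BN and ReLU come before the 3x3 convolution.
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
    )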
DenseNet-B (Bottleneck Layers):
To reduce model complexity and size, BN-ReLU-1×1 Conv is applied before BN-ReLU-3×3 Conv. The 1×1 convolution maps the l×k input channels to 4×k channels, and the 3×3 convolution then produces k output channels, giving higher parameter and computational efficiency.
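A corresponding sketch of a bottleneck (DenseNet-B) layer; the 4×k intermediate width follows the channel counts above, while the rest is an illustrative assumption:

import torch.nn as nn

def bottleneck_layer(in_channels: int, growth_rate: int) -> nn.Sequential:
    inter_channels = 4 * growth_rate  # the 1x1 conv first maps l*k input channels to 4*k
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False),
        nn.BatchNorm2d(inter_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(inter_channels, growth_rate, kernel_size=3, padding=1, bias=False),
    )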
MULTIPLE DENSE BLOCKS WITH TRANSITION LAYERS:

A 1×1 convolution followed by 2×2 average pooling is used as the transition layer between two contiguous dense blocks.

Feature map sizes are the same within a dense block so that the feature maps can be concatenated easily.

At the end of the last dense block, global average pooling is performed and then a softmax classifier is attached.

[Figure: Input → Convolution → Dense Block 1 → Convolution + Pooling → Dense Block 2 → Convolution + Pooling → Dense Block 3 → Pooling → Linear → Output. Pooling reduces feature map sizes between blocks; feature map sizes match within each block.]
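A minimal sketch of such a transition layer (the slides specify only the 1×1 Conv and 2×2 average pooling; the leading BatchNorm follows the paper and is an assumption here):

import torch.nn as nn

def transition_layer(in_channels: int, out_channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),                        # from the paper; not shown on the slide
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),              # halves the feature map size
    )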
DENSENETS-B

DenseNets-B are regular DenseNets that use a 1×1 convolution to reduce the number of input feature maps before the 3×3 convolution and improve computational efficiency. The "B" comes from the bottleneck layer, already familiar from the work on ResNets.
DenseNet-BC (Further Compression):
• If a dense block contains m feature-maps, the following transition layer generates θm output feature-maps, where 0 < θ ≤ 1 is referred to as the compression factor.
• When θ = 1, the number of feature-maps across transition layers remains unchanged. DenseNet with θ < 1 is referred to as DenseNet-C; θ = 0.5 is used in the experiments.
• When both bottleneck layers and transition layers with θ < 1 are used, the model is referred to as DenseNet-BC.
• Finally, DenseNets with and without B/C, with different depths L and growth rates k, are trained.

• DenseNets-C are another small incremental step beyond DenseNets-B, for cases where we would like to reduce the number of output feature maps. The compression factor θ determines this reduction: instead of m feature maps at a transition layer, we keep θ×m. Of course, θ is in the range (0, 1], so DenseNets remain unchanged when θ = 1 and become DenseNets-C otherwise.
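For example, with θ = 0.5 and m = 256 feature maps entering a transition layer, the transition layer outputs 0.5 × 256 = 128 feature maps.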
ADVANTAGES OF DENSENET

ADVANTAGE 1: STRONG GRADIENT FLOW

[Figure: the error signal flowing directly from the loss back to earlier layers.]

The error signal can be propagated to earlier layers more directly. This is a kind of implicit deep supervision, as earlier layers can get direct supervision from the final classification layer.
ADVANTAGE 2: PARAMETER & COMPUTATIONAL EFFICIENCY

For each layer, the number of parameters in ResNet is directly proportional to C×C, while the number of parameters in DenseNet is directly proportional to l×k×k.

ResNet connectivity: C correlated input features → hl → C output features; #parameters: O(C×C).

DenseNet connectivity: l×k diversified input features → hl → k output features, with k << C (k: growth rate); #parameters: O(l×k×k).
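A rough worked comparison (illustrative numbers, not from the slides): a ResNet layer with C = 256 channels and a 3×3 kernel needs about 256 × 256 × 9 ≈ 590K weights, whereas a DenseNet layer with growth rate k = 12 and l = 10 preceding layers needs about (10 × 12) × 12 × 9 ≈ 13K weights, roughly 45× fewer.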
ADVANTAGE 3: MAINTAINS LOW-COMPLEXITY FEATURES

Standard Connectivity:

The classifier uses only the most complex (high-level) features:

y = w4 h4(x)

x → h1(x) → h2(x) → h3(x) → h4(x) → classifier (increasingly complex features)
ADVANTAGE 3: MAINTAINS LOW-COMPLEXITY FEATURES

Dense Connectivity:

The classifier uses features of all complexity levels:

y = w0 x + w1 h1(x) + w2 h2(x) + w3 h3(x) + w4 h4(x)

x → h1(x) → h2(x) → h3(x) → h4(x) → classifier (increasingly complex features)

In DenseNet, the classifier uses features of all complexity levels. This tends to give smoother decision boundaries, and it also explains why DenseNet performs well when training data is insufficient.
RESULTS
RESULTS ON CIFAR-10

[Bar charts: test error (%) with and without data augmentation for ResNet (110 layers, 1.7M), ResNet (1001 layers, 10.2M), DenseNet (100 layers, 0.8M) and DenseNet (250 layers, 15.3M), compared with the previous SOTA.]
With data augmentation (C10+), test error:
• Small ResNet-110: 6.41%
• Large ResNet-1001 (10.2M parameters): 4.62%
• Previous state-of-the-art (SOTA): 4.2%
• Small DenseNet-BC (L=100, k=12, only 0.8M parameters): 4.5%
• Large DenseNet (L=250, k=24): 3.6%

Without data augmentation (C10), test error:
• Small ResNet-110: 11.26%
• Large ResNet-1001 (10.2M parameters): 10.56%
• Previous state-of-the-art (SOTA): 7.3%
• Small DenseNet-BC (L=100, k=12, only 0.8M parameters): 5.9%
• Large DenseNet (L=250, k=24): 4.2%
RESULTS ON CIFAR-100

[Bar charts: test error (%) with and without data augmentation for ResNet (110 layers, 1.7M), ResNet (1001 layers, 10.2M), DenseNet (100 layers, 0.8M) and DenseNet (250 layers, 15.3M), compared with the previous SOTA; the large DenseNet achieves the lowest error in both settings (17.6% with data augmentation).]
DETAILED RESULTS:

SVHN is the Street View House Numbers dataset. In the results table, blue marks the best result. DenseNet-BC does not achieve a better result than the basic DenseNet on SVHN; the authors argue that SVHN is a relatively easy task and extremely deep models may overfit the training set.
RESULTS ON IMAGENET

[Plots: ImageNet top-1 error (%) versus number of parameters (M) and versus GFLOPs for DenseNet-121/169/201/264 (including the k=48 variant) and ResNet-34/50/101/152; DenseNets reach lower error at comparable parameter counts and FLOPs.]

Top-1 error: 20.27%
Top-5 error: 5.17%
MULTI-SCALE DENSENET (Preview)

[Figure: intermediate classifiers 1–4 attached along the network; an example exits at the first classifier whose confidence exceeds the threshold (e.g. cat: 0.2, 0.4, 0.6; only 0.6 > threshold).]

Inference speed:
~ 2.6x faster than ResNets
~ 1.3x faster than DenseNets

[Figure: "easy" examples exit at early classifiers, while "hard" examples continue to later classifiers.]
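A minimal sketch of this early-exit idea (the stage and classifier modules, batch size of 1, and the threshold value are illustrative assumptions, not the actual Multi-Scale DenseNet implementation):

import torch

def early_exit_predict(x, stages, classifiers, threshold=0.5):
    # stages: per-scale feature extractors; classifiers: one classifier per stage.
    prediction = None
    for stage, clf in zip(stages, classifiers):
        x = stage(x)                           # compute features only up to this stage
        probs = torch.softmax(clf(x), dim=1)
        confidence, prediction = probs.max(dim=1)
        if confidence.item() > threshold:      # "easy" example: exit early (batch size 1 assumed)
            return prediction
    return prediction                          # "hard" example: fall through to the last classifier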
CONVOLUTIONAL NETWORKS

LeNet → AlexNet → VGG → Inception → ResNet
