DenseNet
Presentation by: Maria Waheed (l1f18bscs0460)
Dense Block
DenseNet Architecture
Advantages of DenseNet
CIFAR & SVHN Small-scale Dataset Results
ImageNet Large-Scale Dataset Results
Further Analysis on Feature Reuse
STANDARD CONNECTIVITY
Dense Block:
A Dense Block is a module used in convolutional neural networks that connects all
layers (with matching feature-map sizes) directly with each other. To preserve the feed-
forward nature, each layer obtains additional inputs from all preceding layers and passes
on its own feature-maps to all subsequent layers.
In a standard ConvNet, the input image goes through multiple convolutions to obtain high-level features.
RESNET CONNECTIVITY
Identity mappings promote gradient propagation. (Figure legend: element-wise addition)
In ResNet, identity mapping is proposed to promote gradient propagation, and element-wise addition is used. It can be viewed as an algorithm with a state passed from one ResNet module to the next.
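As a concrete illustration, here is a minimal sketch (assuming PyTorch; the dummy tensors and shapes are purely illustrative) contrasting ResNet's element-wise addition with the channel-wise concatenation introduced next:

```python
# Minimal sketch (PyTorch assumed; dummy tensors with illustrative shapes)
# contrasting ResNet's element-wise addition with channel-wise concatenation.
import torch

x = torch.randn(1, 16, 32, 32)     # feature maps entering a module
h_x = torch.randn(1, 16, 32, 32)   # output h(x) of that module

resnet_out = x + h_x                        # element-wise addition: shape stays (1, 16, 32, 32)
densenet_out = torch.cat([x, h_x], dim=1)   # concatenation: channels stack to (1, 32, 32, 32)
```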
DENSE ARCHITECTURE
DENSE CONNECTIVITY
(Figure legend: C — channel-wise concatenation)
In DenseNet, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all
subsequent layers. Concatenation is used. Each layer is receiving a “collective knowledge” from all preceding layers.
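A short sketch of this "collective knowledge" (again assuming PyTorch; the growth rate k = 12 and the tensor shapes are illustrative assumptions):

```python
# Each layer receives the channel-wise concatenation of all preceding feature maps.
import torch

x0 = torch.randn(1, 16, 32, 32)   # input feature maps
x1 = torch.randn(1, 12, 32, 32)   # k = 12 new maps produced by layer 1
x2 = torch.randn(1, 12, 32, 32)   # k = 12 new maps produced by layer 2

layer3_input = torch.cat([x0, x1, x2], dim=1)   # layer 3 sees [x0, x1, x2]
print(layer3_input.shape)                       # torch.Size([1, 40, 32, 32])
```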
DENSE AND SLIM
So, it has higher computational and memory efficiency. The figure shows the concept of concatenation during forward propagation.
DenseNet Architecture:
Basic DenseNet Composition Layer:
For each composition layer, pre-activation Batch Norm (BN) and ReLU, followed by a 3×3 Conv, are applied, producing output feature maps with k channels, for example to transform x0, x1, x2, x3 into x4. This idea comes from Pre-Activation ResNet.
[Figure: composition layer pipeline Batch Norm → ReLU → Convolution (3×3), producing k channels.]
x5 = h5([x0, …, x4])
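A rough sketch of this composition layer, assuming PyTorch (class and argument names are illustrative, not the authors' reference implementation):

```python
import torch
import torch.nn as nn

class CompositionLayer(nn.Module):
    """BN -> ReLU -> 3x3 Conv, producing k = growth_rate new feature maps."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        # x is the concatenation [x0, x1, ..., x_{l-1}] of all preceding outputs
        new_features = self.conv(self.relu(self.norm(x)))
        # pass on both the old and the new feature maps to subsequent layers
        return torch.cat([x, new_features], dim=1)
```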
DenseNet-B (Bottleneck Layers):
To reduce the model complexity and size, BN-ReLU-1×1 Conv is done before BN-ReLU-3×3 Conv.
[Figure: bottleneck pipeline Batch Norm → ReLU → Convolution (1×1) → Batch Norm → ReLU → Convolution (3×3); the input of l×k channels is reduced to 4×k channels by the 1×1 conv, and the 3×3 conv outputs k channels.]
This gives higher parameter and computational efficiency.
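A hedged sketch of the bottleneck layer, again in PyTorch with illustrative names; the 4×k intermediate width follows the description above:

```python
import torch
import torch.nn as nn

class BottleneckLayer(nn.Module):
    """BN -> ReLU -> 1x1 Conv (4k maps) -> BN -> ReLU -> 3x3 Conv (k maps)."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        inter_channels = 4 * growth_rate
        self.norm1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, inter_channels,
                               kernel_size=1, bias=False)
        self.norm2 = nn.BatchNorm2d(inter_channels)
        self.conv2 = nn.Conv2d(inter_channels, growth_rate,
                               kernel_size=3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.norm1(x)))    # 1x1 conv: reduce to 4k maps
        out = self.conv2(self.relu(self.norm2(out)))  # 3x3 conv: produce k maps
        return torch.cat([x, out], dim=1)             # concatenate as before
```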
MULTIPLE DENSE BLOCKS WITH TRANSITION LAYERS:
1×1 Conv followed by 2×2 average pooling are used as the transition layers between two contiguous dense blocks.
Feature map sizes are the same within a dense block so that they can be concatenated together easily.
At the end of the last dense block, a global average pooling is performed and then a softmax classifier is attached.
[Figure: overall pipeline with convolutions, dense blocks, pooling layers, and a final linear (classification) layer. Pooling reduces feature map sizes between blocks; feature map sizes match within each block.]
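A sketch of such a transition layer in PyTorch (the BatchNorm before the 1×1 conv is an assumption borrowed from common implementations; names are illustrative):

```python
import torch.nn as nn

class TransitionLayer(nn.Module):
    """1x1 Conv followed by 2x2 average pooling between two dense blocks."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)  # halves the feature-map size

    def forward(self, x):
        return self.pool(self.conv(self.norm(x)))
```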
DENSENETS-B
DenseNets-B are just regular DenseNets that take advantage of a 1×1 convolution to reduce the number of feature maps before the 3×3 convolution and improve computational efficiency. The B comes from the bottleneck layer you are already familiar with from the work on ResNets.
DenseNet-BC (Further Compression):
If a dense block contains m feature-maps, the transition layer generates θm output feature-maps, where 0 < θ ≤ 1 is referred to as the compression factor.
When θ = 1, the number of feature-maps across transition layers remains unchanged. A DenseNet with θ < 1 is referred to as DenseNet-C, and θ = 0.5 is used in the experiments.
When both the bottleneck layers and transition layers with θ < 1 are used, the model is referred to as DenseNet-BC.
Finally, DenseNets with/without B/C, with different depths L and growth rates k, are trained.
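A sketch of how the compression factor θ shrinks the channel count at each transition, reusing the illustrative BottleneckLayer and TransitionLayer classes sketched above; the layout (three blocks of 16 bottleneck layers, initial width 2k, k = 12, θ = 0.5) is an assumption chosen to mimic a small DenseNet-BC:

```python
import math
import torch.nn as nn

def make_dense_block(in_channels, num_layers, growth_rate):
    layers, channels = [], in_channels
    for _ in range(num_layers):
        layers.append(BottleneckLayer(channels, growth_rate))  # from the sketch above
        channels += growth_rate        # each layer adds k feature maps
    return nn.Sequential(*layers), channels

growth_rate, theta = 12, 0.5
channels = 2 * growth_rate             # assumed width of the initial convolution
modules = []
for i, num_layers in enumerate((16, 16, 16)):
    block, channels = make_dense_block(channels, num_layers, growth_rate)
    modules.append(block)
    if i < 2:                          # transitions only between contiguous dense blocks
        out_channels = int(math.floor(theta * channels))   # theta * m output maps
        modules.append(TransitionLayer(channels, out_channels))
        channels = out_channels
# after the last block: global average pooling and a softmax classifier (not shown)
```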
[Figure residue from the advantages slides: error signal propagating from output back to input; correlated features; hl with O(C×C) parameters; standard connectivity, where the classifier uses only the last layer's increasingly complex features, y = w4·h4(x).]
ADVANTAGE 3: MAINTAINS LOW COMPLEXITY FEATURES
Dense Connectivity:
The classifier uses features of all complexity levels:
y = w0·x + w1·h1(x) + w2·h2(x) + w3·h3(x) + w4·h4(x)
In DenseNet, the classifier uses features of all complexity levels. This tends to give smoother decision boundaries. It also explains why DenseNet performs well when training data is insufficient.
RESULTS
RESULTS ON CIFAR-10
[Bar chart: CIFAR-10 test error (%) for ResNet (110 layers, 1.7M), ResNet (1001 layers, 10.2M), DenseNet (100 layers, 0.8M), DenseNet (250 layers, 15.3M), and the previous SOTA.]
With data augmentation (C10+), test error:
•Small-size ResNet-110: 6.41%
•Large-size ResNet-1001 (10.2M parameters): 4.62%
•State-of-the-art (SOTA): 4.2%
•Small-size DenseNet-BC (L=100, k=12) (only 0.8M parameters): 4.5%
•Large-size DenseNet (L=250, k=24): 3.6%
DETAILED RESULTS:
[Charts: test error (%) comparing DenseNet-121, DenseNet-169 and ResNet-50.]
Inference Speed:
~2.6× faster than ResNets
~1.3× faster than DenseNets
[Figure: "easy" vs. "hard" examples.]
CONVOLUTIONAL NETWORKS
LeNet, AlexNet, VGG, Inception, ResNet