11-Nonlinear Models (Neural Networks)
Linear models
In previous topics, we mainly dealt with linear models
• Regression: ℎ(𝒙) = 𝒘ᵀ𝒙 + 𝑏
• Classification:
ℎ(𝒙) = argmaxₖ 𝒘ₖᵀ𝒙 + 𝑏ₖ
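As an illustration (the weights and inputs below are made up, not from the notes), the multi-class rule ℎ(𝒙) = argmaxₖ 𝒘ₖᵀ𝒙 + 𝑏ₖ can be sketched in Python:

```python
import numpy as np

# Toy multi-class linear classifier: one weight vector w_k and bias b_k per class.
W = np.array([[1.0, -1.0],    # w_0
              [0.0,  2.0],    # w_1
              [-1.0, 0.5]])   # w_2; shape (K, d)
b = np.array([0.0, -1.0, 0.5])

def h(x):
    # Compute all K scores at once, return the highest-scoring class.
    return int(np.argmax(W @ x + b))

x = np.array([1.0, 1.0])
print(h(x))  # → 1 (scores are [0, 1, 0])
```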
Nonlinear models
• Non-linear features 𝝓(𝒙)
○ E.g., Gaussian discriminant analysis with different covariance
matrices per class, in which case we have quadratic features of 𝒙.
• Non-linear kernel 𝑘(𝒙ᵢ, 𝒙ⱼ)
○ A kernel is an inner product of two data samples after they are
transformed into a certain vector space. That vector space can be very
high-dimensional (even infinite-dimensional). A linear classifier in
such a high-dimensional space can be non-linear in the original
low-dimensional space.
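For a concrete sketch (assumed example, not given in the notes): the Gaussian RBF kernel 𝑘(𝒙ᵢ, 𝒙ⱼ) = exp(−‖𝒙ᵢ − 𝒙ⱼ‖² / 2σ²) corresponds to an inner product in an infinite-dimensional feature space, yet is cheap to evaluate directly:

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    # Inner product in an implicit infinite-dimensional feature space,
    # computed directly from the squared distance in the original space.
    d2 = np.sum((xi - xj) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

xi = np.array([0.0, 0.0])
xj = np.array([1.0, 0.0])
print(rbf_kernel(xi, xj))  # exp(-0.5)
```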
• Learnable non-linear mapping
○ We can stack a few layers of learnable non-linear functions
(e.g., logistic functions) to learn a non-linear feature 𝝓(𝒙) or a
non-linear kernel appropriate to the task at hand.
𝑧 = 𝒘ᵀ𝒙 + 𝑏
𝑦 = 𝑓(𝑧)
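The two layer equations above can be sketched in Python (made-up weights; the logistic sigmoid stands in for the generic non-linearity 𝑓):

```python
import numpy as np

def sigmoid(z):
    # Logistic function, one common choice of non-linearity f.
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(x, w, b):
    z = w @ x + b        # affine pre-activation: z = w^T x + b
    y = sigmoid(z)       # non-linearity:        y = f(z)
    return y

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.5])   # hypothetical weights
b = 0.0
print(layer_forward(x, w, b))  # sigmoid(-0.5)
```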
To simplify notation, we omit the layer index 𝐿 and call the output of
the current layer 𝑦 and the input of the current layer 𝒙, which is the
output of the layer below. In the simplified notation,
○ Recursion
○ Termination
Recursion on what?
Backpropagation (BP)
○ Initialization
○ Recursion
○ Termination
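The three BP steps above can be sketched in numpy (a hedged illustration with made-up parameters, not the course's exact notation): initialization seeds the gradient at the top layer's output, the recursion turns the gradient w.r.t. each layer's output into gradients w.r.t. its parameters and its input, and the recursion terminates once the bottom layer is reached.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Forward pass through stacked sigmoid layers; cache inputs and pre-activations."""
    cache = []
    y = x
    for W, b in params:
        z = W @ y + b
        cache.append((y, z))
        y = sigmoid(z)
    return y, cache

def backward(dy, params, cache):
    """BP recursion: walk layers top-down, mapping dL/dy into dL/dW, dL/db, dL/dx."""
    grads = []
    for (W, b), (x_in, z) in zip(reversed(params), reversed(cache)):
        s = sigmoid(z)
        dz = dy * s * (1.0 - s)                 # through the non-linearity
        grads.append((np.outer(dz, x_in), dz))  # (dL/dW, dL/db) for this layer
        dy = W.T @ dz                           # gradient w.r.t. this layer's input
    return list(reversed(grads))                # terminate at the bottom layer

# Hypothetical two-layer network.
params = [(np.array([[0.1, 0.2], [0.3, 0.4]]), np.zeros(2)),
          (np.array([[0.5, -0.5]]), np.zeros(1))]
x = np.array([1.0, -1.0])
y, cache = forward(x, params)
grads = backward(np.ones(1), params, cache)  # initialization: seed dL/dy = 1 at the top
```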
Auto-differentiation in general
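One minimal illustration of auto-differentiation in general (forward mode via dual numbers; the class and function below are a sketch I am introducing, not part of the notes): each value carries its derivative alongside it, and overloaded arithmetic propagates both through any composition of operations.

```python
class Dual:
    """A value paired with its derivative; arithmetic applies the chain rule."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def f(x):
    return x * x + 3 * x + 1   # f'(x) = 2x + 3

x = Dual(2.0, 1.0)   # seed the derivative dx/dx = 1
y = f(x)
print(y.val, y.dot)  # → 11.0 7.0
```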