

WHAT IS STATISTICAL LEARNING?
Chapter 02

Some slides/pictures taken from:


https://www.linkedin.com/pulse/accuracy-bias-variance-tradeoff-yair-rajwan-ms-dsc

Outline
➢ What is statistical learning?
➢ Why estimate f?
➢ How do we estimate f?
➢ The trade-off between prediction accuracy and model interpretability
➢ Supervised vs. unsupervised learning
➢ Regression vs. classification problems
➢ Assessing quality of fit
➢ The bias-variance trade-off
➢ Nearest neighbors for classification
What is Statistical Learning?

[Figure: scatter plots of Sales vs. TV, Radio, and Newspaper budgets, each panel with its own fitted line]

Suppose that, given the budgets for the three media, our aim is to build a model to predict sales accurately.
Shown are scatter plots of Sales vs. TV, Radio, and Newspaper, with a blue linear-regression line fit separately to each.
Can we predict Sales using these three variables?
Perhaps we can do better using a model:
Sales ≈ f(TV, Radio, Newspaper)
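As a concrete sketch, such a linear model could be fit as below. This assumes the data sit in a file named Advertising.csv with columns TV, Radio, Newspaper, and Sales (a hypothetical path and layout; the slides do not specify one).

```python
# Minimal sketch: fit Sales ~ f(TV, Radio, Newspaper) with a linear model.
# "Advertising.csv" and its column names are assumptions, not from the slides.
import pandas as pd
from sklearn.linear_model import LinearRegression

ads = pd.read_csv("Advertising.csv")            # assumed data file
X = ads[["TV", "Radio", "Newspaper"]]           # predictors (media budgets)
y = ads["Sales"]                                # response

model = LinearRegression().fit(X, y)            # estimate f with a linear model
new = pd.DataFrame([[100.0, 20.0, 10.0]], columns=X.columns)
print(model.predict(new))                       # predicted Sales for new budgets
```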

What is Statistical Learning?


➢Suppose we observe Yi and Xi = (Xi1, ..., Xip) for i = 1, ..., n.
➢We believe that there is a relationship between Y and at least one of the X's.
• Here Y = Sales is a response or target (output) that we wish to predict.
• TV is a feature, or input, or predictor; we name it X1. Likewise, we name Radio X2, and so on.
➢We can generally model the relationship as:

Yi = f(Xi) + εi
➢Where f is an unknown function and ε is a random error
with mean zero.
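A minimal simulation sketch of this model, with an illustrative choice of f and noise level (both are assumptions for the example, since the true f is unknown in practice):

```python
# Simulate Y = f(X) + eps with a known f and mean-zero random error.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0.0, 1.0, n)        # predictor values
def f(x):                           # the "true" f (illustrative choice)
    return np.sin(2 * np.pi * x)
eps = rng.normal(0.0, 0.1, n)       # random error with mean zero
y = f(x) + eps                      # observed responses
```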

A Simple Example
[Figure: simulated (x, y) data; x ranges from 0.0 to 1.0, y from −0.10 to 0.10]

A Simple Example
[Figure: the same simulated data with the true curve f and one error εi marked]

Simulated Data: Income vs. Education and Seniority

Red points are simulated values of income from the model
income = f(education, seniority) + ε
f is the blue surface.

Why Do We Estimate f?
➢Statistical Learning, and this course, are all about how to
estimate f.
➢The term statistical learning refers to using the data to
“learn” f.
➢Why do we care about estimating f?
➢There are two reasons for estimating f:
➢ Prediction
➢ Inference
Inference may involve testing relevant subject matter theory (e.g., economic theory) and suggesting policy based on this inference.

1. Prediction
➢If we can produce a good estimate for f (and the variance
of ε is not too large) we can make accurate predictions for
the response, Y, based on a new (unseen) value of X.

Example: Direct Mailing Prediction


➢Interested in predicting how much money an individual
will donate based on observations from 90,000 people on
which we have recorded over 400 different characteristics
(x variables).
➢Don’t care too much about each individual characteristic.
➢Just want to know: For a given individual should I send
out a mailing?

2. Inference
➢Alternatively, we may also be interested in the type of
relationship between Y and the X's.
➢For example,
➢ Which predictors actually affect the response?
➢ Is the relationship positive or negative?
➢ Is the relationship a simple linear one, or is it more complicated?

Example: Housing Price Inference


➢Wish to predict median house price based on 14
variables.
➢Probably want to understand which factors have the
biggest effect on the response and how big the effect is.
➢For example, how much impact does a river view have on the house value?

How Do We Estimate f ?
➢We will assume we have observed a set of training data

{(X1, Y1), (X2, Y2), ..., (Xn, Yn)}


➢We must then use the training data and a statistical
method to estimate f.
➢Statistical Learning Methods:
➢ Parametric Methods
➢ Non-parametric Methods

Parametric Methods
➢Parametric methods reduce the problem of estimating f down to one of estimating a set of parameters.

➢They involve a two-step model-based approach

STEP 1:
Make an assumption about the functional form of f, i.e., come up with a model. The most common example is a linear model, i.e.,

f(Xi) = β0 + β1 Xi1 + β2 Xi2 + ... + βp Xip

However, in this course we will examine far more complicated, and flexible, models for f. In a sense, the more flexible the model, the more realistic it is.

Parametric Methods (cont.)


STEP 2:
Use the training data to fit the model, i.e., estimate f, or equivalently the unknown parameters β0, β1, β2, ..., βp.

➢The most common approach for estimating the parameters in a linear model is ordinary least squares (OLS).
➢However, this is only one way.
➢We will see in this course that there are often alternative and superior approaches.
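A minimal OLS sketch on simulated data, using numpy's least-squares solver (the data-generating numbers below are illustrative assumptions):

```python
# Estimate beta0, beta1 by ordinary least squares via numpy's lstsq.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)     # truth: beta0 = 2, beta1 = 0.5

X = np.column_stack([np.ones_like(x), x])    # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                              # roughly [2.0, 0.5]
```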
Example of a Parametric Method:

A linear model f̂L(X) = β̂0 + β̂1 X gives a reasonable fit here.

[Figure: scatter plot of (x, y) data with the fitted line f̂L; x from 1 to 6, y from −2 to 3]

A quadratic model f̂Q(X) = β̂0 + β̂1 X + β̂2 X² fits slightly better.


[Figure: the same scatter plot with the fitted quadratic curve f̂Q; x from 1 to 6, y from −2 to 3]
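A sketch comparing the two fits on illustrative simulated data (the plotted data set itself is not available, so the curved truth below is an assumption):

```python
# Fit f_L (degree 1) and f_Q (degree 2) and compare training MSE.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 6, 60)
y = 0.2 * (x - 3.5) ** 2 + rng.normal(0, 0.5, 60)   # curved truth (assumed)

lin = np.polyfit(x, y, deg=1)     # f_L: linear fit
quad = np.polyfit(x, y, deg=2)    # f_Q: quadratic fit

def mse(coefs):
    return np.mean((y - np.polyval(coefs, x)) ** 2)

print(mse(lin), mse(quad))        # the quadratic fits slightly better
```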

Example: A Linear Regression Estimate


• Even if the error standard deviation is low, we will still get a bad estimate of f if we use the wrong model, e.g., the fit of a linear model (a plane) to the income data:

f̂ = β̂0 + β̂1 × Education + β̂2 × Seniority

Parametric models assume a global structure on the (x, y) relationship.

Non-parametric Methods
➢They do not make explicit assumptions about the
functional form of f.
➢Advantage: they can accurately fit a wider range of possible shapes of f.
➢Disadvantage: a very large number of observations is required to obtain an accurate estimate of f.
Example of Non-parametric Method: K Nearest Neighbor Method

• Suppose we want to predict Y at X = 4.
• We can use the average of the responses at points near x:
f̂(x) = Ave(Y | X ∈ N(x))
where N(x) is some neighborhood of x.
• The resulting method is called the nearest neighbor method.
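A direct sketch of this average, taking N(x) to be the k closest training points (the tiny data set and k = 3 are assumptions for illustration):

```python
# Nearest-neighbor regression: f_hat(x) = average of Y over the k points
# whose X values are closest to x.
import numpy as np

def knn_predict(x0, x_train, y_train, k=3):
    dist = np.abs(x_train - x0)          # distances in one dimension
    nearest = np.argsort(dist)[:k]       # indices of the k closest points
    return y_train[nearest].mean()       # average their responses

x_train = np.array([1.0, 2.0, 3.5, 4.2, 5.0, 6.1])
y_train = np.array([1.1, 1.9, 3.2, 4.4, 5.2, 6.0])
print(knn_predict(4.0, x_train, y_train))   # prediction at X = 4
```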


Another example of a non-parametric method: a thin-plate spline estimate

• Splines are a special type of piecewise polynomial. Non-linear regression methods are more flexible and can potentially provide more accurate predictions.

Non-parametric models assume a local structure on the (x, y) relationship.

Trade-off Between Prediction Accuracy and Model Interpretability
➢Why not just use a more flexible method if it is more realistic?
➢There are two reasons:
Reason 1:
A simple method such as linear regression produces a model
which is much easier to interpret (the Inference part is better).
For example, in a linear model, βj is the average increase in Y for a one-unit increase in Xj, holding all other variables constant.

Reason 2:
Even if you are only interested in prediction, so the first reason is not relevant, it is often possible to get more accurate predictions with a simple model instead of a complicated one. This seems counterintuitive, but it has to do with the fact that it is harder to fit a more flexible model.

Good training fit but poor test prediction


• Non-linear regression methods can also be too flexible and produce poor estimates of f on test data. Here the fitted model makes no errors on the training data! This is also known as overfitting.

Supervised vs. Unsupervised Learning


➢We can divide all learning problems into Supervised and
Unsupervised situations
➢Supervised Learning:
➢ Supervised Learning is where both the predictors, Xi, and the
response, Yi, are observed.
➢ Regression and classification are the two main classes of approaches.
➢ Most of this course will also deal with supervised learning.

➢Unsupervised Learning:
➢ In this situation only the Xi’s are observed.
➢ A common example is market segmentation where we try to divide
potential customers into groups based on their characteristics.
➢ A common approach is clustering.
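A clustering sketch in the spirit of the market-segmentation example, on made-up customer features (k-means is one common choice; the slides do not prescribe an algorithm):

```python
# Group unlabeled points into three clusters with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Three synthetic "customer" groups in a 2-D feature space
X = np.vstack([rng.normal(c, 0.5, (50, 2)) for c in [(0, 0), (3, 3), (0, 4)]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])         # cluster assignments for the first few points
print(km.cluster_centers_)     # estimated group centers
```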

A Simple Clustering Example

A clustering data set involving three groups. Each group is shown using a different colored symbol.
Left: The three groups are well-separated. In this setting, a clustering approach should successfully
identify the three groups. Right: There is some overlap among the groups. Now the clustering task is
more challenging.

Regression vs. Classification


➢Supervised learning problems can be further divided into
regression and classification problems.
➢Regression covers situations where Y is quantitative (continuous/numerical), e.g.:
➢ Predicting the value of the KSE100 Index in 6 months.
➢ Predicting the value of a given house based on various features.
➢Classification covers situations where Y is categorical, e.g.:
➢ Will the KSE 100 Index be up (U) or down (D) in 6 months?
➢ Is this email a SPAM or not?

Different Approaches
➢We will deal with both types of problems in this course.
➢Some methods work well on both types of problems, e.g., neural networks.
➢Other methods work best on Regression, e.g., Linear
Regression, or on Classification, e.g., k-Nearest
Neighbors.
• Note: There are other learning methods including online
learning, reinforcement learning etc.

Measuring Quality of Fit (in the Regression Context)
➢Suppose we have a regression problem.
➢One common measure of accuracy in regression is the mean squared error (MSE). On the training data,

MSE_TR = (1/n) ∑_{i=1}^{n} [yi − f̂(xi)]²

➢where f̂(xi) is the prediction our method gives for the i-th observation in our training data.
➢The MSE on test data (MSE_TE) is defined similarly.
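A sketch computing MSE_TR and MSE_TE for one fitted model, with an assumed train/test split of simulated data:

```python
# Compute training and test MSE for a cubic polynomial fit.
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 200)
x_tr, y_tr, x_te, y_te = x[:100], y[:100], x[100:], y[100:]

coefs = np.polyfit(x_tr, y_tr, deg=3)          # fit on training data only
print(mse(y_tr, np.polyval(coefs, x_tr)))      # MSE_TR
print(mse(y_te, np.polyval(coefs, x_te)))      # MSE_TE
```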
A Problem
➢In either case our method has generally been designed to make the MSE small on the training data we are looking at, e.g., with linear regression we choose the line such that the MSE is minimized.

➢What we really care about is how well the method works on new data. We call this new data "Test Data".

➢There is no guarantee that the method with the smallest training MSE will have the smallest test (i.e., new-data) MSE.
Training vs. Test MSE’s
➢In general, the more flexible a method is, the lower its training MSE will be, i.e., it will "fit" or explain the training data very well.

➢However, the test MSE may in fact be higher for a more flexible method than for a simple approach.
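A sketch reproducing this qualitative pattern with polynomial degree as the flexibility knob (the data-generating process is an illustrative assumption):

```python
# Training MSE falls steadily with flexibility; test MSE is U-shaped.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 300)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 300)
x_tr, y_tr, x_te, y_te = x[:150], y[:150], x[150:], y[150:]

for deg in [1, 3, 5, 10, 15]:                   # increasing flexibility
    c = np.polyfit(x_tr, y_tr, deg)
    tr = np.mean((y_tr - np.polyval(c, x_tr)) ** 2)
    te = np.mean((y_te - np.polyval(c, x_te)) ** 2)
    print(deg, round(tr, 3), round(te, 3))      # train keeps falling; test turns up
```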
Examples with Different Levels of Flexibility: Example 1

LEFT: Black = truth; Orange = linear estimate; Blue = smoothing spline; Green = more flexible smoothing spline.
RIGHT: Red = test MSE; Grey = training MSE; Dashed = minimum possible test MSE (irreducible error).
Examples with Different Levels of Flexibility: Example 2

LEFT: Black = truth; Orange = linear estimate; Blue = smoothing spline; Green = more flexible smoothing spline.
RIGHT: Red = test MSE; Grey = training MSE; Dashed = minimum possible test MSE (irreducible error).
Examples with Different Levels of Flexibility: Example 3

LEFT: Black = truth; Orange = linear estimate; Blue = smoothing spline; Green = more flexible smoothing spline.
RIGHT: Red = test MSE; Grey = training MSE; Dashed = minimum possible test MSE (irreducible error).
Bias-Variance Trade-off
➢The previous graphs of test versus training MSE illustrate a very important trade-off that governs the choice of statistical learning methods.

➢There are always two competing forces that govern the choice of learning method, i.e., bias and variance.
Bias of Learning Methods
➢Bias refers to the error that is introduced by modeling a real-life problem (which may be quite complicated) with a much simpler model:

Bias(f̂(x)) = E[f̂(x)] − f(x)

➢For example, linear regression assumes that there is a linear relationship between Y and X. It is unlikely that, in real life, the relationship is exactly linear, so some bias will be present, e.g., in the previous plots 1 and 3.

➢The more flexible/complex a method is, the less bias it will generally have.
Variance of Learning Methods
➢Variance refers to how much your estimate of f would change if you had a different training data set:

Var[f̂(x)] = E[(f̂(x) − E[f̂(x)])²]


➢Generally, the more flexible a method is, the more variance it has.
The Trade-off
➢It can be shown that, for any given X = x0, the expected test mean squared error (MSE) for a new y0 at x0 equals

E[(y0 − f̂(x0))²] = Var(f̂(x0)) + [Bias(f̂(x0))]² + Var(ε)

➢Typically, as the flexibility of f̂ increases, its variance increases and its bias decreases. So choosing the flexibility based on average test error amounts to a bias-variance trade-off: the expected test MSE may go up or down!
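In a simulation, where f is known, the bias and variance of f̂(x0) can be estimated directly by refitting on many fresh training sets; a sketch (all numbers are illustrative assumptions):

```python
# Estimate Bias and Var of f_hat(x0) over repeated training sets.
import numpy as np

rng = np.random.default_rng(6)
def f(x):
    return np.sin(2 * np.pi * x)

x0, n_sims = 0.3, 500
preds = []
for _ in range(n_sims):                      # a fresh training set each time
    x = rng.uniform(0, 1, 50)
    y = f(x) + rng.normal(0, 0.2, 50)
    c = np.polyfit(x, y, deg=2)              # a fairly inflexible fit
    preds.append(np.polyval(c, x0))
preds = np.array(preds)
print("bias:", preds.mean() - f(x0))         # E[f_hat(x0)] - f(x0)
print("variance:", preds.var())              # Var[f_hat(x0)]
```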
Test MSE, Bias and Variance
The Classification Setting
➢For a regression problem, we used the MSE to assess the accuracy of the statistical learning method.
➢For a classification problem we can use the error rate, i.e.,

Error Rate = (1/n) ∑_{i=1}^{n} I(yi ≠ ŷi)

➢I(yi ≠ ŷi) is an indicator function that equals 1 if yi ≠ ŷi and 0 otherwise.

➢Thus, the error rate represents the fraction of incorrect classifications, or misclassifications.
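A one-line sketch of the error rate on made-up labels:

```python
# Error rate = fraction of observations with y_i != y_hat_i.
import numpy as np

y = np.array([0, 1, 1, 0, 1, 0, 0, 1])       # true classes (made up)
y_hat = np.array([0, 1, 0, 0, 1, 1, 0, 1])   # predicted classes (made up)

error_rate = np.mean(y != y_hat)             # average of the indicator
print(error_rate)                            # 0.25 here (2 of 8 wrong)
```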
Bayes Error Rate
➢The Bayes error rate refers to the lowest possible error rate that could be achieved if we somehow knew the "true" probability distribution of the data.
➢The Bayes classifier assigns each observation to the most likely class given its predictor values, i.e., we simply assign a test observation with predictor vector x0 to the class j for which Pr(Y = j | X = x0) is largest.
➢On test data, no classifier (or learning method) can get lower
error rates than the Bayes error rate.

➢For a two-class problem, the Bayes classifier corresponds to predicting class one if Pr(Y = 1 | X = x0) > 0.5, and class two otherwise. Of course, in real-life problems the Bayes error rate can't be calculated exactly.
Bayes Optimal Classifier (predict orange
or blue class at each x1, x2 pair)
K-Nearest Neighbors (KNN)
➢K-Nearest Neighbors is a flexible approach to estimate
the Bayes Classifier.
➢For any given X we find the k closest points to X in the training data (e.g., using the Euclidean distance) and examine their corresponding Y values.
➢If the majority of the Y's are orange we predict orange, otherwise blue.
➢The smaller k is, the more flexible the method will be (the average of a small number of values can be quite erratic, i.e., it follows the error, while the average of a large number of values is smoother because the errors average out).
KNN classifies the test observation x0 to the class with the largest estimated probability, i.e.,

Pr(Y = j | X = x0) = (1/K) ∑_{i ∈ N0} I(yi = j)

where N0 denotes the K training observations nearest to x0.
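A KNN classification sketch matching this rule, on simulated two-class data (the data and k = 3 are illustrative assumptions):

```python
# Estimate Pr(Y = j | X = x0) by the fraction of the K nearest neighbors
# in class j, then predict the class with the largest estimate.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)            # 0 = "blue", 1 = "orange"

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
x0 = np.array([[1.0, 1.0]])
print(knn.predict_proba(x0))                 # estimated class probabilities
print(knn.predict(x0))                       # class with the largest probability
```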
KNN Example with k = 3

For the observation X, the predicted class is blue. Why?


Simulated Data: K = 10
K = 1 and K = 100
Training vs. Test Error Rates on the Simulated Data

➢Notice that training error rates keep going down as k decreases, or equivalently as the flexibility (1/k) increases.

➢However, the test error rate at first decreases but then starts to increase again.
A Fundamental Picture
➢In general, training errors will always decline.

➢However, test errors will decline at first (as reductions in bias dominate) but will then start to increase again (as increases in variance dominate).

We must always keep this picture in mind when choosing a learning method. More flexible/complicated is not always better!
Exercises: Ch. 2