Chapter 2: Big Data

Chapter 2 of 'Big Data for Business' discusses statistical learning, focusing on the relationship between input variables (predictors) and output variables (responses) for improving predictions and understanding relationships. It covers the importance of model accuracy, the distinction between regression and classification, and the trade-offs between accuracy and interpretability in statistical methods. Additionally, it highlights the bias-variance trade-off and the challenges of selecting the best statistical learning method for different data sets.


Big data for Business

CHAPTER 2: STATISTICAL LEARNING

Department of Statistics
Universidad Carlos III de Madrid

Bachelor in Business Administration


Bachelor in Finance and Accounting

Outline: 1. What 2. Why 3. How 4. Accuracy and interpretability 5. Regression vs Classification 6. Model accuracy

What Is Statistical Learning?


Suppose that we are statistical consultants hired by a client to provide
advice on how to improve sales of a particular product. The Advertising
data set consists of the Sales of that product in 200 different markets,
along with advertising budgets for three media: TV, Radio, and Newspaper.

The advertising budgets are input variables while sales is an output variable.
[Figure: scatterplots of Sales versus the TV, Radio, and Newspaper advertising budgets.]



In general, we suppose that we observe a quantitative response Y and p
different predictors, X1 , X2 , . . . , Xp and there is some relationship
between Y and X = (X1, X2, . . . , Xp), which can be written as:

Y = f(X) + ε

Here f is some fixed but unknown function of X = (X1, X2, . . . , Xp), and
ε is a random error term, which is independent of X and has mean zero.
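As a concrete sketch of this model, one can simulate responses satisfying Y = f(X) + ε. Everything here (the function f, the noise level, and the data) is invented purely for illustration; in practice f is unknown:

```python
import random
import statistics

random.seed(0)

def f(x):
    # Hypothetical "true" regression function; in practice f is unknown.
    return 3.0 + 0.5 * x

n = 200
X = [random.uniform(0, 10) for _ in range(n)]
eps = [random.gauss(0.0, 1.0) for _ in range(n)]  # mean-zero error, independent of X
Y = [f(x) + e for x, e in zip(X, eps)]            # Y = f(X) + epsilon

# Because the error term has mean zero, the residuals Y - f(X) average near zero.
print(statistics.mean(y - f(x) for x, y in zip(X, Y)))
```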

As another example, we plot income versus years of education for 30
individuals in the Income1 data set.
[Figure: Income versus Years of Education for the 30 individuals in the Income1 data set.]



Now we plot income versus years of education and seniority for 30
individuals in the Income2 data set.

[Figure: 3-D plot of Income as a function of Years of Education and Seniority.]

Here f is a two-dimensional surface that must be estimated based on the
observed data.

Why estimate f ?
There are two main reasons: prediction and inference.

Prediction: We can predict Y using

Ŷ = f̂(X)

where one is not particularly interested in the exact form of f̂, provided
it yields accurate predictions. The accuracy of Ŷ depends on two kinds of error:

Reducible error: We can potentially improve the accuracy of f̂ by using
the most appropriate statistical learning technique to estimate f.

Irreducible error: No matter how well we estimate f, we cannot reduce
the error introduced by ε.

Assuming X and f̂ are fixed:

E[(Y − Ŷ)²] = [f(X) − f̂(X)]² + Var(ε)

where the first term is the reducible error and Var(ε) is the irreducible error.
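The decomposition can be checked numerically. In the sketch below, f, f̂, the test point x0, and the noise level are all hypothetical; the Monte Carlo average of (Y − Ŷ)² should come out close to the sum of the reducible and irreducible pieces:

```python
import random

random.seed(1)

def f(x):       # hypothetical true function (unknown in practice)
    return 3.0 + 0.5 * x

def f_hat(x):   # a hypothetical, slightly-off estimate of f
    return 2.5 + 0.6 * x

x0, sigma = 4.0, 1.0                  # fixed test point; Var(eps) = sigma^2 = 1
reducible = (f(x0) - f_hat(x0)) ** 2  # (f(X) - f_hat(X))^2

# Monte Carlo estimate of E[(Y - Y_hat)^2] at X = x0.
m = 100_000
mse = sum((f(x0) + random.gauss(0, sigma) - f_hat(x0)) ** 2 for _ in range(m)) / m

print(reducible + sigma ** 2, mse)  # the two numbers should nearly agree
```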


Inference: We are often interested in understanding the way that Y is
affected as X1, X2, . . . , Xp change. Here the goal is not necessarily to
make predictions for Y, but instead to understand the relationship
between X and Y, or more specifically, to understand how Y changes as
a function of X1, X2, . . . , Xp.

Depending on whether our ultimate goal is prediction, inference, or a


combination of the two, different methods for estimating f may be
appropriate.

For example, linear models allow for relatively simple and interpretable
models, but may not yield predictions as accurate as some other
approaches.

In contrast, highly non-linear approaches can potentially provide quite


accurate predictions for Y , but this comes at the expense of a less
interpretable model for which inference is more challenging.


How do we estimate f ?
We will always assume that we have observed a set of n different data
points, these are called training data. There are two main types of
statistical learning methods:

Parametric methods reduce the problem of estimating f to one of
estimating a set of parameters. For example, we may
assume that f is linear in X:

f(X) = β0 + β1 X1 + · · · + βp Xp

so that we only need to estimate (β0, β1, . . . , βp).


Non-parametric methods do not make explicit assumptions about the
functional form of f. However, they can overfit the data, in which
case they will not yield accurate estimates of the response on new
observations that were not part of the original training data set.
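The two approaches can be contrasted on simulated data. Everything below (the linear truth, the sample size, the helper `knn_predict`) is invented for the sketch: the parametric fit estimates just two numbers, while the non-parametric K-nearest-neighbour average assumes nothing about the form of f:

```python
import random

random.seed(2)

# Simulated training data from a hypothetical linear truth f(x) = 1 + 2x.
n = 100
X = [random.uniform(0, 10) for _ in range(n)]
Y = [1.0 + 2.0 * x + random.gauss(0, 1) for x in X]

# Parametric: assume f(X) = b0 + b1*X and estimate (b0, b1) by least squares.
xbar, ybar = sum(X) / n, sum(Y) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
b0 = ybar - b1 * xbar

# Non-parametric: K-nearest-neighbour averaging assumes no functional form for f.
def knn_predict(x0, k=5):
    nearest = sorted(range(n), key=lambda i: abs(X[i] - x0))[:k]
    return sum(Y[i] for i in nearest) / k

x0 = 5.0  # the true value f(5) is 11
print(b0 + b1 * x0, knn_predict(x0))  # both predictions land near 11
```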


[Figure: two 3-D fits of Income as a function of Years of Education and Seniority.]

Left: A linear model fit by least squares to the Income2 data.

Right: A rough thin-plate spline fit to the Income2 data.

Trade-Off between accuracy and interpretability


• There are advantages and disadvantages to parametric and
non-parametric methods for statistical learning.
1. If we are mainly interested in inference, then restrictive models are
much more interpretable.
2. Sometimes, however, we are only interested in prediction and the
interpretability of the predictive model is simply not of interest.
• In the second case, we might expect that it will be better to use the
most flexible model available. However, this is not in general correct
because highly flexible methods lead to overfitting.
[Figure: a schematic of the trade-off, ordering methods from high interpretability and low flexibility to low interpretability and high flexibility: Subset Selection and the Lasso; Least Squares; Generalized Additive Models; Trees; Bagging and Boosting; Support Vector Machines.]

Regression vs Classification
Random variables can be classified as:

Quantitative take on numerical values


Qualitative take on values in one of K different classes or categories.

Supervised learning problems are divided in:

Regression problems have a quantitative response


Classification problems have a qualitative response

The distinction is not always sharp. Logistic regression (Chapter 4) has
a qualitative (two-class, or binary) response and is therefore a
classification method. But since it estimates class probabilities, it can be
thought of as a regression method as well.

Some statistical methods, such as K-nearest neighbors (Chapters 2 and


4) and boosting (Chapter 8), can be used in the case of either
quantitative or qualitative responses.

Assessing Model Accuracy


No one method dominates all others over all possible data sets. Selecting
the best approach can be one of the most challenging parts of performing
statistical learning in practice.

In the regression setting, the most commonly used measure is the mean
squared error (MSE):

MSE = (1/n) Σᵢ (yᵢ − f̂(xᵢ))²

where the sum runs over the n training observations.

The MSE is computed using the training data, and so should more
accurately be referred to as the training MSE. But in general, we do not
really care how well the method works on the training data. Rather, we
are interested in the accuracy of the predictions that we obtain for
previously unseen test data:

Ave[(y0 − f̂(x0))²],

the average squared prediction error for these test observations (x0 , y0 ).
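A minimal sketch of computing both quantities, under an invented data-generating process and a deliberately crude estimate f̂ (the constant training mean); all names and numbers here are hypothetical:

```python
import math
import random

random.seed(3)

truth = math.sin  # hypothetical true f (invented for the sketch)

def simulate(n):
    return [(x, truth(x) + random.gauss(0, 0.3))
            for x in (random.uniform(0, 6) for _ in range(n))]

train, test = simulate(50), simulate(1000)

def mse(f_hat, data):
    # Ave (y - f_hat(x))^2 over the observations (x, y) in `data`.
    return sum((y - f_hat(x)) ** 2 for x, y in data) / len(data)

# A deliberately crude estimate: predict the training mean everywhere.
ybar = sum(y for _, y in train) / len(train)
f_bar = lambda x: ybar

print(mse(f_bar, train))  # training MSE
print(mse(f_bar, test))   # test MSE on previously unseen observations
```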

Measuring the Quality of Fit

[Figure: simulated data with three fitted curves (left) and MSE versus flexibility (right).]

Left: Black curve is the truth. Three estimates of f: the linear regression
fit (orange) and two smoothing spline fits (blue and green). Right: Training
MSE (grey), test MSE (red) and minimum possible test MSE (dashed).
Squares represent the training and test MSE for the three fits.

Many statistical methods specifically estimate coefficients so as to


minimize the training set MSE. For these methods, the training set MSE
can be quite small, but the test MSE is often much larger.

As model flexibility increases, training MSE will decrease, but the test
MSE may not. When a given method yields a small training MSE but a
large test MSE, we are said to be overfitting the data.

When we overfit the training data, the test MSE will be very large
because the supposed patterns that the method found in the training
data simply don’t exist in the test data.
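Overfitting is easy to reproduce with K-nearest-neighbour regression on invented data: with k = 1 every training point is its own nearest neighbour, so the training MSE is exactly zero, yet the test MSE is worse than for a smoother fit. The data-generating process and helper names below are hypothetical:

```python
import random

random.seed(5)

def simulate(n):
    # Hypothetical data: linear truth 0.5x plus unit-variance noise.
    return [(x, 0.5 * x + random.gauss(0, 1))
            for x in (random.uniform(0, 10) for _ in range(n))]

train, test = simulate(60), simulate(1000)

def knn_fit(x0, k):
    # K-nearest-neighbour regression fitted on the training set.
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def mse(data, k):
    return sum((y - knn_fit(x, k)) ** 2 for x, y in data) / len(data)

for k in (1, 10):
    print(f"k={k}  training MSE={mse(train, k):.2f}  test MSE={mse(test, k):.2f}")
```

With k = 1 the method interpolates the training noise, so its training MSE is zero while its test MSE exceeds that of the k = 10 fit.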


[Figure: simulated data (left) and training/test MSE versus flexibility (right) for a second example.]

Here the truth is smoother, so the smoother fit and the linear model do
really well.

[Figure: simulated data (left) and training/test MSE versus flexibility (right) for a third example.]

Here the truth is wiggly and the noise is low, so the more flexible fits do
the best.

The Bias-Variance Trade-Off


The expected test MSE, for a given value x0, can always be decomposed
into the sum of three fundamental quantities:

E[(y0 − f̂(x0))²] = Var(f̂(x0)) + [Bias(f̂(x0))]² + Var(ε)

Variance refers to the amount by which fˆ would change if we estimated


it using a different training data set.

Bias refers to the error that is introduced by approximating a real-life


problem, which may be extremely complicated, by a much simpler model.

As a general rule, as we use more flexible methods, the variance will


increase and the bias will decrease.

As we increase the flexibility of a class of methods, the bias tends to


initially decrease faster than the variance increases. However, at some
point increasing flexibility starts to significantly increase the variance.
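The trade-off can be seen by refitting two hypothetical estimators on many simulated training sets (the estimators, names, and data-generating process below are all invented): a rigid fit has large squared bias but small variance at x0, while a very flexible fit behaves the other way around:

```python
import random

random.seed(6)

truth = lambda x: 0.5 * x   # hypothetical true f
x0, sigma, n = 8.0, 1.0, 50

def fit_mean(train):
    # Rigid estimate: predict the overall training mean everywhere.
    return sum(y for _, y in train) / len(train)

def fit_1nn(train):
    # Flexible estimate: predict the response of the single nearest training point.
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def bias2_and_var(fit, reps=2000):
    # Refit on many independent training sets and study the spread of f_hat(x0).
    preds = []
    for _ in range(reps):
        train = [(x, truth(x) + random.gauss(0, sigma))
                 for x in (random.uniform(0, 10) for _ in range(n))]
        preds.append(fit(train))
    mean = sum(preds) / reps
    var = sum((p - mean) ** 2 for p in preds) / reps
    return (mean - truth(x0)) ** 2, var

rigid = bias2_and_var(fit_mean)
flexible = bias2_and_var(fit_1nn)
print(rigid)     # large squared bias, small variance
print(flexible)  # small squared bias, larger variance
```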

The Bias-Variance Trade-Off for the three examples


[Figure: test MSE, squared bias, and variance plotted against flexibility for each of the three examples.]

In a real-life situation in which f is unobserved, it is not possible to


explicitly compute the test MSE, bias, or variance for a statistical
learning method. Nevertheless, one should always keep the bias-variance
trade-off in mind.

The Classification Setting

Suppose that y1, . . . , yn are qualitative. Then the analogue of the
training MSE is the training error rate, the proportion of mistakes
made on the training observations:

(1/n) Σᵢ I(yᵢ ≠ ŷᵢ)
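In code, with invented labels, the error rate is just the fraction of indicator mismatches:

```python
# Training error rate: the proportion of observations whose predicted
# class label differs from the true label.
y_true = [0, 1, 1, 0, 1]   # hypothetical true labels (invented for the sketch)
y_hat  = [0, 1, 0, 0, 0]   # hypothetical classifier output

error_rate = sum(1 for a, b in zip(y_true, y_hat) if a != b) / len(y_true)
print(error_rate)  # 2 of the 5 labels disagree, so the rate is 0.4
```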

As in the regression setting, we are most interested in the error rates that
result from test observations that were not used in training.

The test error rate is minimized, on average, by the Bayes Classifier that
assigns each observation to the most likely class, given its predictor
values. That is, we should simply assign a test observation with predictor
vector x0 to the class j with largest Pr(Y = j|X = x0 ).


The Bayes classifier


[Figure: simulated two-class data (orange and blue) in the (X1, X2) plane.]

The purple dashed line is called the Bayes decision boundary, where
Pr(Y = orange|X) = Pr(Y = blue|X) = 0.5.


The Bayes classifier produces the lowest possible test error rate, called
the Bayes error rate.

The Bayes classifier will choose the class for which Pr(Y = j|X = x0 ) is
largest, then the error rate at X = x0 will be 1 − maxj Pr(Y = j|X = x0 ).

In general, the overall Bayes error rate is given by:

1 − E[maxⱼ Pr(Y = j|X)]

where the expectation averages the probability over all possible values of
X.

The Bayes error rate corresponds to the irreducible error in the


classification setting.
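For a hypothetical model in which the class probabilities are known exactly (they never are in practice), both the Bayes classifier and its error rate can be written down directly. Here we invent Pr(Y = 1 | X = x) = x with X ~ Uniform(0, 1), so E[max(x, 1 − x)] = 3/4 and the Bayes error rate is exactly 1/4:

```python
import random

random.seed(7)

# Hypothetical two-class model (invented): X ~ Uniform(0, 1) and
# Pr(Y = 1 | X = x) = x, hence Pr(Y = 0 | X = x) = 1 - x.
def bayes_classify(x):
    # Assign to the most likely class given X = x.
    return 1 if x >= 0.5 else 0

print(bayes_classify(0.7), bayes_classify(0.3))

# Bayes error rate = 1 - E[max_j Pr(Y = j | X)], estimated by Monte Carlo.
m = 100_000
estimate = 1 - sum(max(x, 1 - x) for x in (random.random() for _ in range(m))) / m
print(estimate)  # close to the exact value 0.25
```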


K-Nearest Neighbors (KNN)


Given a positive integer K and a test observation x0, the KNN classifier
first identifies the K points in the training data that are closest to x0,
denoted by N0, and estimates the conditional probability

P̂r(Y = j|X = x0) = (1/K) Σᵢ∈N₀ I(yᵢ = j)

Finally, KNN applies the Bayes rule and classifies the test observation x0
to the class with the largest estimated probability.
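A from-scratch sketch of this rule on one-dimensional invented data (the training set and helper names are hypothetical):

```python
import random

random.seed(8)

# Hypothetical training data (invented): class 1 is likelier for larger x,
# with Pr(Y = 1 | X = x) = x on [0, 1].
train = [(x, 1 if random.random() < x else 0)
         for x in (random.random() for _ in range(200))]

def knn_prob(x0, j, k=20):
    # Estimated Pr(Y = j | X = x0): the fraction of the K nearest
    # neighbours of x0 (the set N0) whose response equals j.
    n0 = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(1 for _, y in n0 if y == j) / k

def knn_classify(x0, k=20):
    # Assign x0 to the class with the largest estimated probability.
    return max((0, 1), key=lambda j: knn_prob(x0, j, k))

print(knn_classify(0.95), knn_classify(0.05))
```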

[Figure: illustration of KNN on a small simulated training set.]

K-Nearest Neighbors (KNN)


[Figure: the KNN decision boundary using K = 10 on simulated two-class data. The Bayes decision boundary is shown as a purple dashed line.]



The choice of K has a drastic effect on the KNN classifier obtained.

[Figure: KNN decision boundaries obtained with K = 1 (left) and K = 100 (right).]


[Figure: training and test error rates plotted against 1/K.]

The training error rate consistently declines as the flexibility increases.


However, the test error exhibits a characteristic U-shape, declining at first
(with a minimum at approximately K = 10) before increasing again when
the method becomes excessively flexible and overfits.
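The same behaviour can be reproduced on invented one-dimensional data (all names and the data-generating process below are hypothetical): the training error at K = 1 is exactly zero, because each training point is its own nearest neighbour, while the test error stays well above zero:

```python
import random

random.seed(9)

def simulate(n):
    # Hypothetical two-class data (invented): Pr(Y = 1 | X = x) = x on [0, 1].
    return [(x, 1 if random.random() < x else 0)
            for x in (random.random() for _ in range(n))]

train, test = simulate(200), simulate(1000)

def knn_classify(x0, k):
    neighbours = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return 1 if 2 * sum(y for _, y in neighbours) >= k else 0

def error_rate(data, k):
    # Proportion of observations in `data` that the K-NN classifier gets wrong.
    return sum(1 for x, y in data if knn_classify(x, k) != y) / len(data)

for k in (1, 10, 100):
    print(f"k={k:3d}  train error={error_rate(train, k):.3f}  "
          f"test error={error_rate(test, k):.3f}")
```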

Final comments

In both the regression and classification settings, choosing the correct


level of flexibility is critical to the success of any statistical learning
method.

The bias-variance trade-off, and the resulting U-shape in the test error,
can make this a difficult task.

In Chapter 5, we return to this topic and discuss various methods for


estimating test error rates and thereby choosing the optimal level of
flexibility for a given statistical learning method.
