
Forecasting and Learning Theory

Regression
• In regression we are interested in input-output relationships.
• Regression is the prediction of a numeric value.
• In classification, we seek to identify the categorical class Ck associated with a given input vector x.
• In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
• In regression the output is continuous
– Function approximation
• Many models could be used – the simplest is linear regression.
• When we have a single input attribute (x) and we want to use linear regression, this is called simple linear regression.
• y is called the dependent variable.
• x is called the independent variable.
• If we had multiple input attributes (e.g. x1, x2, x3, etc.), this would be called multiple linear regression.
Regression examples
Linear regression
• Given an input x we would like to compute an output y
• For example:
– Predict height from age
– Predict Google’s price from Yahoo’s price
– Predict distance from wall from sensors
[Figure: scatter plot of y (Y axis) against x (X axis) with a fitted line]
Linear regression
• Given an input x we would like to compute an output y
• In linear regression we assume that y and x are related by the following equation:

Y = b0 + b1X + e

where e is the error term, b0 is the y-intercept, and b1 is the slope. Y is the dependent variable (what we are trying to predict) and X is the independent (observed) variable.
• Remember: Y is always continuous.
Objective function
• We will "fit" the points with a line (i.e. hyperplane)
• Which line should we use?
– Choose an objective function
– For simple linear regression we choose the sum of squared errors (SSE):

SSE = Σi (predictedi – actuali)² = Σi (residuali)²

– Thus, find the line which minimizes the sum of the squared residuals (i.e. least squares)
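As a minimal sketch of this objective (assuming NumPy; the data points and candidate coefficients are made up for illustration), we can compare two candidate lines by their SSE:

```python
# A minimal sketch: score candidate lines by sum of squared errors (SSE).
# The data points and candidate coefficients are made up for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sse(b0, b1):
    residuals = y - (b0 + b1 * x)   # actual minus predicted, per point
    return np.sum(residuals ** 2)

# The line with the smaller SSE is the better fit under this objective.
print(sse(0.0, 2.0))   # candidate line y = 2x        -> SSE ~ 0.11
print(sse(1.0, 1.5))   # candidate line y = 1 + 1.5x  -> SSE ~ 3.56
```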
Linear regression
• Our goal is to estimate w from training data of (xi, yi) pairs
• Optimization goal: minimize the squared error (least squares):

w* = argminw Σi (yi – w·xi)²

• Why least squares?
– minimizes the squared distance between the measurements and the predicted line
– has a nice probabilistic interpretation (it is the maximum-likelihood estimate under Gaussian noise)
– the math is pretty
Example
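As a worked example, here is a minimal sketch (assuming NumPy, with synthetic data) of the closed-form least-squares fit for simple linear regression:

```python
# A minimal sketch: fit Y = b0 + b1*X by least squares, using the
# closed-form solution. The data is synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)               # independent variable
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)      # dependent variable + noise

# Closed-form estimates: b1 = cov(x, y) / var(x), b0 = mean(y) - b1*mean(x)
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {np.sum(residuals**2):.3f}")
```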
Use of linear regression
• Risk analysis
• Forecasting sales
• Business domains.
Logistic Regression
• Logistic regression is a method used to predict a dependent variable, given a set of independent variables, such that the dependent variable is categorical.
• Dependent variable (Y): the binary response variable, holding values like 0 or 1, yes or no
• Independent variable (X): the predictor variable used to predict the response variable
Logistic Regression
• Our bank manager wants to build a prediction model to predict if a customer will pay back the loan
• A statistician advised our bank manager to use logistic regression
• Why not use linear regression?
• Least squares regression can produce impossible estimates, such as probabilities that are less than 0 or greater than 1. So, when the predicted value is measured as a probability, use logistic regression.
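A minimal sketch of the problem (hypothetical 0/1 loan-repayment data, fit with ordinary least squares):

```python
# A minimal sketch: a least-squares fit to a 0/1 outcome can predict
# "probabilities" below 0 or above 1. Data is hypothetical.
import numpy as np

income = np.array([10, 20, 30, 40, 50, 60], dtype=float)  # predictor
repaid = np.array([0, 0, 0, 1, 1, 1], dtype=float)        # 0/1 outcome

b1, b0 = np.polyfit(income, repaid, deg=1)  # slope, intercept
for x in (0.0, 70.0):
    print(f"income={x:>4}: predicted 'probability' = {b0 + b1 * x:.2f}")
# income= 0.0: predicted 'probability' = -0.40  (below 0)
# income=70.0: predicted 'probability' = 1.40   (above 1)
```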
Logistic Regression
• But what if there is an outlier in the data? Things would get pretty messy.
• To deal with outliers, logistic regression uses the sigmoid function.
• It is really a technique for classification, not regression.
• The idea of logistic regression is to find a relationship between the features and the probability of a particular outcome.
• We use maximum likelihood estimation for parameter estimation.
• The maximum likelihood estimate is the set of regression coefficients for which the probability of getting the data we have observed is maximum.
Logistic regression
• Logistic regression is a discriminative classifier
• A discriminative model tries to learn to distinguish the classes (perhaps without learning much about them)
• The logistic regression algorithm also uses a linear equation with independent predictors to predict a value.
• Very fast.
Equation

log(Y / (1 – Y)) = c + b1X1 + b2X2 + …

Where
c: the constant (intercept) term, i.e. the log-odds of the event happening when no other factors are considered (all predictors are zero).
Y: the probability of the event you are trying to predict.
X1, X2: independent variables which determine the occurrence of the event Y.
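A minimal sketch tying this equation to the sigmoid on the next slide (the coefficients below are illustrative assumptions, not estimates from real data):

```python
# A minimal sketch: the sigmoid maps the linear predictor (the log-odds
# c + b1*x1 + b2*x2) onto a probability in (0, 1). Coefficients are
# illustrative assumptions, not fitted values.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

c, b1, b2 = -1.5, 0.8, 0.3     # assumed intercept and coefficients
x1, x2 = 2.0, 1.0              # feature values for one example

z = c + b1 * x1 + b2 * x2      # log-odds: log(Y / (1 - Y))
p = sigmoid(z)                 # probability of the event
print(f"log-odds = {z:.2f}, probability = {p:.3f}, class = {int(p >= 0.5)}")
```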
Sigmoid curve
[Figure: the sigmoid (logistic) curve, which squashes any real-valued input into the range (0, 1)]
• Results are categorical
Use of logistic regression
• Classification Problems
• Cyber security
• Image processing
Logistic regression vs Linear regression
• The essential difference between these two is that logistic regression is used when the dependent variable is binary in nature. In contrast, linear regression is used when the dependent variable is continuous.
• The fitted relationship in logistic regression is an S-shaped curve, while in linear regression it is a straight line.
• Linear regression requires a linear relationship between the dependent and independent variables, whereas this is not necessary for logistic regression.
• The estimation method in logistic regression is maximum likelihood estimation. In linear regression, the estimation method is least squares.
Regression Tree
Pruning

• The most fundamental problem with decision trees is that they "overfit" the data and hence do not provide good generalization. A solution to this problem is to prune the tree.
• But pruning the tree will always increase the error rate on the training set.
• Cost-complexity pruning: minimize a criterion of the form α · size + Σ over leaf nodes N of i(N), where size is the number of leaf nodes, i(N) is the impurity of leaf node N, and α trades off tree size against impurity. Each node in the tree can be classified in terms of its impact on the cost-complexity if it were pruned. Nodes are successively pruned until certain heuristics are satisfied.
• By pruning the nodes that are far too specific to the training set, it is hoped the tree will have better generalization. In practice, we use techniques such as cross-validation and held-out training data to better calibrate the generalization properties.
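As a hedged sketch of cost-complexity pruning in practice (assuming scikit-learn, whose ccp_alpha parameter plays the role of α; the data is synthetic):

```python
# A minimal sketch: cost-complexity pruning of a regression tree.
# Larger ccp_alpha penalizes leaf count more, so the tree is pruned harder.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))                  # synthetic inputs
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=200)     # noisy targets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in (0.0, 0.01, 0.05):
    tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"alpha={alpha}: leaves={tree.get_n_leaves()}, "
          f"test R^2={tree.score(X_te, y_te):.3f}")
```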
How to choose the right algorithm
• What are you trying to get out of this?
Step 1
• If you’re trying to predict or forecast a target value, then you
need to look into supervised learning.
• If not, then unsupervised learning is the place you want to be.
Step 2
• Is it a discrete value like Yes/No, 1/2/3, A/B/C, or
Red/Yellow/Black? If so, then you want to look into
classification.
• If the target value can take on a number of values, say any value from 0.00 to 100.00, or -999 to 999, or -∞ to +∞, then you need to look into regression.
Step 3
• Are you trying to fit your data into some discrete groups? If so and that's all you need, you should look into clustering.
• Do you need to have some numerical estimate of how strong the fit is into each group? If you answer yes, then you probably should look into a density estimation algorithm.
What data do you have or can you collect?
• Are the features nominal or continuous?
• Are there missing values in the features? If there are missing values, why are there missing values?
• Are there outliers in the data?
Overview of Bias and Variance
• In supervised machine learning an algorithm learns a model from training data.
• The goal of any supervised machine learning algorithm is to best estimate the mapping function (f) for the output variable (Y) given the input data (X).
• The mapping function is often called the target function because it is the function that a given supervised machine learning algorithm aims to approximate.
Prediction error
• Anytime you have a difference between your
model and your measurements, you have an
error.
• The prediction error for any machine learning
algorithm can be broken down into three
parts:
• Bias Error
• Variance Error
• Irreducible Error
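For squared-error loss, these three parts correspond to a standard textbook decomposition (not stated explicitly on the slide) of the expected prediction error at a point x, where ŷ(x) is the model's prediction and σ² is the noise variance:

E[(y – ŷ(x))²] = (E[ŷ(x)] – f(x))² + E[(ŷ(x) – E[ŷ(x)])²] + σ²
               =       bias²       +       variance        + irreducible error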
Irreducible error
• The irreducible error cannot be reduced
regardless of what algorithm is used.
• It is the error introduced from the chosen
framing of the problem and may be caused by
factors like unknown variables that influence
the mapping of the input variables to the
output variable.
Bias Error
• Bias is the set of simplifying assumptions made by a model to make the target function easier to learn.
• Generally, parametric algorithms have a high bias, making them fast to learn and easier to understand, but generally less flexible.
• In turn, they have lower predictive performance on complex problems that fail to meet the simplifying assumptions of the algorithm's bias.
• Low Bias: suggests fewer assumptions about the form of the target function.
• High Bias: suggests more assumptions about the form of the target function.
Examples of Bias
• Examples of low-bias machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
• Examples of high-bias machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
Variance Error
• Variance is the amount that the estimate of the target function will change if different training data were used.
• The target function is estimated from the training data by a machine learning algorithm, so we should expect the algorithm to have some variance.
• Ideally, it should not change too much from one training dataset to the next, meaning that the algorithm is good at picking out the hidden underlying mapping between the input and the output variables.
• Machine learning algorithms that have a high variance are strongly influenced by the specifics of the training data.
• This means that the specifics of the training data influence the number and types of parameters used to characterize the mapping function.
Low Variance vs High Variance
• Low Variance: suggests small changes to the estimate of the target function with changes to the training dataset.
• High Variance: suggests large changes to the estimate of the target function with changes to the training dataset.
• Generally, nonparametric machine learning algorithms that have a lot of flexibility have a high variance.
• For example, decision trees have a high variance, which is even higher if the trees are not pruned before use.
Examples
• Examples of low-variance machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
• Examples of high-variance machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
Bias-Variance Trade-Off
• The goal of any supervised machine learning
algorithm is to achieve low bias and low variance.
• In turn the algorithm should achieve good
prediction performance.
• Generally, parametric or linear machine learning algorithms often have a high bias but a low variance.
• Generally, non-parametric or non-linear machine learning algorithms often have a low bias but a high variance.
[Figure 8.8: The bias-variance tradeoff illustrated with test error and training error. The test error is the top curve, which has a minimum in the middle of the plot. In order to create the best forecasts, we should adjust our model complexity to where the test error is at a minimum.]
Handling Bias
• The k-nearest neighbors algorithm has low bias and high variance, but the trade-off can be changed by increasing the value of k, which increases the number of neighbors that contribute to the prediction and in turn increases the bias of the model.
• The support vector machine algorithm has low bias and high variance, but the trade-off can be changed by increasing the C parameter that influences the number of violations of the margin allowed in the training data, which increases the bias but decreases the variance.
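A minimal sketch of the k-NN trade-off (assuming scikit-learn; the data is synthetic). Increasing n_neighbors smooths the fit, raising bias and lowering variance:

```python
# A minimal sketch: increasing k in k-NN trades variance for bias.
# k=1 memorizes the training data (high variance); large k smooths it.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                 # synthetic inputs
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=100)    # noisy targets

for k in (1, 5, 25):
    knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    print(f"k={k:>2}: training R^2 = {knn.score(X, y):.3f}")
# k=1 scores (nearly) perfectly on the training data; larger k fits a
# smoother, more biased function and the training score drops.
```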
Bias vs Variance
The relationship between bias and variance in machine learning:
• Increasing the bias will decrease the variance.
• Increasing the variance will decrease the bias.
