
Forecasting and Learning Theory

Regression
• In regression we are interested in input-output relationships.
• Regression is the prediction of a numeric value.
• In classification, we seek to identify the categorical class Ck associated with a given input vector x.
• In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
• In regression the output is continuous
– Function approximation
• Many models could be used – the simplest is linear regression.
• When we have a single input attribute (x) and we want to use linear regression, this is called simple linear regression.
• y is called the dependent variable.
• x is called the independent variable.
• If we had multiple input attributes (e.g. x1, x2, x3, etc.), this would be called multiple linear regression.
Regression examples
Linear regression
• Given an input x we would like to compute an output y
• For example:
– Predict height from age
– Predict Google’s price from Yahoo’s price
– Predict distance from wall from sensors
[Figure: scatter plot of y (Y axis) against x (X axis) with a fitted line]
Linear regression
• Given an input x we would like to compute an output y
• In linear regression we assume that y and x are related by the following equation:

Y = b0 + b1X + e

where e is the error term, b0 is the y-intercept, and b1 is the slope. Y is the dependent variable (what we are trying to predict) and X is the independent (observed) variable.
• Remember: Y is always continuous.
Objective function
• We will "fit" the points with a line (i.e. hyperplane)
• Which line should we use?
– Choose an objective function
– For simple linear regression we choose the sum of squared errors (SSE):

SSE = Σi (predictedi – actuali)² = Σi (residuali)²

– Thus, find the line which minimizes the sum of the squared residuals (i.e. least squares)
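As a minimal sketch of this objective (assuming NumPy; the data points and candidate coefficients are made up for illustration), we can compare two candidate lines by their SSE:

```python
# A minimal sketch: score candidate lines by sum of squared errors (SSE).
# The data points and candidate coefficients are made up for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sse(b0, b1):
    residuals = y - (b0 + b1 * x)   # actual minus predicted, per point
    return np.sum(residuals ** 2)

# The line with the smaller SSE is the better fit under this objective.
print(sse(0.0, 2.0))   # candidate line y = 2x        -> SSE ~ 0.11
print(sse(1.0, 1.5))   # candidate line y = 1 + 1.5x  -> SSE ~ 3.56
```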
Linear regression
• Our goal is to estimate w from training data of (xi, yi) pairs
• Optimization goal: minimize the squared error (least squares):

w* = argminw Σi (yi – w·xi)²

• Why least squares?
– minimizes the squared distance between the measurements and the predicted line
– has a nice probabilistic interpretation (it is the maximum-likelihood estimate under Gaussian noise)
– the math is pretty
Example
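As a worked example, here is a minimal sketch (assuming NumPy, with synthetic data) of the closed-form least-squares fit for simple linear regression:

```python
# A minimal sketch: fit Y = b0 + b1*X by least squares, using the
# closed-form solution. The data is synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)               # independent variable
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)      # dependent variable + noise

# Closed-form estimates: b1 = cov(x, y) / var(x), b0 = mean(y) - b1*mean(x)
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {np.sum(residuals**2):.3f}")
```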
Use of linear regression
• Risk analysis
• Forecasting sales
• Business domains.
Logistic Regression
• Logistic regression is a method used to predict a dependent variable, given a set of independent variables, such that the dependent variable is categorical.
• Dependent variable (Y): the binary response variable, holding values like 0 or 1, yes or no
• Independent variable (X): the predictor variable used to predict the response variable
Logistic Regression
• Our bank manager wants to build a prediction model to predict if a customer will pay back the loan
• A statistician advised our bank manager to use logistic regression
• Why not use linear regression?
• Least squares regression can produce impossible estimates, such as probabilities that are less than 0 or greater than 1. So, when the predicted value is measured as a probability, use logistic regression.
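A minimal sketch of the problem (hypothetical 0/1 loan-repayment data, fit with ordinary least squares):

```python
# A minimal sketch: a least-squares fit to a 0/1 outcome can predict
# "probabilities" below 0 or above 1. Data is hypothetical.
import numpy as np

income = np.array([10, 20, 30, 40, 50, 60], dtype=float)  # predictor
repaid = np.array([0, 0, 0, 1, 1, 1], dtype=float)        # 0/1 outcome

b1, b0 = np.polyfit(income, repaid, deg=1)  # slope, intercept
for x in (0.0, 70.0):
    print(f"income={x:>4}: predicted 'probability' = {b0 + b1 * x:.2f}")
# income= 0.0: predicted 'probability' = -0.40  (below 0)
# income=70.0: predicted 'probability' = 1.40   (above 1)
```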
Logistic Regression
• But what if there is an outlier in the data? Things would get pretty messy.
• To deal with outliers, logistic regression uses the sigmoid function.
• It is really a technique for classification, not regression.
• The idea of logistic regression is to find a relationship between the features and the probability of a particular outcome.
• We use maximum likelihood estimation for parameter estimation.
• The maximum likelihood estimate is the set of regression coefficients for which the probability of getting the data we have observed is maximum.
Logistic regression
• Logistic regression is a discriminative classifier
• A discriminative model tries to learn to distinguish the classes (perhaps without learning much about them)
• The logistic regression algorithm also uses a linear equation with independent predictors to predict a value.
• Very fast.
Equation

log(Y / (1 – Y)) = c + b1X1 + b2X2 + …

Where
c: the constant (intercept) term, i.e. the log-odds of the event happening when no other factors are considered (all predictors are zero).
Y: the probability of the event you are trying to predict.
X1, X2: independent variables which determine the occurrence of the event Y.
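A minimal sketch tying this equation to the sigmoid on the next slide (the coefficients below are illustrative assumptions, not estimates from real data):

```python
# A minimal sketch: the sigmoid maps the linear predictor (the log-odds
# c + b1*x1 + b2*x2) onto a probability in (0, 1). Coefficients are
# illustrative assumptions, not fitted values.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

c, b1, b2 = -1.5, 0.8, 0.3     # assumed intercept and coefficients
x1, x2 = 2.0, 1.0              # feature values for one example

z = c + b1 * x1 + b2 * x2      # log-odds: log(Y / (1 - Y))
p = sigmoid(z)                 # probability of the event
print(f"log-odds = {z:.2f}, probability = {p:.3f}, class = {int(p >= 0.5)}")
```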
Sigmoid curve
[Figure: the sigmoid (logistic) curve, which squashes any real-valued input into the range (0, 1)]
• Results are categorical
Use of logistic regression
• Classification Problems
• Cyber security
• Image processing
Logistic regression vs Linear regression
• The essential difference between these two is that logistic regression is used when the dependent variable is binary in nature. In contrast, linear regression is used when the dependent variable is continuous.
• The fitted relationship in logistic regression is an S-shaped curve, while in linear regression it is a straight line.
• Linear regression requires a linear relationship between the dependent and independent variables, whereas this is not necessary for logistic regression.
• The estimation method in logistic regression is maximum likelihood estimation. In linear regression, the estimation method is least squares.
Regression Tree
Pruning

• The most fundamental problem with decision trees is that they "overfit" the data and hence do not provide good generalization. A solution to this problem is to prune the tree.
• But pruning the tree will always increase the error rate on the training set.
• Cost-complexity pruning: minimize a criterion of the form α · size + Σ over leaf nodes N of i(N), where size is the number of leaf nodes, i(N) is the impurity of leaf node N, and α trades off tree size against impurity. Each node in the tree can be classified in terms of its impact on the cost-complexity if it were pruned. Nodes are successively pruned until certain heuristics are satisfied.
• By pruning the nodes that are far too specific to the training set, it is hoped the tree will have better generalization. In practice, we use techniques such as cross-validation and held-out training data to better calibrate the generalization properties.
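As a hedged sketch of cost-complexity pruning in practice (assuming scikit-learn, whose ccp_alpha parameter plays the role of α; the data is synthetic):

```python
# A minimal sketch: cost-complexity pruning of a regression tree.
# Larger ccp_alpha penalizes leaf count more, so the tree is pruned harder.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))                  # synthetic inputs
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=200)     # noisy targets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in (0.0, 0.01, 0.05):
    tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"alpha={alpha}: leaves={tree.get_n_leaves()}, "
          f"test R^2={tree.score(X_te, y_te):.3f}")
```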
How to choose the right algorithm
• What are you trying to get out of this?
Step 1
• If you’re trying to predict or forecast a target value, then you
need to look into supervised learning.
• If not, then unsupervised learning is the place you want to be.
Step 2
• Is it a discrete value like Yes/No, 1/2/3, A/B/C, or
Red/Yellow/Black? If so, then you want to look into
classification.
• If the target value can take on a number of values, say any value from 0.00 to 100.00, or -999 to 999, or -∞ to +∞, then you need to look into regression.
Step 3
• Are you trying to fit your data into some discrete groups? If so and that's all you need, you should look into clustering.
• Do you need to have some numerical estimate of how strong the fit is into each group? If you answer yes, then you probably should look into a density estimation algorithm.
What data do you have or can you collect?
• Are the features nominal or continuous?
• Are there missing values in the features? If there are missing values, why are there missing values?
• Are there outliers in the data?
Overview of Bias and Variance
• In supervised machine learning an algorithm learns a model from training data.
• The goal of any supervised machine learning algorithm is to best estimate the mapping function (f) for the output variable (Y) given the input data (X).
• The mapping function is often called the target function because it is the function that a given supervised machine learning algorithm aims to approximate.
Prediction error
• Anytime you have a difference between your
model and your measurements, you have an
error.
• The prediction error for any machine learning
algorithm can be broken down into three
parts:
• Bias Error
• Variance Error
• Irreducible Error
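For squared-error loss, these three parts correspond to a standard textbook decomposition (not stated explicitly on the slide) of the expected prediction error at a point x, where ŷ(x) is the model's prediction and σ² is the noise variance:

E[(y – ŷ(x))²] = (E[ŷ(x)] – f(x))² + E[(ŷ(x) – E[ŷ(x)])²] + σ²
               =       bias²       +       variance        + irreducible error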
Irreducible error
• The irreducible error cannot be reduced
regardless of what algorithm is used.
• It is the error introduced from the chosen
framing of the problem and may be caused by
factors like unknown variables that influence
the mapping of the input variables to the
output variable.
Bias Error
• Bias is the set of simplifying assumptions made by a model to make the target function easier to learn.
• Generally, parametric algorithms have a high bias, making them fast to learn and easier to understand, but generally less flexible.
• In turn, they have lower predictive performance on complex problems that fail to meet the simplifying assumptions of the algorithm's bias.
• Low Bias: suggests fewer assumptions about the form of the target function.
• High Bias: suggests more assumptions about the form of the target function.
Examples of Bias
• Examples of low-bias machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
• Examples of high-bias machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
Variance Error
• Variance is the amount that the estimate of the target function will change if different training data were used.
• The target function is estimated from the training data by a machine learning algorithm, so we should expect the algorithm to have some variance.
• Ideally, it should not change too much from one training dataset to the next, meaning that the algorithm is good at picking out the hidden underlying mapping between the input and the output variables.
• Machine learning algorithms that have a high variance are strongly influenced by the specifics of the training data.
• This means that the specifics of the training data influence the number and types of parameters used to characterize the mapping function.
Low Variance vs High Variance
• Low Variance: suggests small changes to the estimate of the target function with changes to the training dataset.
• High Variance: suggests large changes to the estimate of the target function with changes to the training dataset.
• Generally, nonparametric machine learning algorithms that have a lot of flexibility have a high variance.
• For example, decision trees have a high variance, which is even higher if the trees are not pruned before use.
Examples
• Examples of low-variance machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
• Examples of high-variance machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
Bias-Variance Trade-Off
• The goal of any supervised machine learning
algorithm is to achieve low bias and low variance.
• In turn the algorithm should achieve good
prediction performance.
• Generally, parametric or linear machine learning algorithms often have a high bias but a low variance.
• Generally, non-parametric or non-linear machine learning algorithms often have a low bias but a high variance.
[Figure 8.8: The bias-variance tradeoff illustrated with test error and training error. The test error is the top curve, which has a minimum in the middle of the plot. In order to create the best forecasts, we should adjust our model complexity to where the test error is at a minimum.]
Handling Bias
• The k-nearest neighbors algorithm has low bias and high variance, but the trade-off can be changed by increasing the value of k, which increases the number of neighbors that contribute to the prediction and in turn increases the bias of the model.
• The support vector machine algorithm has low bias and high variance, but the trade-off can be changed by increasing the C parameter that influences the number of violations of the margin allowed in the training data, which increases the bias but decreases the variance.
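A minimal sketch of the k-NN trade-off (assuming scikit-learn; the data is synthetic). Increasing n_neighbors smooths the fit, raising bias and lowering variance:

```python
# A minimal sketch: increasing k in k-NN trades variance for bias.
# k=1 memorizes the training data (high variance); large k smooths it.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                 # synthetic inputs
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=100)    # noisy targets

for k in (1, 5, 25):
    knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    print(f"k={k:>2}: training R^2 = {knn.score(X, y):.3f}")
# k=1 scores (nearly) perfectly on the training data; larger k fits a
# smoother, more biased function and the training score drops.
```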
Bias vs Variance
The relationship between bias and variance in machine learning:
• Increasing the bias will decrease the variance.
• Increasing the variance will decrease the bias.
