Forecasting and Learning Theory
Regression
• In regression we are interested in input-output relationships
• Regression is the prediction of a numeric value.
• In classification, we seek to identify the categorical class Ck associated with a
given input vector x.
• In regression, we seek to identify (or estimate) a continuous variable y associated
with a given input vector x.
• In regression the output is continuous
– Function Approximation
• Many models could be used – Simplest is linear regression.
• When we have a single input attribute (x) and we want to use linear regression,
this is called simple linear regression.
• y is called the dependent variable.
• x is called the independent variable.
• If we had multiple input attributes (e.g. x1, x2, x3, etc.), this would be called
multiple linear regression.
Regression examples
Linear regression
• Given an input x we would like to compute an output y
• For example:
  - Predict height from age
  - Predict Google’s price from Yahoo’s price
  - Predict distance from wall from sensors
[Figure: data points plotted in the X–Y plane]
Linear regression
• Given an input x we would like to compute an output y
• In linear regression we assume that y and x are related with the following
equation:

  Y = b0 + b1X + e

  where:
  - Y : what we are trying to predict (dependent variable)
  - X : observed values (independent variable)
  - b0 : y-intercept
  - b1 : slope
  - e : error
• Remember: Y is always continuous
[Figure: data points with a fitted line of intercept b0; vertical gaps to the line are the errors]
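The model above can be fit in closed form. A minimal sketch, using only the standard library and the usual closed-form solution for simple linear regression (the function and variable names here are illustrative, not from the slides):

```python
# Fit Y = b0 + b1*X by least squares using the closed-form solution.
def fit_simple_linear(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b1 = covariance(x, y) / variance(x); b0 makes the line pass through the means.
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
         sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Points lying exactly on y = 1 + 2x recover b0 = 1, b1 = 2.
b0, b1 = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

With noisy data the recovered b0 and b1 will not match the generating line exactly; the fit is simply the line with the smallest sum of squared errors.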
Objective function
• We will "fit" the points with a line (i.e. hyper-
plane)
• Which line should we use?
– Choose an objective function
– For simple linear regression we choose the sum of
squared errors (SSE)
• SSE = Σᵢ (predictedᵢ − actualᵢ)² = Σᵢ (residualᵢ)²
• The most fundamental problem with decision trees is that they "overfit" the data and
hence do not provide good generalization. A solution to this problem is to prune the tree:
• But pruning the tree will always increase the error rate on the training set.
• Cost-complexity pruning: each node in the tree can be scored in terms of its
impact on the cost-complexity (training error plus a penalty proportional to the
number of leaf nodes in the subtree) if it were pruned. Nodes are successively
pruned until certain heuristics are satisfied.
• By pruning the nodes that are far too specific to the training set, it is hoped the tree will
have better generalization. In practice, we use techniques such as cross-validation and
held-out training data to better calibrate the generalization properties.
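One way to make the pruning score concrete is the standard CART "weakest link" formulation (an assumption here; the slides do not give a formula): for a subtree rooted at node t, g(t) = (R(t) − R(T_t)) / (|leaves(T_t)| − 1), where R(t) is the training error if t were collapsed to a leaf and R(T_t) is the training error of the intact subtree. Nodes with the smallest g are pruned first, since they buy the least error reduction per extra leaf. A tiny sketch:

```python
# Cost-complexity ("weakest link") score for collapsing a subtree to one leaf.
# leaf_error: training error if the node becomes a leaf
# subtree_error: training error of the intact subtree
# n_leaves: number of leaves in the subtree
def cc_score(leaf_error, subtree_error, n_leaves):
    return (leaf_error - subtree_error) / (n_leaves - 1)

# Toy numbers: collapsing this node raises training error from 0.02 to 0.10
# while removing 4 of its 5 leaves.
print(cc_score(0.10, 0.02, 5))  # ≈ 0.02 error increase per leaf removed
```

A low score means the subtree is doing little work for its size, which is exactly the "far too specific to the training set" case the slide describes.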
How to choose the right algorithm
• What are you trying to get out of this?
Step 1
• If you’re trying to predict or forecast a target value, then you
need to look into supervised learning.
• If not, then unsupervised learning is the place you want to be.
Step 2
• Is it a discrete value like Yes/No, 1/2/3, A/B/C, or
Red/Yellow/Black? If so, then you want to look into
classification.
• If the target value can take on a continuum of values, say any value
from 0.00 to 100.00, or -999 to 999, or −∞ to +∞, then you need to
look into regression.
Step 3
• Are you trying to fit your data into some
discrete groups? If so and that’s all you need,
you should look into clustering.
• Do you need to have some numerical estimate
of how strong the fit is into each group? If you
answer yes, then you probably should look
into a density estimation algorithm
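The three steps above can be sketched as a small helper function; the function name, argument names, and category strings are illustrative assumptions, not part of the slides:

```python
# Map the Step 1-3 questions to a broad algorithm family.
def choose_family(predicting_target, target_is_discrete=None, need_fit_strength=None):
    if predicting_target:          # Step 1: forecasting a target value -> supervised
        if target_is_discrete:     # Step 2: discrete target (Yes/No, A/B/C, ...)
            return "classification"
        return "regression"        # continuous target
    # Unsupervised branch
    if need_fit_strength:          # Step 3: numeric estimate of group membership strength
        return "density estimation"
    return "clustering"

print(choose_family(True, target_is_discrete=False))   # regression
print(choose_family(False, need_fit_strength=True))    # density estimation
```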
What data do you have or can you collect?