02 Regression and Classification Problems
The expression multivariate analysis is used to describe analyses of data that are
multivariate in the sense that numerous observations or variables are obtained for each
individual or unit studied.
Regression Problems:
• Supervised learning problems where the output is a continuous value are called
regression problems.
• The regression technique is used for predicting a continuous value.
• For example, predicting the price of a house based on its characteristics, or
estimating the CO2 emissions from a car's engine.
Root Mean Squared Error (RMSE) is the square root of the mean squared error:
RMSE = √( (1/n) Σ (yi − ŷi)² ). It is one of the most popular evaluation metrics
because it is expressed in the same units as the response vector y, making it easy
to interpret.
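As a minimal sketch of the formula above, the following computes RMSE from scratch on made-up true and predicted values (the numbers are illustrative only):

```python
import math

# Illustrative true values and model predictions (not from the text)
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

# Mean squared error: average of the squared residuals
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# RMSE is the square root of MSE, so it is in the same units as y
rmse = math.sqrt(mse)
print(round(rmse, 4))
```

Because RMSE shares units with y, a value of roughly 0.61 here means the predictions are off by about 0.61 units of the response on average.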
Classification Problems:
• The problems where the output is a discrete value are called classification
problems.
• Classification is the process of predicting a discrete class label, or categories.
• For example, whether a cell is benign or malignant, or whether an email is spam or not.
• A classification problem does not necessarily have only two outcomes, which means
it isn't limited to two classes. For example, handwritten digit recognition (a
classification problem) has ten outcomes, one for each digit 0–9.
Logistic Regression
Logistic regression is a classification algorithm designed to predict categorical target
labels based on historical feature data. It allows us to predict the probability of the
dependent variable given an input and a fitted model. Logistic regression can be used for
both binary classification and multi-class classification.
Sigmoid Function
Logistic regression uses the sigmoid function, also known as the logistic function, to
perform classification. The sigmoid function takes in any value and maps it to a
value between 0 and 1. The key thing to notice here is that no matter what value z
you put into the logistic (sigmoid) function, you will always get a value between
0 and 1. This means we can take our linear regression solution and place it into
the sigmoid function, which looks like this: σ(z) = 1 / (1 + e^(−z))
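A small sketch of the sigmoid function σ(z) = 1 / (1 + e^(−z)), checking that any input is squashed into the interval (0, 1):

```python
import math

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Large negative inputs approach 0, zero maps to exactly 0.5,
# and large positive inputs approach 1
print(sigmoid(-10))  # close to 0
print(sigmoid(0))    # exactly 0.5
print(sigmoid(10))   # close to 1
```

This squashing is what lets the linear combination of predictors be interpreted as a probability.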
• We can formulate the algorithm for predicting the class of a new object x with
the predictors (x1, x2, ..., xp) once the coefficients β0, β1, ..., βp are found.
1. Calculate the value z = β0 + β1x1 + β2x2 + ⋯ + βpxp
2. Calculate the probability P = 1 / (1 + e^(−z))
3. If P ≥ 0.5, assign the object x to class 1; otherwise, to class 0.
(In practice, the choice of a probability cut-off is up to a researcher)
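The three steps above can be sketched as a single prediction function. The coefficients and observation below are hypothetical values chosen only for illustration:

```python
import math

def predict_class(x, beta, cutoff=0.5):
    # Step 1: linear combination z = beta0 + beta1*x1 + ... + betap*xp
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    # Step 2: probability via the sigmoid (logistic) function
    p = 1.0 / (1.0 + math.exp(-z))
    # Step 3: threshold at the chosen probability cut-off
    return 1 if p >= cutoff else 0

# Hypothetical coefficients beta0, beta1, beta2 and one observation (x1, x2)
beta = [-1.0, 0.8, 0.5]
x = [2.0, 1.0]
print(predict_class(x, beta))
```

The `cutoff` parameter makes the cut-off explicit: as the note above says, 0.5 is conventional but the choice is up to the researcher.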