Unit 2 - ML - SRM

The document discusses various statistical methods for parameter estimation and classification, including Maximum Likelihood Estimation (MLE), Least Squares Method, Robust Linear Regression, Ridge Regression, and Bayesian Linear Regression. It highlights the applications of these methods in linear models for classification, such as discriminant functions and logistic regression, while also addressing their limitations and advantages. Key concepts such as probabilistic generative and discriminative models, Laplace approximation, and Bayesian inference are also covered.

Unit - 2

Maximum likelihood estimation – least squares, robust linear
regression, ridge regression, Bayesian linear regression.
Linear models for classification: Discriminant functions –
probabilistic generative models, probabilistic discriminative
models, Laplace approximation, Bayesian logistic regression,
kernel functions, using kernels in GLMs, the kernel trick, SVMs.
Maximum Likelihood Estimation (MLE)
• Maximum Likelihood Estimation (MLE) is a method used to estimate the parameters of a statistical model. It chooses the parameter values under which the observed data are most probable, i.e. the values that maximize the likelihood function. In the context of linear regression, MLE is used to estimate the coefficients (parameters) of the regression model.
Maximum Likelihood Estimation (Fig 1: candidate PDF fits over sample data)
Maximum Likelihood Estimation
• Fig 1 shows multiple attempts at fitting a Gaussian PDF (bell curve) to the random sample data. The red bell curves indicate poorly fitted PDFs, and the green bell curve shows the best-fitting PDF over the data. The optimum bell curve is found by comparing, for each candidate PDF, the corresponding value in the maximum likelihood plot.

• As observed in Fig 1, the red curves fit the data poorly, so their likelihood values are also low. The green PDF curve has the maximum likelihood, as it fits the data best. This is the essence of the maximum likelihood estimation method: choose the parameters under which the observed data are most probable.
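A minimal sketch of the idea behind Fig 1, assuming a 1-D Gaussian model; the sample values below are made up for illustration:

```python
import numpy as np

# Hypothetical 1-D sample; the model is a Gaussian N(mu, sigma^2).
data = np.array([4.2, 5.1, 4.8, 5.5, 4.9, 5.3, 4.6])

def log_likelihood(mu, sigma, x):
    """Total log-likelihood of the sample under N(mu, sigma^2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

# Try a few candidate "bell curves" (like the red/green curves in Fig 1).
for mu, sigma in [(3.0, 1.0), (6.0, 1.0), (data.mean(), data.std())]:
    print(f"mu={mu:.2f}, sigma={sigma:.2f} -> "
          f"log-likelihood={log_likelihood(mu, sigma, data):.3f}")

# For a Gaussian, the MLE is the sample mean and (population) standard
# deviation, so the last candidate attains the highest log-likelihood.
```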
Least Squares Method

• In statistics, when we have data in the form of data points that can be represented on a Cartesian plane, taking one of the variables as the independent variable (the x-coordinate) and the other as the dependent variable (the y-coordinate), the data is called scatter data.
• On its own, this data may not be useful for making interpretations or for predicting values of the dependent variable where the independent variable has not been observed. So we find the equation of the line that best fits the given data points with the help of the least squares method.
Least Squares Method (figure)
Formula for Least Squares Method (figure; the standard formulas are reproduced below)
Least Squares Method Graph (figure)
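The formula slide above is an image that did not survive extraction. For reference, the standard least squares formulas — the ones applied in the worked solution below, where X and Y denote the means of the x- and y-values — are:

```latex
m = \frac{\sum_i (X - x_i)(Y - y_i)}{\sum_i (X - x_i)^2},
\qquad
c = Y - mX,
\qquad
\text{line of best fit: } y = mx + c
```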
Example - Maximum Likelihood Estimation
Let's work through an example of using Maximum Likelihood Estimation (MLE) in linear regression. For a linear model with Gaussian noise, the maximum likelihood estimates of the coefficients coincide with the least squares solution, so we can work the example with the least squares formulas.
• Example
• Suppose we have a dataset with the following observations:
X Y
1 3
2 4
4 8
6 10
8 15
SOLUTION
• Here, we have x as the independent variable and y as the dependent
variable. First, we calculate the means of x and y values denoted by X
and Y respectively.

• X = (1+2+4+6+8)/5 = 4.2

• Y = (3+4+8+10+15)/5 = 8
SOLUTION (continued)
• The slope of the line of best fit can be calculated from the formula as follows:

• m = Σ(X − xᵢ)(Y − yᵢ) / Σ(X − xᵢ)²

• m = 55/32.8 = 1.68 (rounded to two decimal places)

• Now, the intercept will be calculated from the formula as follows:

• c = Y – mX

• c = 8 – 1.68*4.2 = 0.94

• Thus, the equation of the line of best fit becomes y = 1.68x + 0.94.
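A quick check of the arithmetic with NumPy (np.polyfit with degree 1 returns the least squares slope and intercept):

```python
import numpy as np

x = np.array([1, 2, 4, 6, 8])
y = np.array([3, 4, 8, 10, 15])

# Degree-1 polynomial fit = least squares line of best fit.
m, c = np.polyfit(x, y, 1)
print(m, c)  # ~1.6768, ~0.9573
```

Note that the slide's intercept of 0.94 comes from rounding the slope to 1.68 before computing c; using the unrounded slope gives c ≈ 0.96.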
Problem 2:
• Find the line of best fit for the following data of heights and weights
of students of a school using the least squares method:

• Height (in centimeters): [160, 162, 164, 166, 168]


• Weight (in kilograms): [52, 55, 57, 60, 61]
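A sketch of the same least squares computation for Problem 2, again using NumPy:

```python
import numpy as np

height = np.array([160, 162, 164, 166, 168])  # x, in centimeters
weight = np.array([52, 55, 57, 60, 61])       # y, in kilograms

m, c = np.polyfit(height, weight, 1)
print(f"m = {m:.2f}, c = {c:.2f}")  # m = 1.15, c = -131.60
```

so the line of best fit is y = 1.15x − 131.6.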
Drawbacks of Maximum Likelihood Estimation
• While Maximum Likelihood Estimation (MLE) is a powerful and widely
used method for parameter estimation, it does have some drawbacks
and limitations. Here are some of the key challenges and potential
issues associated with MLE:
• Sensitivity to Outliers
• Computational Complexity
• Overfitting
• Bias in Small Samples
Robust Linear Regression
• Robust linear regression is designed to be less sensitive to outliers compared to
traditional linear regression.
• Traditional linear regression minimizes the sum of squared residuals, which can
be heavily influenced by outliers.
• Robust linear regression uses different techniques to mitigate the effect of
outliers and produce a more reliable model.
• The Least Absolute Deviations (LAD) method, also known as L1 regression or Least Absolute Errors (LAE), is a type of regression that minimizes the sum of the absolute differences (deviations) between the observed and predicted values. This approach is particularly useful when the data contains outliers, or when you want to limit the influence of large residuals, which can disproportionately affect models like Ordinary Least Squares (OLS) that minimize squared differences.
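A minimal sketch of the difference, with made-up data containing one gross outlier; the LAD fit is obtained numerically via scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data on the line y = 2x + 1, with one gross outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 30.0])  # last point is an outlier

# OLS: minimizes the sum of squared residuals.
ols_m, ols_c = np.polyfit(x, y, 1)

# LAD: minimizes the sum of absolute residuals.
def lad_loss(params):
    m, c = params
    return np.sum(np.abs(y - (m * x + c)))

lad_m, lad_c = minimize(lad_loss, x0=[0.0, 0.0], method="Nelder-Mead").x

print(f"OLS: y = {ols_m:.2f}x + {ols_c:.2f}")  # dragged toward the outlier
print(f"LAD: y = {lad_m:.2f}x + {lad_c:.2f}")  # close to y = 2x + 1
```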
Robust Linear Regression (figure)
Ridge Regression
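The body of this slide is an image that did not survive extraction. As a standard reference point (not taken from the slide): ridge regression adds an L2 penalty λ‖w‖² to the least squares objective, which shrinks the coefficients and stabilizes the fit when features are nearly collinear. A minimal NumPy sketch of the closed-form solution:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy usage: two nearly collinear features, true weights [1, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=50)
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)

print(ridge_fit(X, y, lam=0.0))  # lam=0 is plain OLS: weights can be erratic
print(ridge_fit(X, y, lam=1.0))  # shrunken, stable weights
```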
Bayesian Linear Regression
Bayes' Theorem
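This slide is likewise an image; for reference, the theorem in the form used for parameter inference (posterior ∝ likelihood × prior) is:

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)},
\qquad
\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}
```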
What is the use of Bayesian Linear Regression in machine learning?
• Bayesian Linear Regression is used in machine learning for several
important reasons, particularly when dealing with uncertainty, small
datasets, or when prior knowledge about the parameters is available.
• Bayesian linear regression extends the traditional linear regression
framework by incorporating prior beliefs about the parameters,
resulting in a full posterior distribution for the parameters. This
approach offers several advantages, including better handling of
uncertainty and robustness to overfitting.
Key Concepts of Bayesian Linear Regression
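The content of this slide is an image that did not survive extraction. As a hedged sketch of the usual formulation (the names alpha and beta and the toy data below are assumptions, following the standard conjugate setup): with prior w ~ N(0, α⁻¹I) and Gaussian noise of precision β, the posterior over the weights is Gaussian with closed-form mean and covariance.

```python
import numpy as np

def bayes_linreg_posterior(Phi, t, alpha=1.0, beta=25.0):
    """Posterior N(m_N, S_N) for weights with prior N(0, alpha^-1 I)
    and Gaussian noise of precision beta (standard conjugate result)."""
    d = Phi.shape[1]
    S_N_inv = alpha * np.eye(d) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# Toy usage: t = 0.5x - 0.3 plus noise, with a bias column in Phi.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=20)
t = 0.5 * x - 0.3 + 0.2 * rng.normal(size=20)
Phi = np.column_stack([np.ones_like(x), x])

m_N, S_N = bayes_linreg_posterior(Phi, t)
print("posterior mean:", m_N)                    # near [-0.3, 0.5]
print("posterior stds:", np.sqrt(np.diag(S_N)))  # parameter uncertainty
```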
Linear models for classification
• Discriminant function
• Probabilistic generative models,
• Probabilistic discriminative models
• Laplace approximation
• Bayesian logistic regression
Linear Models for Classification: Discriminant Function

• Linear models are commonly used for classification tasks, where the
goal is to assign inputs (data points) to one of several possible classes.
A discriminant function is a type of linear model used to separate
different classes by defining a decision boundary.
• Example: Classifying Points Based on Two Features
Problem Setup:
• Suppose we have a dataset with two features (x1 and x2) and two classes (Class 1 and Class 2). The goal is to classify a new data point into either Class 1 or Class 2 using a linear discriminant function.
• Here’s a small dataset:
X1 X2 Class
2 3 ?
3 4 ?
4 2 ?
5 3 ?
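The slide leaves the class labels unspecified, so here is a sketch with hypothetical parameters (w = (1, −1), b = 0, not from the slide) showing how a linear discriminant g(x) = w·x + b assigns classes:

```python
import numpy as np

# Hypothetical parameters for g(x) = w1*x1 + w2*x2 + b (not from the slide).
w = np.array([1.0, -1.0])
b = 0.0

points = np.array([[2, 3], [3, 4], [4, 2], [5, 3]])

for p in points:
    g = w @ p + b
    # Decision rule: Class 1 if g(x) > 0, Class 2 otherwise.
    label = "Class 1" if g > 0 else "Class 2"
    print(p, f"g(x) = {g:+.1f} ->", label)
```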
Probabilistic Generative Models

• Probabilistic generative models are a fundamental approach to classification in machine learning. These models work by modeling the joint probability distribution of the input features and the output labels. Once this joint distribution is known, the model can predict the probability of each class given the input features using Bayes' theorem.
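A minimal sketch of this recipe with one feature and Gaussian class-conditionals; the training data below is made up for illustration:

```python
import numpy as np
from scipy.stats import norm

# Toy 1-D training data for two classes.
x0 = np.array([1.0, 1.5, 2.0, 2.5])   # class 0
x1 = np.array([4.0, 4.5, 5.0, 5.5])   # class 1

# Model the joint p(x, y) = p(x | y) p(y) with Gaussian class-conditionals.
prior0 = prior1 = 0.5
lik0 = norm(x0.mean(), x0.std())
lik1 = norm(x1.mean(), x1.std())

x_new = 3.2
joint0 = lik0.pdf(x_new) * prior0
joint1 = lik1.pdf(x_new) * prior1

# Bayes' theorem: p(y | x) is proportional to p(x | y) p(y).
posterior1 = joint1 / (joint0 + joint1)
print(f"p(class 1 | x={x_new}) = {posterior1:.3f}")
```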
Probabilistic Discriminative Models
• Probabilistic discriminative models are used in classification tasks to predict
the class label of a given input based on the input features. These models
are called "discriminative" because they focus on distinguishing between
classes directly, without trying to model the distribution of the features
themselves.
Laplace Approximation
In the context of linear models for classification, Laplace Approximation is
often used to approximate the posterior distribution of model parameters
when performing Bayesian inference.
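In one dimension the recipe is: find the mode w₀ of the (unnormalized) log posterior, measure the curvature A = −(d²/dw²) ln p(w) at the mode, and approximate the posterior by N(w₀, A⁻¹). A sketch with an arbitrary made-up density:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# An arbitrary unnormalized log-density (not from the slides).
def log_p(w):
    return -0.5 * w**4 + 2.0 * w  # skewed, non-Gaussian

# Step 1: find the mode w0 by maximizing log p(w).
w0 = minimize_scalar(lambda w: -log_p(w)).x

# Step 2: curvature A = -d^2/dw^2 log p(w) at w0 (finite differences).
h = 1e-4
A = -(log_p(w0 + h) - 2 * log_p(w0) + log_p(w0 - h)) / h**2

# Step 3: Laplace approximation q(w) = N(w0, 1/A).
print(f"q(w) = N({w0:.3f}, {1/A:.3f})")  # mode 1.0, variance 1/6
```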
Laplace Approximation…

One major weakness of the Laplace approximation is that, since it is based on a Gaussian
distribution, it is directly applicable only to real-valued variables.
In other cases it may be possible to apply the Laplace approximation to a transformation of the
variable. For instance, if 0 < τ < ∞, we can consider a Laplace approximation of ln τ, which ranges over the whole real line.
The most serious limitation of the Laplace framework, however, is that it is based purely on the
behaviour of the true distribution at a specific value of the variable, and so can fail to capture important
global properties.
Bayesian Logistic Regression
Logistic Regression
• Logistic regression is one of the most popular machine learning algorithms and comes under the supervised learning technique. It is used for predicting a categorical dependent variable from a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc. Rather than returning exactly 0 or 1, however, it gives probabilistic values that lie between 0 and 1.
• Logistic regression is similar to linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
• In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function, which squashes predictions toward the two extreme values (0 and 1).
• The value of the logistic function indicates the likelihood of something, such as whether cells are cancerous or not.
• Logistic regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete datasets.
• Logistic regression can be used to classify observations using different types of data and can easily identify the variables that are most effective for the classification.
The image below shows the logistic function:
Logistic Function (Sigmoid Function)
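The figure itself did not survive extraction; the function it shows is the sigmoid σ(z) = 1/(1 + e⁻ᶻ), which squashes any real-valued score into (0, 1):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps a real score to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.007, 0.5, 0.993]
```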
Bayesian Logistic Regression
• Bayesian Logistic Regression is an extension of logistic regression where we
apply Bayesian inference to estimate the model parameters.
• It combines logistic regression with Bayesian inference.
• In contrast to traditional logistic regression, which provides point estimates
for the parameters, Bayesian logistic regression treats the parameters as
random variables and provides a probability distribution over them.
• This approach allows us to quantify the uncertainty in the parameter
estimates and make probabilistic predictions.
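The posterior in Bayesian logistic regression has no closed form; a common route, and the one the Laplace Approximation slides above set up, is to find the MAP weights and fit a Gaussian around them. A minimal sketch, assuming a N(0, I) prior and made-up 1-D data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up 1-D data with a bias column; labels y in {0, 1}.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])
Phi = np.column_stack([np.ones_like(x), x])

# Newton iterations for the MAP weights under a N(0, I) prior.
w = np.zeros(2)
for _ in range(20):
    p = sigmoid(Phi @ w)
    grad = Phi.T @ (p - y) + w                   # gradient of neg. log posterior
    r = p * (1 - p)
    H = Phi.T @ (r[:, None] * Phi) + np.eye(2)   # Hessian of neg. log posterior
    w -= np.linalg.solve(H, grad)

# Laplace approximation: posterior ~= N(w_MAP, H^{-1}), with H at the mode.
p = sigmoid(Phi @ w)
r = p * (1 - p)
H = Phi.T @ (r[:, None] * Phi) + np.eye(2)
print("w_MAP:", w)
print("posterior covariance:\n", np.linalg.inv(H))
```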
Example Scenario
• Imagine you're working on a medical diagnosis problem where you
need to predict whether a patient has a particular disease (binary
outcome: 0 for no, 1 for yes) based on some clinical measurements
(features). If you have prior knowledge about the likely effect of these
measurements on the disease (say from previous studies), Bayesian
Logistic Regression allows you to incorporate this knowledge and
update it as more patient data becomes available.
• For instance, if a certain measurement is known to be positively
correlated with the disease but you're uncertain about the strength of
this relationship, you might use a prior that reflects this belief. As you
collect more patient data, the posterior distribution will reflect both
the prior knowledge and the new data, giving you a more refined
estimate.
