Unit 2 - ML - SRM

The document discusses various statistical methods for parameter estimation and classification, including Maximum Likelihood Estimation (MLE), Least Squares Method, Robust Linear Regression, Ridge Regression, and Bayesian Linear Regression. It highlights the applications of these methods in linear models for classification, such as discriminant functions and logistic regression, while also addressing their limitations and advantages. Key concepts such as probabilistic generative and discriminative models, Laplace approximation, and Bayesian inference are also covered.

Unit - 2

Maximum likelihood estimation – least squares, robust linear
regression, ridge regression, Bayesian linear regression.
Linear models for classification: Discriminant functions –
probabilistic generative models, probabilistic discriminative
models, Laplace approximation, Bayesian logistic regression,
kernel functions, using kernels in GLMs, the kernel trick, SVMs.
Maximum Likelihood Estimation (MLE)
• Maximum Likelihood Estimation (MLE) is a method used to estimate the parameters of a statistical model. It chooses the parameter values under which the observed data are most probable, i.e. the values that maximize the likelihood function. In the context of linear regression, MLE is used to estimate the coefficients (parameters) of the regression model.
Maximum Likelihood Estimation (Fig 1: candidate PDF fits over sample data)
Maximum Likelihood Estimation
• Fig 1 shows multiple attempts at fitting a Gaussian PDF (bell curve) to the random sample data. The red bell curves indicate poorly fitted PDFs, and the green bell curve shows the best-fitting PDF over the data. The optimum bell curve is found by comparing, for each candidate PDF, the corresponding value in the maximum likelihood plot.

• As observed in Fig 1, the red curves fit the data poorly, so their likelihood values are also low. The green PDF curve has the maximum likelihood, as it fits the data best. This is the essence of the maximum likelihood estimation method: choose the parameters under which the observed data are most probable.
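A minimal sketch of the idea behind Fig 1, assuming a 1-D Gaussian model; the sample values below are made up for illustration:

```python
import numpy as np

# Hypothetical 1-D sample; the model is a Gaussian N(mu, sigma^2).
data = np.array([4.2, 5.1, 4.8, 5.5, 4.9, 5.3, 4.6])

def log_likelihood(mu, sigma, x):
    """Total log-likelihood of the sample under N(mu, sigma^2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

# Try a few candidate "bell curves" (like the red/green curves in Fig 1).
for mu, sigma in [(3.0, 1.0), (6.0, 1.0), (data.mean(), data.std())]:
    print(f"mu={mu:.2f}, sigma={sigma:.2f} -> "
          f"log-likelihood={log_likelihood(mu, sigma, data):.3f}")

# For a Gaussian, the MLE is the sample mean and (population) standard
# deviation, so the last candidate attains the highest log-likelihood.
```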
Least Squares Method

• In statistics, when we have data in the form of data points that can be represented on a Cartesian plane, taking one of the variables as the independent variable (the x-coordinate) and the other as the dependent variable (the y-coordinate), the data is called scatter data.
• On its own, this data may not be useful for making interpretations or for predicting values of the dependent variable where the independent variable has not been observed. So we find the equation of the line that best fits the given data points with the help of the least squares method.
Least Squares Method (figure)
Formula for Least Squares Method (figure; the standard formulas are reproduced below)
Least Squares Method Graph (figure)
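The formula slide above is an image that did not survive extraction. For reference, the standard least squares formulas — the ones applied in the worked solution below, where X and Y denote the means of the x- and y-values — are:

```latex
m = \frac{\sum_i (X - x_i)(Y - y_i)}{\sum_i (X - x_i)^2},
\qquad
c = Y - mX,
\qquad
\text{line of best fit: } y = mx + c
```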
Example - Maximum Likelihood Estimation
Let's work through an example of using Maximum Likelihood Estimation (MLE) in linear regression. For a linear model with Gaussian noise, the maximum likelihood estimates of the coefficients coincide with the least squares solution, so we can work the example with the least squares formulas.
• Example
• Suppose we have a dataset with the following observations:
X Y
1 3
2 4
4 8
6 10
8 15
SOLUTION
• Here, we have x as the independent variable and y as the dependent
variable. First, we calculate the means of x and y values denoted by X
and Y respectively.

• X = (1+2+4+6+8)/5 = 4.2

• Y = (3+4+8+10+15)/5 = 8
SOLUTION (continued)
• The slope of the line of best fit can be calculated from the formula as follows:

• m = Σ(X − xᵢ)(Y − yᵢ) / Σ(X − xᵢ)²

• m = 55/32.8 = 1.68 (rounded to two decimal places)

• Now, the intercept will be calculated from the formula as follows:

• c = Y – mX

• c = 8 – 1.68*4.2 = 0.94

• Thus, the equation of the line of best fit becomes y = 1.68x + 0.94.
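A quick check of the arithmetic with NumPy (np.polyfit with degree 1 returns the least squares slope and intercept):

```python
import numpy as np

x = np.array([1, 2, 4, 6, 8])
y = np.array([3, 4, 8, 10, 15])

# Degree-1 polynomial fit = least squares line of best fit.
m, c = np.polyfit(x, y, 1)
print(m, c)  # ~1.6768, ~0.9573
```

Note that the slide's intercept of 0.94 comes from rounding the slope to 1.68 before computing c; using the unrounded slope gives c ≈ 0.96.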
Problem 2:
• Find the line of best fit for the following data of heights and weights
of students of a school using the least squares method:

• Height (in centimeters): [160, 162, 164, 166, 168]


• Weight (in kilograms): [52, 55, 57, 60, 61]
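A sketch of the same least squares computation for Problem 2, again using NumPy:

```python
import numpy as np

height = np.array([160, 162, 164, 166, 168])  # x, in centimeters
weight = np.array([52, 55, 57, 60, 61])       # y, in kilograms

m, c = np.polyfit(height, weight, 1)
print(f"m = {m:.2f}, c = {c:.2f}")  # m = 1.15, c = -131.60
```

so the line of best fit is y = 1.15x − 131.6.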
Drawbacks of Maximum Likelihood Estimation
• While Maximum Likelihood Estimation (MLE) is a powerful and widely
used method for parameter estimation, it does have some drawbacks
and limitations. Here are some of the key challenges and potential
issues associated with MLE:
• Sensitivity to Outliers
• Computational Complexity
• Overfitting
• Bias in Small Samples
Robust Linear Regression
• Robust linear regression is designed to be less sensitive to outliers compared to
traditional linear regression.
• Traditional linear regression minimizes the sum of squared residuals, which can
be heavily influenced by outliers.
• Robust linear regression uses different techniques to mitigate the effect of
outliers and produce a more reliable model.
• The Least Absolute Deviations (LAD) method, also known as L1 regression or Least Absolute Errors (LAE), is a type of regression that minimizes the sum of the absolute differences (deviations) between the observed and predicted values. This approach is particularly useful when the data contains outliers, or when you want to limit the influence of large residuals, which can disproportionately affect models like Ordinary Least Squares (OLS) that minimize squared differences.
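A minimal sketch of the difference, with made-up data containing one gross outlier; the LAD fit is obtained numerically via scipy.optimize.minimize:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data on the line y = 2x + 1, with one gross outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 30.0])  # last point is an outlier

# OLS: minimizes the sum of squared residuals.
ols_m, ols_c = np.polyfit(x, y, 1)

# LAD: minimizes the sum of absolute residuals.
def lad_loss(params):
    m, c = params
    return np.sum(np.abs(y - (m * x + c)))

lad_m, lad_c = minimize(lad_loss, x0=[0.0, 0.0], method="Nelder-Mead").x

print(f"OLS: y = {ols_m:.2f}x + {ols_c:.2f}")  # dragged toward the outlier
print(f"LAD: y = {lad_m:.2f}x + {lad_c:.2f}")  # close to y = 2x + 1
```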
Robust Linear Regression (figure)
Ridge Regression
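The body of this slide is an image that did not survive extraction. As a standard reference point (not taken from the slide): ridge regression adds an L2 penalty λ‖w‖² to the least squares objective, which shrinks the coefficients and stabilizes the fit when features are nearly collinear. A minimal NumPy sketch of the closed-form solution:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy usage: two nearly collinear features, true weights [1, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=50)
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)

print(ridge_fit(X, y, lam=0.0))  # lam=0 is plain OLS: weights can be erratic
print(ridge_fit(X, y, lam=1.0))  # shrunken, stable weights
```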
Bayesian Linear Regression
Bayes' Theorem
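This slide is likewise an image; for reference, the theorem in the form used for parameter inference (posterior ∝ likelihood × prior) is:

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)},
\qquad
\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}
```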
What is the use of Bayesian Linear Regression in machine learning?
• Bayesian Linear Regression is used in machine learning for several
important reasons, particularly when dealing with uncertainty, small
datasets, or when prior knowledge about the parameters is available.
• Bayesian linear regression extends the traditional linear regression
framework by incorporating prior beliefs about the parameters,
resulting in a full posterior distribution for the parameters. This
approach offers several advantages, including better handling of
uncertainty and robustness to overfitting.
Key Concepts of Bayesian Linear Regression
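The content of this slide is an image that did not survive extraction. As a hedged sketch of the usual formulation (the names alpha and beta and the toy data below are assumptions, following the standard conjugate setup): with prior w ~ N(0, α⁻¹I) and Gaussian noise of precision β, the posterior over the weights is Gaussian with closed-form mean and covariance.

```python
import numpy as np

def bayes_linreg_posterior(Phi, t, alpha=1.0, beta=25.0):
    """Posterior N(m_N, S_N) for weights with prior N(0, alpha^-1 I)
    and Gaussian noise of precision beta (standard conjugate result)."""
    d = Phi.shape[1]
    S_N_inv = alpha * np.eye(d) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# Toy usage: t = 0.5x - 0.3 plus noise, with a bias column in Phi.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=20)
t = 0.5 * x - 0.3 + 0.2 * rng.normal(size=20)
Phi = np.column_stack([np.ones_like(x), x])

m_N, S_N = bayes_linreg_posterior(Phi, t)
print("posterior mean:", m_N)                    # near [-0.3, 0.5]
print("posterior stds:", np.sqrt(np.diag(S_N)))  # parameter uncertainty
```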
Linear models for classification
• Discriminant function
• Probabilistic generative models,
• Probabilistic discriminative models
• Laplace approximation
• Bayesian logistic regression
Linear Models for Classification: Discriminant Function

• Linear models are commonly used for classification tasks, where the
goal is to assign inputs (data points) to one of several possible classes.
A discriminant function is a type of linear model used to separate
different classes by defining a decision boundary.
• Example: Classifying Points Based on Two Features
Problem Setup:
• Suppose we have a dataset with two features (x1 and x2) and two classes (Class 1 and Class 2). The goal is to classify a new data point into either Class 1 or Class 2 using a linear discriminant function.
• Here’s a small dataset:
X1 X2 Class
2 3 ?
3 4 ?
4 2 ?
5 3 ?
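The slide leaves the class labels unspecified, so here is a sketch with hypothetical parameters (w = (1, −1), b = 0, not from the slide) showing how a linear discriminant g(x) = w·x + b assigns classes:

```python
import numpy as np

# Hypothetical parameters for g(x) = w1*x1 + w2*x2 + b (not from the slide).
w = np.array([1.0, -1.0])
b = 0.0

points = np.array([[2, 3], [3, 4], [4, 2], [5, 3]])

for p in points:
    g = w @ p + b
    # Decision rule: Class 1 if g(x) > 0, Class 2 otherwise.
    label = "Class 1" if g > 0 else "Class 2"
    print(p, f"g(x) = {g:+.1f} ->", label)
```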
Probabilistic Generative Models

• Probabilistic generative models are a fundamental approach to classification in machine learning. These models work by modeling the joint probability distribution of the input features and the output labels. Once this joint distribution is known, the model can predict the probability of each class given the input features using Bayes' theorem.
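A minimal sketch of this recipe with one feature and Gaussian class-conditionals; the training data below is made up for illustration:

```python
import numpy as np
from scipy.stats import norm

# Toy 1-D training data for two classes.
x0 = np.array([1.0, 1.5, 2.0, 2.5])   # class 0
x1 = np.array([4.0, 4.5, 5.0, 5.5])   # class 1

# Model the joint p(x, y) = p(x | y) p(y) with Gaussian class-conditionals.
prior0 = prior1 = 0.5
lik0 = norm(x0.mean(), x0.std())
lik1 = norm(x1.mean(), x1.std())

x_new = 3.2
joint0 = lik0.pdf(x_new) * prior0
joint1 = lik1.pdf(x_new) * prior1

# Bayes' theorem: p(y | x) is proportional to p(x | y) p(y).
posterior1 = joint1 / (joint0 + joint1)
print(f"p(class 1 | x={x_new}) = {posterior1:.3f}")
```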
Probabilistic Discriminative Models
• Probabilistic discriminative models are used in classification tasks to predict
the class label of a given input based on the input features. These models
are called "discriminative" because they focus on distinguishing between
classes directly, without trying to model the distribution of the features
themselves.
Laplace Approximation
In the context of linear models for classification, Laplace Approximation is
often used to approximate the posterior distribution of model parameters
when performing Bayesian inference.
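In one dimension the recipe is: find the mode w₀ of the (unnormalized) log posterior, measure the curvature A = −(d²/dw²) ln p(w) at the mode, and approximate the posterior by N(w₀, A⁻¹). A sketch with an arbitrary made-up density:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# An arbitrary unnormalized log-density (not from the slides).
def log_p(w):
    return -0.5 * w**4 + 2.0 * w  # skewed, non-Gaussian

# Step 1: find the mode w0 by maximizing log p(w).
w0 = minimize_scalar(lambda w: -log_p(w)).x

# Step 2: curvature A = -d^2/dw^2 log p(w) at w0 (finite differences).
h = 1e-4
A = -(log_p(w0 + h) - 2 * log_p(w0) + log_p(w0 - h)) / h**2

# Step 3: Laplace approximation q(w) = N(w0, 1/A).
print(f"q(w) = N({w0:.3f}, {1/A:.3f})")  # mode 1.0, variance 1/6
```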
Laplace Approximation…

One major weakness of the Laplace approximation is that, since it is based on a Gaussian
distribution, it is directly applicable only to real-valued variables.
In other cases it may be possible to apply the Laplace approximation to a transformation of the
variable. For instance, if 0 < τ < ∞, we can consider a Laplace approximation of ln τ, which ranges over the whole real line.
The most serious limitation of the Laplace framework, however, is that it is based purely on the
behaviour of the true distribution at a specific value of the variable, and so can fail to capture important
global properties.
Bayesian Logistic Regression
Logistic Regression
• Logistic regression is one of the most popular machine learning algorithms and comes under the supervised learning technique. It is used for predicting a categorical dependent variable from a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc. Rather than returning exactly 0 or 1, however, it gives probabilistic values that lie between 0 and 1.
• Logistic regression is similar to linear regression except in how it is used: linear regression is used for solving regression problems, whereas logistic regression is used for solving classification problems.
• In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function, which squashes predictions toward the two extreme values (0 and 1).
• The value of the logistic function indicates the likelihood of something, such as whether cells are cancerous or not.
• Logistic regression is a significant machine learning algorithm because it can provide probabilities and classify new data using both continuous and discrete datasets.
• Logistic regression can be used to classify observations using different types of data and can easily identify the variables that are most effective for the classification.
The image below shows the logistic function:
Logistic Function (Sigmoid Function)
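The figure itself did not survive extraction; the function it shows is the sigmoid σ(z) = 1/(1 + e⁻ᶻ), which squashes any real-valued score into (0, 1):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps a real score to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.007, 0.5, 0.993]
```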
Bayesian Logistic Regression
• Bayesian Logistic Regression is an extension of logistic regression where we
apply Bayesian inference to estimate the model parameters.
• It combines logistic regression with Bayesian inference.
• In contrast to traditional logistic regression, which provides point estimates
for the parameters, Bayesian logistic regression treats the parameters as
random variables and provides a probability distribution over them.
• This approach allows us to quantify the uncertainty in the parameter
estimates and make probabilistic predictions.
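The posterior in Bayesian logistic regression has no closed form; a common route, and the one the Laplace Approximation slides above set up, is to find the MAP weights and fit a Gaussian around them. A minimal sketch, assuming a N(0, I) prior and made-up 1-D data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up 1-D data with a bias column; labels y in {0, 1}.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])
Phi = np.column_stack([np.ones_like(x), x])

# Newton iterations for the MAP weights under a N(0, I) prior.
w = np.zeros(2)
for _ in range(20):
    p = sigmoid(Phi @ w)
    grad = Phi.T @ (p - y) + w                   # gradient of neg. log posterior
    r = p * (1 - p)
    H = Phi.T @ (r[:, None] * Phi) + np.eye(2)   # Hessian of neg. log posterior
    w -= np.linalg.solve(H, grad)

# Laplace approximation: posterior ~= N(w_MAP, H^{-1}), with H at the mode.
p = sigmoid(Phi @ w)
r = p * (1 - p)
H = Phi.T @ (r[:, None] * Phi) + np.eye(2)
print("w_MAP:", w)
print("posterior covariance:\n", np.linalg.inv(H))
```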
Example Scenario
• Imagine you're working on a medical diagnosis problem where you
need to predict whether a patient has a particular disease (binary
outcome: 0 for no, 1 for yes) based on some clinical measurements
(features). If you have prior knowledge about the likely effect of these
measurements on the disease (say from previous studies), Bayesian
Logistic Regression allows you to incorporate this knowledge and
update it as more patient data becomes available.
• For instance, if a certain measurement is known to be positively
correlated with the disease but you're uncertain about the strength of
this relationship, you might use a prior that reflects this belief. As you
collect more patient data, the posterior distribution will reflect both
the prior knowledge and the new data, giving you a more refined
estimate.
