AI & ML Unit 4 Notes

Unit – IV

Combining Multiple Learners:

Ensemble Methods:

• It is a machine learning technique that combines several base models in order to produce one optimal predictive model.
• A decision tree is a good way to illustrate the idea behind ensemble methods.
• A decision tree determines the predicted value based on a series of questions and conditions.
• For example, consider a simple decision tree that decides whether an individual should play outside or not.
• The tree takes several weather factors into account, and for each factor it either makes a decision or asks another question.
• In this example, whenever it is overcast, the decision is to play.
• However, if it is raining, the tree also needs to ask whether it is windy or not. If it is windy, do not play.
• But if there is no wind, the individual is ready to go outside and play.
Simple Ensemble Techniques:

• Max Voting
• Averaging
• Weighted Averaging

Max-Voting:

• The max-voting method is generally used for classification problems.


• In this technique, multiple models are used to make predictions for each data point.
• The predictions by each model are considered as a ‘Vote’.
• The prediction which we get from the majority of the models is used as the final prediction.
• For example, suppose we ask 5 of our colleagues to rate a movie (out of 5).
• Since the majority gave a rating of 4, the final rating will be taken as 4.
             Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Rating            5             4             5             4             4             4
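
A minimal Python sketch of max voting, using the five ratings from the table above; collections.Counter simply returns the most common vote as the final prediction.

from collections import Counter

# Predictions ("votes") from five models / colleagues for one data point
votes = [5, 4, 5, 4, 4]

# The most common prediction becomes the final prediction
final_prediction = Counter(votes).most_common(1)[0][0]
print(final_prediction)   # 4, since the majority voted 4
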
Averaging:

• Similar to the max-voting technique, multiple predictions are made for each data point in averaging.
• In this method, the average of the predictions from all the models is taken and used to make the final prediction.
• It can be used for making predictions in regression problems.
             Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Rating            5             4             5             4             4            4.4
Weighted Average:

• This is an extension of the averaging method.
• All models are assigned different weights defining the importance of each model for the prediction.
• The result is calculated as [(5 x 0.23) + (4 x 0.23) + (5 x 0.18) + (4 x 0.18) + (4 x 0.18)] = 4.41
             Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Weight           0.23          0.23          0.18          0.18          0.18
Rating            5             4             5             4             4            4.41
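
A minimal Python sketch of averaging and weighted averaging, using the ratings and weights from the tables above.

ratings = [5, 4, 5, 4, 4]
weights = [0.23, 0.23, 0.18, 0.18, 0.18]

# Simple average of all predictions (useful for regression problems)
average = sum(ratings) / len(ratings)                              # 4.4

# Weighted average: each model's prediction is scaled by its weight
weighted_average = sum(r * w for r, w in zip(ratings, weights))    # about 4.41
print(average, weighted_average)
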

Ensemble Learning:
Bagging:
• The idea behind bagging is to combine the results of multiple models to get a generalized result.
• Bagging, the short form for bootstrap aggregating, is mainly applied in supervised learning problems such as classification and regression.
• It is commonly used with decision trees and increases the accuracy of the models.
• It involves two steps, i.e., bootstrapping and aggregation.
• Bootstrapping: A random sampling technique in which samples are derived from the data using sampling with replacement.
• Aggregation: In bagging, aggregation is done to incorporate the outcomes of all the base models and combine them into a single prediction.
Advantages:
✓ Reduces variance
✓ Weak base learners are combined to form a single strong learner
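
A minimal Python sketch of the bagging procedure described above, assuming scikit-learn and NumPy are available; the dataset from make_classification is only a stand-in for real data.

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)

n_trees = 10
rng = np.random.default_rng(0)
trees = []

# Bootstrapping: each tree is trained on a sample drawn with replacement
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier().fit(X[idx], y[idx])
    trees.append(tree)

# Aggregation: majority vote over the individual trees' predictions
def bagged_predict(x):
    votes = [int(tree.predict(x.reshape(1, -1))[0]) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

print(bagged_predict(X[0]), y[0])
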

Boosting:

• Boosting is an ensemble technique that learns from the mistakes of previous predictors to make better predictions in the future.
• The technique combines several weak base learners to form one strong learner.
• Boosting works by arranging weak learners in a sequence, such that each weak learner learns from the mistakes of the previous learner in the sequence, creating better predictive models.
• It is a sequential process, where each subsequent model attempts to correct the errors of the previous model.
• Boosting takes many forms, including gradient boosting, Adaptive Boosting (AdaBoost), and XGBoost (Extreme Gradient Boosting).
• AdaBoost uses weak learners in the form of decision trees that mostly include one split; these are popularly known as decision stumps.
• Gradient boosting adds predictors sequentially to the ensemble, where each new predictor corrects the errors of its predecessor, thereby increasing the model's accuracy.
• Boosting works in the following steps:
Step 1: A subset is created from the original dataset.
Step 2: Initially, all data points are given equal weights.
Step 3: A base model is created on this subset.
Step 4: This model is used to make predictions on the whole dataset.
Step 5: Errors are calculated using the actual and predicted values.
Step 6: Observations which are incorrectly predicted are given higher weights.
Step 7: Another model is created and predictions are made on the dataset.

Step 8: Similarly, multiple models are created, each correcting the errors of the previous model.
Step 9: The final model is the weighted mean of all the models.

Step 10: Thus the boosting algorithm combines a number of weak learners to form a strong learner, as in the sketch below.
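
A minimal AdaBoost-style Python sketch of the steps above, assuming scikit-learn decision stumps; the weight update and the alpha (learner weight) formula follow the standard AdaBoost recipe, and the dataset is a stand-in.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)
y = np.where(y == 0, -1, 1)          # use labels -1 / +1 for weighted voting

n_rounds = 10
w = np.full(len(X), 1 / len(X))      # Step 2: equal weights initially
stumps, alphas = [], []

for _ in range(n_rounds):
    # Steps 3-4: fit a decision stump using the current sample weights
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)

    # Step 5: weighted error of this learner, and its voting weight alpha
    err = np.sum(w[pred != y]) / np.sum(w)
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))

    # Step 6: increase the weights of misclassified points
    w = w * np.exp(-alpha * y * pred)
    w = w / w.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Steps 9-10: final prediction is the weighted (signed) vote of all stumps
def boosted_predict(X_new):
    scores = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    return np.sign(scores)

print(np.mean(boosted_predict(X) == y))   # training accuracy
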

Stacking:

• Stacking, another ensemble method, is often referred to as stacked generalization.
• This technique works by training a new learning algorithm to combine the predictions of several other learning algorithms.
• Stacking has been successfully implemented in regression, density estimation, distance learning, and classification.
• It can also be used to measure the error rate involved during bagging.
• Step-wise explanation for a simple stacked ensemble (a sketch follows after these steps):
Step 1: The train set is split into 10 parts.

Step 2: A base model is fitted on 9 parts and predictions are made for the 10th part. This is done for each part of the train set.

Step 3: The base model is then fitted on the whole train dataset. Using this model, predictions are made on the test set.

Step 4: Steps 2 and 3 are repeated for another base model, resulting in another set of predictions for the train set and the test set.

Step 5: The predictions from the train set are used as features to build a new model.
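
A minimal Python sketch of the stacking steps above, assuming scikit-learn; out-of-fold predictions from two base models become the features of a new (meta) model, and the dataset is a stand-in.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# Steps 1-4: out-of-fold predictions on the train set, plus test-set predictions
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=10) for m in base_models])
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict(X_test) for m in base_models])

# Step 5: the predictions are used as features to build a new (meta) model
meta_model = LogisticRegression().fit(train_meta, y_train)
print(meta_model.score(test_meta, y_test))
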
Unsupervised Learning:

• Unsupervised learning is a learning method in which a machine learns without supervision.
• It is a machine learning technique in which models are not supervised using a labeled training dataset.
• It cannot be directly applied to regression or classification problems.
• The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.

Types of unsupervised learning:


Clustering:
o Clustering is an unsupervised method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group.

Association:

• An association rule is an unsupervised learning method which finds relationships between variables in a large database.
List of unsupervised learning algorithm:

Clustering:

• K-means clustering
• KNN (K-nearest neighbor)
• Gaussian Mixture model
• Expectation Maximization

Association:

• Apriori algorithm
• FP growth algorithm

Applications:

• Market Basket Analysis


• Medical Diagnosis
• Marketing
• Insurance

Advantages:

• Used for more complex tasks as compared to supervised learning.
• It is easy to get unlabeled data in comparison to labeled data.

Disadvantages:

• More difficult than supervised learning.

K-Means clustering algorithm:

• K-Means clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning or data science.
• It groups the unlabeled dataset into different clusters.
• Here K defines the number of pre-defined clusters that need to be created in the process; if K=2, there will be two clusters, for K=3 there will be three clusters, and so on.
• It is a centroid-based algorithm, where each cluster is associated with a centroid.
• The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
• The algorithm takes the unlabeled dataset as input, divides the dataset into K clusters, and repeats the process until the cluster assignments stop changing, i.e., until it finds the best clusters.
• The k-means clustering algorithm mainly performs two tasks:
o Determines the best value for the K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center.

Working:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as centroids. (They may or may not be points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.

Step-4: Calculate the variance and place a new centroid for each cluster.

Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of each cluster.

Step-6: If any reassignment occurred, then go to step-4, else go to FINISH.

Step-7: The model is ready.


• Suppose we have two variables M1 and M2 and their x-y scatter plot. Let's take the number of clusters K=2, i.e., we will try to group the dataset into two different clusters.
• We need to choose some random K points or centroids to form the clusters. These points can be either points from the dataset or any other points. A small sketch of this procedure follows.
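
A minimal NumPy sketch of the working steps above for the two-variable (M1, M2) example with K=2; the data array X is hypothetical.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # hypothetical (M1, M2) data points
K = 2                                    # Step 1: number of clusters

# Step 2: pick K random points from the dataset as initial centroids
centroids = X[rng.choice(len(X), K, replace=False)]

while True:
    # Steps 3 / 5: assign each point to its closest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Step 4: recompute each cluster's centroid as the mean of its points
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])

    # Step 6: stop when no centroid (and hence no assignment) changes
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(centroids)
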

Applications:

• Academic performance
• Diagnostic systems
• Search engines

Advantages:

• Simple
• Easy to implement

K-Nearest Neighbor(KNN) Algorithm:

• K-Nearest Neighbor is one of the simplest machine learning algorithms, based on the supervised learning technique.
• The K-NN algorithm assumes the similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on this similarity.
• The K-NN algorithm can be used for regression as well as for classification, but mostly it is used for classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs the computation at classification time.
• Example: Suppose we have an image of a creature that looks similar to a cat and a dog, but we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new image that are similar to the cat and dog images and, based on the most similar features, will put it in either the cat or the dog category.
• Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1; we need to determine which of these categories the data point belongs to. To solve this type of problem, we need the K-NN algorithm.

Working:

Step-1: Select the number K of neighbors.

Step-2: Calculate the Euclidean distance from the new data point to the existing data points.

Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

Step-4: Among these K neighbors, count the number of data points in each category.

Step-5: Assign the new data point to the category for which the number of neighbors is maximum.

Step-6: Our model is ready.

• Firstly, we choose the number of neighbors, say k=5.
• Next, we calculate the Euclidean distances between the new data point and the existing data points, as in the sketch below.
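
A minimal NumPy sketch of the working steps above with k=5; the two-class training data (Category A and Category B) is hypothetical.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: take the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: count the categories among them and pick the most frequent
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical two-class data: Category A around (1, 1), Category B around (5, 5)
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(1, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y_train = np.array(["A"] * 20 + ["B"] * 20)

print(knn_predict(X_train, y_train, np.array([1.5, 1.0])))   # expected "A"
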

Advantages:

• Simple to implement
• Robust to noisy training data
• More effective if the training data is large
Disadvantage:

• Computation cost is high

Instance Based Learning:

• These are systems that learn the training examples by heart and then generalize to new instances based on some similarity measure.
• It builds the hypotheses from the training instances.
• It is also known as memory-based learning or lazy learning.

Gaussian Mixture Models:

• Gaussian mixture models (GMMs) are a type of machine learning algorithm.
• They are used to classify data into different categories based on probability distributions.
• Gaussian mixture models can be used in many different areas, including finance,
marketing and so much more.
• Gaussian Mixture Models (GMMs) give us more flexibility than K-Means.
• Taking an example in two dimensions, this means that the clusters can take any
kind of elliptical shape (since we have standard deviation in both the x and y
directions).
• Thus, each Gaussian distribution is assigned to a single cluster. In order to find the parameters of the Gaussian for each cluster (e.g., the mean and standard deviation), we use an optimization algorithm called Expectation-Maximization (EM).
• Gaussian mixture models (GMM) are a probabilistic concept used to model real-
world data sets.
• GMMs are a generalization of Gaussian distributions and can be used to represent
any data set that can be clustered into multiple Gaussian distributions
• The Gaussian mixture model is a probabilistic model that assumes all the data
points are generated from a mix of Gaussian distributions with unknown
parameters.
• A Gaussian mixture model can be used for clustering, which is the task of
grouping a set of data points into clusters.
• GMMs can be used to find clusters in data sets where the clusters may not be
clearly defined.
• This makes GMMs a flexible and powerful tool for clustering data.
• GMM has many applications, such as density estimation, clustering, and image
segmentation. For density estimation, GMM can be used to estimate the
probability density function of a set of data points.
• For clustering, GMM can be used to group together data points that come from
the same Gaussian distribution.
• For image segmentation, GMM can be used to partition an image into different regions.
• Gaussian mixture models can be used for a variety of use cases, including
identifying customer segments.
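
A minimal Python sketch of clustering with a Gaussian mixture, assuming scikit-learn's GaussianMixture (which fits the means, covariances, and mixing weights with EM); the two-blob data is hypothetical.

import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 2-D data drawn from two Gaussian "blobs"
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 2, (100, 2))])

# Fit a mixture of two Gaussians; EM estimates the means and covariances
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

labels = gmm.predict(X)            # hard cluster assignments
probs = gmm.predict_proba(X)       # soft (probabilistic) assignments
print(gmm.means_)                  # estimated cluster means
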

Advantages:

• Flexibility
• Robustness
• Speed
Disadvantages:

• Sensitivity to initialization
• High dimensional data.

Expectation Maximization:

• The Expectation-Maximization (EM) algorithm can be considered as a combination of various unsupervised machine learning algorithms, such as the K-means clustering algorithm.
• Being an iterative approach, it consists of two modes.
• In the first mode, we estimate the missing or latent variables; hence it is referred to as the Expectation step (E-step).
• The other mode is used to optimize the parameters of the model so that it can explain the data more clearly; it is known as the Maximization step (M-step).
• Expectation step: It involves the estimation of all missing values in the dataset, so that after completing this step there are no missing values.
• Maximization step: It involves the use of the data estimated in the E-step to update the parameters.

• The primary goal of the EM algorithm is to use the available observed data of the dataset to estimate the missing data and then to update the parameter values.
• Convergence means that the estimated values change only negligibly from one iteration to the next, i.e., the estimates settle down to stable values.
Steps in the EM algorithm:
Step 1: Initialize the parameter values. The system is provided with incomplete observed data, with the assumption that it comes from a specific model.
Step 2: This step is known as the E-step. It is used to estimate the values of the missing or incomplete data using the observed data.
Step 3: This step is known as the M-step. It uses the complete data obtained from the 2nd step to update the parameter values.
Step 4: The last step is to check whether the values of the latent variables are converging or not. If yes, stop the process; else repeat from step 2 until convergence occurs. (A sketch of this loop follows.)
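
A minimal NumPy/SciPy sketch of the E-step/M-step loop above for a one-dimensional mixture of two Gaussians; the data and the initial parameter values are hypothetical, and the responsibilities play the role of the "missing" (latent) cluster labels.

import numpy as np
from scipy.stats import norm

# Hypothetical 1-D data drawn from two Gaussian components
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1.5, 200)])

# Step 1: initialize the parameters (means, std devs, mixing weights)
mu, sigma, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

for _ in range(100):
    # Step 2 (E-step): estimate the latent assignments (responsibilities)
    dens = np.array([p * norm.pdf(x, m, s) for p, m, s in zip(pi, mu, sigma)])
    resp = dens / dens.sum(axis=0)

    # Step 3 (M-step): update the parameters using the responsibilities
    Nk = resp.sum(axis=1)
    new_mu = (resp * x).sum(axis=1) / Nk
    sigma = np.sqrt((resp * (x - new_mu[:, None]) ** 2).sum(axis=1) / Nk)
    pi = Nk / len(x)

    # Step 4: stop when the parameter estimates have converged
    if np.allclose(new_mu, mu, atol=1e-6):
        mu = new_mu
        break
    mu = new_mu

print(mu, sigma, pi)
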

Applications:

• Data clustering
• Used in Computer vision and NLP
• Used in medical and healthcare industry

Advantages:

• Easy to implement
• It often gives a closed-form solution for the M-step.

Disadvantages:

• Convergence of the EM algorithm is very slow.
• It requires both forward and backward probabilities.
