AI & ML Unit 4 Notes
Ensemble Methods:
• Max Voting
• Averaging
• Weighted Averaging
Averaging:
• Similar to the max-voting technique, multiple predictions are made for each data
point in averaging.
• In this method, the average of the predictions from all the models is taken and used
to make the final prediction.
• It can be used for making predictions in regression problems.
Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
     5             4             5             4             4            4.4
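The table above can be reproduced with a short sketch: each colleague's rating counts equally, and the final prediction is the plain mean.

```python
# Simple averaging: every model's (colleague's) prediction is weighted equally.
ratings = [5, 4, 5, 4, 4]  # predictions from the five colleagues
final_rating = sum(ratings) / len(ratings)
print(final_rating)  # 4.4
```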
Weighted Average:
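Weighted averaging extends simple averaging by assigning each model a weight, so that more reliable models count more. A minimal sketch, reusing the colleague-rating example (the weight values here are illustrative assumptions, not from the notes):

```python
# Weighted averaging: each prediction counts in proportion to its weight.
ratings = [5, 4, 5, 4, 4]
weights = [0.23, 0.23, 0.18, 0.18, 0.18]  # assumed weights; must sum to 1
final_rating = sum(r * w for r, w in zip(ratings, weights))
print(final_rating)  # 4.41
```

With equal weights of 0.2 each, this reduces to the simple average of 4.4.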
Ensemble Learning:
Bagging:
• The idea behind bagging is to combine the results of multiple models to get a
generalized result.
• Bagging, short for bootstrap aggregating, is mainly applied in supervised learning
problems such as classification and regression.
• It increases the accuracy of models; decision trees are the most common base learners.
• It involves two steps: bootstrapping and aggregation.
• Bootstrapping: a random sampling technique in which samples are drawn
from the data with replacement.
• Aggregation: in bagging, the predictions of all the models are combined
into a single final outcome.
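The two steps can be sketched in a few lines. This is a toy illustration, not a real implementation: each "model" is just the mean of its bootstrap sample, standing in for a decision tree.

```python
import random

def bootstrap_sample(data):
    # Bootstrapping: draw len(data) points *with replacement*.
    return [random.choice(data) for _ in data]

def bagged_predict(data, n_models=10):
    # Each weak model here is simply the mean of its bootstrap sample
    # (a stand-in for a real regressor such as a decision tree).
    preds = []
    for _ in range(n_models):
        sample = bootstrap_sample(data)
        preds.append(sum(sample) / len(sample))
    # Aggregation: average the individual predictions.
    return sum(preds) / len(preds)

random.seed(0)
print(bagged_predict([2.0, 4.0, 6.0, 8.0]))
```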
Advantages:
✓ Reduces variance
✓ Weak base learners are combined to form a single strong learner
Boosting:
Step 8: Similarly, multiple models are created, each correcting the errors of the
previous model.
Step 9: The final model is the weighted mean of all the models.
Step 10: Thus the boosting algorithm combines a number of weak learners to form a
strong learner.
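The idea of "each model correcting the errors of the previous one" can be sketched with the simplest possible weak learner, a constant that fits the current residuals (real boosting uses small trees; the learning rate of 0.5 is an assumed value):

```python
# Minimal boosting sketch for regression: each weak learner is a
# constant (the mean of the current residuals), so each round
# corrects part of the error left by the previous rounds.
def boost(y, n_rounds=5, lr=0.5):
    prediction = [0.0] * len(y)
    for _ in range(n_rounds):
        # Fit the weak learner to the residuals (errors so far).
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        weak = sum(residuals) / len(residuals)
        # Add the weak learner's contribution, scaled by the learning rate.
        prediction = [pi + lr * weak for pi in prediction]
    return prediction

print(boost([2.0, 4.0, 6.0]))  # each entry converges toward the mean, 4.0
```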
Stacking:
Step 2: A base model is fitted on 9 parts and predictions are made for the 10th part.
Step 3: The base model is then fitted on the whole train dataset. Using this model,
predictions are made on the test set.
Step 4: Steps 2 and 3 are repeated for another base model, resulting in another set
of predictions for the train set and test set.
Step 5: The predictions on the train set are used as features to build a new
model.
Unsupervised Learning:
Clustering:
• K-means clustering
• KNN(K-nearest neighbor)
• Gaussian Mixture model
• Expectation Maximization
Association:
• Apriori algorithm
• FP growth algorithm
Applications:
Advantages:
Disadvantages:
Working:
Step-2: Select K random points as centroids. (These need not be points from the input
dataset.)
Step-3: Assign each data point to its closest centroid, which forms the predefined
K clusters.
Step-4: Calculate the variance and place a new centroid (the mean) in each cluster.
Step-5: Repeat step 3, i.e., reassign each data point to the new closest
centroid of each cluster.
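The steps above can be sketched for one-dimensional data (the notes pick random initial centroids; this sketch initializes from the first K points so the result is reproducible):

```python
def kmeans(points, k, iters=10):
    # Step 2 (simplified): use the first k points as initial centroids.
    centroids = points[:k]
    for _ in range(iters):
        # Step 3: assign each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Steps 4-5: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

print(kmeans([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], 2))  # ~[1.0, 10.0]
```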
Applications:
• Academic performance
• Diagnostic systems
• Search engines
Advantages:
• Simple
• Easy to implement
Working:
Step-3: Take the K nearest neighbors according to the calculated Euclidean distance.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors
is maximum.
• Firstly, we choose the number of neighbors; here we choose k = 5.
• Next, we calculate the Euclidean distance from the new point to the data points.
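The KNN steps can be sketched directly (the 2-D points and labels below are made-up example data):

```python
from collections import Counter
import math

def knn_classify(train, new_point, k=5):
    # Step 2: Euclidean distance from the new point to every
    # training point; train holds (x, y, label) tuples.
    dists = sorted(
        (math.dist(new_point, (x, y)), label) for x, y, label in train
    )
    # Step 3: take the k nearest neighbors.
    nearest = [label for _, label in dists[:k]]
    # Steps 4-5: assign the majority category among those neighbors.
    return Counter(nearest).most_common(1)[0][0]

train = [(1, 1, "A"), (1, 2, "A"), (2, 1, "A"),
         (6, 6, "B"), (6, 7, "B"), (7, 6, "B"), (7, 7, "B")]
print(knn_classify(train, (2, 2), k=5))  # "A"
```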
Advantages:
• Simple to implement
• Robust
• Effective with large training data
Disadvantage:
• KNN is a lazy learner: it memorizes the training examples and generalizes to
new instances only at prediction time, based on some similarity measure.
• It builds its hypotheses directly from the training instances.
• It is also known as memory-based learning or lazy learning.
Advantages:
• Flexibility
• Robustness
• Speed
Disadvantages:
• Sensitivity to initialization
• Performs poorly on high-dimensional data.
Expectation Maximization:
• The primary goal of the EM algorithm is to use the available observed data of the
dataset to estimate the missing data.
• Convergence means the parameter estimates stop changing noticeably between
iterations, i.e., the algorithm has reached a stable solution.
Steps in EM algorithm:
Step 1: Initialize the parameter values. The system is provided with
incomplete observed data, under the assumption that it comes from a specific model.
Step 2: This step is known as the E-step. It is used to estimate the values of the
missing or incomplete data using the observed data.
Step 3: This step is known as the M-step. It uses the complete data obtained from
the E-step to update the parameter values.
Step 4: The last step is to check whether the values of the latent variables are
converging. If yes, stop the process; otherwise repeat from step 2 until
convergence occurs.
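The E-step/M-step loop can be sketched for the simplest interesting case, a mixture of two one-dimensional Gaussians. To keep the sketch short, the variances and mixing weights are assumed fixed (unit variance, equal weights) and only the two means are estimated:

```python
import math

def em_two_gaussians(data, iters=30):
    # Step 1: initialize the means from the data range (assumed scheme).
    mu = [min(data), max(data)]
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point
        # (the "missing data" is which component generated each point).
        resp = []
        for x in data:
            p0 = math.exp(-0.5 * (x - mu[0]) ** 2)
            p1 = math.exp(-0.5 * (x - mu[1]) ** 2)
            resp.append(p0 / (p0 + p1))
        # M-step: update the means from the weighted (completed) data.
        w0, w1 = sum(resp), sum(1 - r for r in resp)
        mu = [sum(r * x for r, x in zip(resp, data)) / w0,
              sum((1 - r) * x for r, x in zip(resp, data)) / w1]
    return mu

print(em_two_gaussians([0.0, 0.5, 1.0, 5.0, 5.5, 6.0]))  # ~[0.5, 5.5]
```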
Applications:
• Data clustering
• Used in Computer vision and NLP
• Used in medical and healthcare industry
Advantages:
• Easy to implement
• It often yields a closed-form solution for the M-step
Disadvantages: