Unit IV ML
Machine Learning
Introduction:-
Machine learning (ML) is a type of artificial intelligence (AI) focused on building computer systems that
learn from data. The broad range of techniques ML encompasses enables software applications to
improve their performance over time.
While machine learning is a powerful tool for solving problems, improving business operations and
automating tasks, it's also a complex and challenging technology, requiring deep expertise and
significant resources.
Based on the methods and the way of learning, machine learning is mainly divided into four types:
1. Supervised Machine Learning
As its name suggests, supervised machine learning is based on supervision. In the supervised learning technique, we train the machine using a "labelled" dataset, and based on that training, the machine predicts the output. Here, the labelled data specifies that some of the inputs are already mapped to their outputs. More precisely, we first train the machine with the input and corresponding output, and then we ask the machine to predict the output for a test dataset.
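As a small illustrative sketch (assuming Python with scikit-learn, which these notes do not prescribe; the tiny dataset below is invented), a model is trained on labelled examples and then asked to predict outputs for unseen inputs:

# Supervised learning sketch: inputs are mapped to known outputs (labels)
from sklearn.linear_model import LogisticRegression

# Hypothetical labelled dataset: (hours studied, hours slept) -> pass (1) / fail (0)
X_train = [[1, 4], [2, 5], [6, 7], [7, 8], [3, 5], [8, 6]]
y_train = [0, 0, 1, 1, 0, 1]

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)              # train with input and corresponding output
print(model.predict([[5, 7], [1, 3]]))   # predict the output for a test dataset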
Advantages:
Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
It requires a labelled dataset, which can be costly and time-consuming to prepare.
It may predict the wrong output if the test data is very different from the training data.
2. Unsupervised Machine Learning
In unsupervised learning, the models are trained with the data that is neither classified nor labelled, and
the model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.
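As an illustrative sketch (assuming Python with scikit-learn, chosen here only for demonstration; the points are invented), k-means clustering groups an unlabelled dataset by similarity:

# Unsupervised learning sketch: no labels, the algorithm finds the groups itself
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]   # unlabelled inputs only

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)    # cluster assignments discovered from hidden patterns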
Advantages:
These algorithms can be used for more complicated tasks than supervised ones because they work on unlabelled datasets.
Unsupervised algorithms are preferable for many tasks because obtaining an unlabelled dataset is easier than obtaining a labelled one.
Disadvantages:
The output of an unsupervised algorithm can be less accurate because the dataset is not labelled and the algorithm is not trained on the exact output beforehand.
Working with unsupervised learning is more difficult because it uses an unlabelled dataset that is not mapped to any output.
3. Semi-Supervised Learning
Semi-supervised learning is a type of machine learning algorithm that lies between supervised and unsupervised machine learning. It represents the intermediate ground between supervised learning (with labelled training data) and unsupervised learning (with no labelled training data) and uses a combination of labelled and unlabelled datasets during the training period.
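As a minimal sketch (assuming Python with scikit-learn's LabelPropagation, used here only for illustration; the data and the -1 "unknown" convention come from that library), a few labelled points are combined with unlabelled ones, and labels are propagated to the rest:

# Semi-supervised sketch: -1 marks an unlabelled example
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[1.0], [1.2], [0.9], [5.0], [5.2], [4.8]])
y = np.array([0, -1, -1, 1, -1, -1])     # only two points carry labels

model = LabelPropagation().fit(X, y)
print(model.transduction_)               # labels inferred for the unlabelled points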
Disadvantages:
The accuracy of the results may be lower than with fully supervised learning, and the outcome depends heavily on the quality of the small labelled portion of the data.
4. Reinforcement Learning
In reinforcement learning, there is no labelled data as in supervised learning; agents learn only from their experiences.
Disadvantages:
RL algorithms are not preferred for simple problems.
RL algorithms require huge data and computations.
Too much reinforcement can lead to an overload of states, which can weaken the results.
The curse of dimensionality limits reinforcement learning for real physical systems.
Applications of Machine Learning
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, digital images, etc. A popular use case of image recognition and face detection is automatic friend tagging suggestions.
2. Speech Recognition
While using Google, we get the option of "Search by voice"; this comes under speech recognition and is a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text; it is also known as "speech to text" or "computer speech recognition." At present, machine learning algorithms are widely used in speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.
It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, with the help of two ways:
Real-time location of the vehicle from the Google Maps app and sensors
Average time taken on past days at the same time of day
Everyone who uses Google Maps is helping to make the app better. It takes information from users and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendations to users. Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet in the same browser, and this is because of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine learning plays a significant role in self-driving cars. Tesla, a popular car manufacturer, is working on self-driving cars and uses machine learning methods to train its models to detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. We always receive important mail in our inbox with the important symbol and spam emails in our spam box, and the technology behind this is machine learning. Below are some spam filters used by Gmail:
Content Filter
Header filter
General blacklists filter
Rules-based filters
Permission filters
Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve Bayes
classifier are used for email spam filtering and malware detection.
7. Virtual Personal Assistant:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using voice instructions. These assistants can help us in various ways just through voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, fraud can take place in various ways, such as through fake accounts, fake IDs, and stealing money in the middle of a transaction. To detect this, a feed-forward neural network helps by checking whether a transaction is genuine or fraudulent.
9. Stock Market Trading:
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in share prices, so long short-term memory (LSTM) neural networks are used for the prediction of stock market trends.
10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With this, medical technology is growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.
Decision Tree Algorithm
In a decision tree, there are two types of nodes: the decision node and the leaf node. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
The decisions or the test are performed on the basis of features of the given dataset.
It is a graphical representation for getting all the possible solutions to a problem/decision based on
given conditions.
It is called a decision tree because, similar to a tree, it starts with the root node, which expands on
further branches and constructs a tree-like structure.
In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree
algorithm.
A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
Decision trees usually mimic human thinking while making a decision, so they are easy to understand.
The logic behind the decision tree can be easily understood because it shows a tree-like structure.
How does the Decision Tree algorithm Work?
In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (from the real dataset) and, based on the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree.
The complete process can be better understood using the below algorithm:
Step-1: Begin the tree with the root node, says S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in step -3.
Continue this process until a stage is reached where the nodes cannot be classified further, and call the final node a leaf node.
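As a quick sketch (assuming Python with scikit-learn; the Iris dataset and parameter choices are placeholders for illustration), a CART-style decision tree can be built from a labelled dataset and evaluated on unseen records:

# Decision tree sketch: the criterion acts as the attribute selection measure (ASM)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                       # labelled dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)                              # recursively builds the tree from the root node
print(tree.score(X_test, y_test))                       # accuracy on unseen records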
Role of Statistics in Machine Learning
You might have heard technical terms such as supervised, unsupervised, and semi-supervised learning; they all rely on a solid statistical foundation.
Essentially, statistics provides the theoretical framework upon which machine learning algorithms are
built.
Statistics is the science that allows us to collect, analyze, interpret, present, and organize data. It
provides a robust set of tools for understanding patterns and trends, and making inferences and
predictions based on data. When we're dealing with large datasets, statistics helps us understand and
summarize the data, allowing us to make sense of complex phenomena.
Machine learning, on the other hand, is a powerful tool that allows computers to learn from and make
decisions or predictions based on data. The ultimate goal of machine learning is to create models that
can adapt and improve over time, as well as generalize from specific examples to broader cases.
Naïve Bayes Classifier
The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
Some popular examples of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the others.
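For illustration (assuming Python with scikit-learn; the tiny texts and labels below are made up), a multinomial Naïve Bayes model can be trained for a spam-filtering-style task:

# Naive Bayes sketch: each word is treated as an independent feature (the "naive" assumption)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "meeting at noon tomorrow",
         "free offer click now", "project report attached"]
labels = [1, 0, 1, 0]                         # 1 = spam, 0 = not spam (toy labels)

vec = CountVectorizer()
X = vec.fit_transform(texts)                  # word-count features
model = MultinomialNB().fit(X, labels)
print(model.predict(vec.transform(["free prize offer"])))   # expected to predict 1 (spam)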
Bayes' Theorem:
Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the probability of
a hypothesis with prior knowledge. It depends on the conditional probability.
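In symbols, Bayes' theorem states:

P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

where P(A|B) is the posterior probability of hypothesis A given evidence B, P(B|A) is the likelihood of the evidence given the hypothesis, P(A) is the prior probability of the hypothesis, and P(B) is the probability of the evidence.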
What is an EM algorithm?
The Expectation-Maximization (EM) algorithm is used in combination with various unsupervised machine learning algorithms to determine the local maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates for unobservable variables in statistical models.
Further, it is a technique for finding maximum likelihood estimates when latent variables are present, and it is commonly used with latent variable models.
A latent variable model consists of both observable and unobservable variables, where the observable variables can be measured directly while the unobserved ones are inferred from the observed variables. These unobservable variables are known as latent variables.
EM Algorithm
The EM algorithm is used alongside various unsupervised ML algorithms, such as the k-means clustering algorithm. Being an iterative approach, it consists of two modes. In the first mode, we estimate the missing or latent variables; hence it is referred to as the expectation/estimation step (E-step). The other mode optimizes the parameters of the model so that it can explain the data more clearly; it is known as the maximization step (M-step).
Expectation step (E - step): It involves the estimation (guess) of all missing values in the dataset so that
after completing this step, there should not be any missing value.
Maximization step (M-step): This step uses the data estimated in the E-step to update the model parameters.
Repeat E-step and M-step until the convergence of the values occurs.
The primary goal of the EM algorithm is to use the available observed data of the dataset to estimate
the missing data of the latent variables and then use that data to update the values of the parameters in
the M-step.
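The sketch below (assuming Python with NumPy; the data and initial guesses are invented for illustration) alternates the E-step and M-step to fit a mixture of two 1-D Gaussians, where the component membership of each point is the latent variable:

# EM sketch for a two-component 1-D Gaussian mixture (illustrative, not production code)
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

mu = np.array([-1.0, 1.0])       # initial guesses for the component means
sigma = np.array([1.0, 1.0])     # initial standard deviations
pi = np.array([0.5, 0.5])        # initial mixing weights

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: estimate the responsibility of each component for each point
    resp = np.stack([pi[k] * gauss(data, mu[k], sigma[k]) for k in range(2)], axis=1)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: update the parameters to maximize the expected likelihood
    nk = resp.sum(axis=0)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(data)

print(mu, sigma, pi)             # the means should approach roughly -2 and 3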
Applications of EM algorithm
The primary aim of the EM algorithm is to estimate the missing data of the latent variables through the observed data in datasets. The EM algorithm, and latent variable models more generally, have a broad range of real-life applications in machine learning, such as data clustering (for example, Gaussian mixture models), estimating the parameters of hidden Markov models, and filling in missing values in a dataset.
Advantages of EM algorithm
The two basic steps of the EM algorithm, the E-step and the M-step, are often easy to implement for many machine learning problems.
The likelihood is guaranteed not to decrease after each iteration.
It often yields a closed-form solution for the M-step.
Disadvantages of EM algorithm
The convergence of the EM algorithm is very slow.
It may converge only to a local optimum.
It takes both forward and backward probabilities into account, unlike numerical optimization, which considers only forward probabilities.
How Reinforcement Learning Works
We model an environment after the problem statement. The model interacts with this environment and comes up with solutions on its own, without human interference. To push it in the right direction, we simply give it a positive reward if it performs an action that brings it closer to its goal, or a negative reward if it moves away from its goal.
To understand reinforcement learning better, consider a dog that we have to house-train. Here, the dog is the agent and the house is the environment.
Supervised vs Unsupervised vs Reinforcement Learning
The table below summarizes the differences between the three main sub-branches of machine learning.
Criteria | Supervised Learning | Unsupervised Learning | Reinforcement Learning
Training data | Labelled dataset | Unlabelled dataset | No predefined dataset; the agent interacts with an environment
Goal | Predict the output for new inputs | Find hidden patterns, groups, or structure | Learn actions that maximize the reward
Supervision | Exact outputs are known during training | No supervision | Feedback through rewards and penalties only
Important Terms in Reinforcement Learning
Environment: The training situation that the model must optimize is called its environment.
Reward: To help the model move in the right direction, it is given rewards (points) that appraise some of its actions.
Policy: The policy determines how an agent behaves at any given time. It acts as a mapping between the present state and an action.
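To make these terms concrete, here is a small illustrative sketch (plain Python; the chain environment, reward scheme, and hyperparameters are invented for this example) of Q-learning, where the agent learns a policy by receiving a positive reward only when it reaches the goal state:

# Q-learning sketch: a 1-D chain of 5 states; the agent starts at state 0 and the goal is state 4
import random

N_STATES, GOAL = 5, 4
ACTIONS = [0, 1]                  # 0 = move left, 1 = move right
alpha, gamma, epsilon = 0.1, 0.9, 0.2

# Q-table: expected future reward for each (state, action) pair
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Environment: returns (next_state, reward)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

for episode in range(500):
    state = 0
    while state != GOAL:
        # epsilon-greedy policy: mostly exploit the Q-table, sometimes explore
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward = step(state, action)
        # Q-learning update rule
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print(Q)   # Q-values for "move right" grow as the agent learns the policy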