
Unit IV ML

Machine learning (ML) is a subset of artificial intelligence that enables systems to learn from data and improve over time, categorized into supervised, unsupervised, semi-supervised, and reinforcement learning. Each type has its advantages and disadvantages, with applications ranging from image and speech recognition to self-driving cars and fraud detection. The document also discusses specific algorithms like Decision Trees and Naïve Bayes, highlighting their functionalities and use cases in various fields.

Uploaded by

Akshay Teotia

Unit IV

Machine Learning
Introduction:-
Machine learning (ML) is a type of artificial intelligence (AI) focused on building computer systems that
learn from data. The broad range of techniques ML encompasses enables software applications to
improve their performance over time.

While machine learning is a powerful tool for solving problems, improving business operations and
automating tasks, it's also a complex and challenging technology, requiring deep expertise and
significant resources.

Types and application areas

ML algorithms help to solve different business problems such as regression, classification,
forecasting, clustering, and association.

Based on the method of learning, machine learning is divided into four main types:

• Supervised Machine Learning
• Unsupervised Machine Learning
• Semi-Supervised Machine Learning
• Reinforcement Learning

1. Supervised Machine Learning

As its name suggests, supervised machine learning is based on supervision. In the supervised
learning technique, we train the machine using a "labelled" dataset, and based on that training, the
machine predicts the output. Here, labelled data means that the training inputs are already
mapped to the correct output. More precisely: first, we train the machine with inputs and their
corresponding outputs, and then we ask the machine to predict the output for a test dataset.
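
The train-then-predict workflow described above can be sketched with a 1-nearest-neighbour classifier. This is only an illustration: the feature values, the labels "small" and "large", and the function name `predict` are assumptions made up for the example, not part of the original notes.

```python
# Labelled training data: each input (height_cm, weight_kg) is mapped to a label.
# All values here are invented for illustration.
train = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]


def predict(point, training_data):
    """Return the label of the closest training example (1-nearest neighbour)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, nearest_label = min(
        training_data, key=lambda pair: squared_distance(pair[0], point)
    )
    return nearest_label


# Test data: the true labels are used only to measure accuracy, never to predict.
test = [((152.0, 52.0), "small"), ((182.0, 88.0), "large")]
predictions = [predict(features, train) for features, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

The "training" here is simply storing the labelled examples; most supervised algorithms fit parameters instead, but the split into labelled training data and a held-out test set is the same.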

Advantages and Disadvantages of Supervised Learning

Advantages:

• Since supervised learning works with a labelled dataset, we can have an exact idea about the
classes of objects.
• These algorithms are helpful in predicting the output on the basis of prior experience.

Disadvantages:

• These algorithms may not be able to solve very complex tasks.
• They may predict the wrong output if the test data differs from the training data.
• Training the algorithm requires a lot of computational time.

2. Unsupervised Machine Learning

Unsupervised learning differs from the supervised learning technique; as its name suggests, there is
no need for supervision. In unsupervised machine learning, the machine is trained using an
unlabelled dataset and produces its output without any supervision.

In unsupervised learning, the models are trained with data that is neither classified nor labelled, and
the model acts on that data without any supervision.

The main aim of an unsupervised learning algorithm is to group or categorize the unsorted dataset
according to similarities, patterns, and differences. The machine is instructed to find the hidden
patterns in the input dataset.
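
Grouping unlabelled data by similarity can be illustrated with a minimal k-means sketch on one-dimensional values. The data, the starting centroids, and the function name `k_means_1d` are assumptions chosen for the example; k-means is used here only as one common clustering method.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group 1-D values around the given starting centroids.

    Returns the final centroids and the clusters from the last assignment step.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put every value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]  # no labels are given
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
print(centroids, clusters)
```

No labels are supplied at any point; the two groups emerge only from the distances between the values.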

Advantages and Disadvantages of Unsupervised Learning Algorithm

Advantages:

• These algorithms can be used for more complicated tasks than the supervised ones because
they work on unlabelled datasets.
• Unsupervised algorithms are preferable for various tasks because an unlabelled dataset is
easier to obtain than a labelled one.

Disadvantages:

• The output of an unsupervised algorithm can be less accurate because the dataset is not labelled and
the algorithm is not trained with the exact output beforehand.
• Working with unsupervised learning is more difficult because the unlabelled dataset
is not mapped to any output.

3. Semi-Supervised Learning
Semi-supervised learning is a type of machine learning that lies between supervised and
unsupervised machine learning. It represents the intermediate ground between supervised learning (with
labelled training data) and unsupervised learning (with no labelled training data) and uses
a combination of labelled and unlabelled data during training.
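
One simple way to combine labelled and unlabelled data is self-training: a model fitted on the few labelled points assigns pseudo-labels to the unlabelled points, which are then added to the training set. The sketch below assumes a nearest-centroid classifier and made-up one-dimensional data; the names `class_centroids` and `classify` are hypothetical.

```python
labelled = [(1.0, "low"), (10.0, "high")]   # a few labelled points
unlabelled = [2.0, 3.0, 8.0, 9.0, 4.0]      # many points without labels


def class_centroids(data):
    """Mean value of each class in a list of (value, label) pairs."""
    sums = {}
    for x, label in data:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def classify(x, centroids):
    """Label of the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Self-training: handle the points closest to the labelled data first,
# pseudo-label each one with the current model, then refit on the enlarged set.
training = list(labelled)
for x in sorted(unlabelled, key=lambda x: min(abs(x - lx) for lx, _ in labelled)):
    training.append((x, classify(x, class_centroids(training))))

pseudo_labels = dict(training[len(labelled):])
final_centroids = class_centroids(training)
print(pseudo_labels, final_centroids)
```

Processing the easiest points first is a design choice for this sketch; pseudo-labels can be wrong, which is one reason results of such iterative schemes may be unstable.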

Advantages and disadvantages of Semi-supervised Learning

Advantages:

• The algorithm is simple and easy to understand.
• It is highly efficient.
• It addresses drawbacks of supervised and unsupervised learning algorithms.

Disadvantages:

• Iteration results may not be stable.
• These algorithms cannot be applied to network-level data.
• Accuracy can be low.

4. Reinforcement Learning

Reinforcement learning works on a feedback-based process in which an AI agent (a software
component) automatically explores its surroundings by trial and error: taking actions, learning from
experience, and improving its performance. The agent is rewarded for each good action and
punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the
rewards.

In reinforcement learning there is no labelled data as in supervised learning; agents learn from their
experiences only.

Advantages and Disadvantages of Reinforcement Learning

Advantages:

• It helps in solving complex real-world problems that are difficult to solve with general
techniques.
• The learning model of RL is similar to the way human beings learn; hence accurate results
can be obtained.
• It helps in achieving long-term results.

Disadvantages:

• RL algorithms are not preferred for simple problems.
• RL algorithms require huge amounts of data and computation.
• Too much reinforcement learning can lead to an overload of states, which can weaken the
results.
• The curse of dimensionality limits reinforcement learning for real physical systems.

Applications of Machine learning


1. Image Recognition:

Image recognition is one of the most common applications of machine learning. It is used to identify
objects, persons, places, etc. in digital images. A popular use case of image recognition and face
detection is automatic friend-tagging suggestion.

2. Speech Recognition

While using Google, we get the option "Search by voice". It comes under speech recognition, which is a
popular application of machine learning.

Speech recognition is the process of converting voice instructions into text; it is also known as "speech
to text" or "computer speech recognition". At present, machine learning algorithms are widely used in
speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech
recognition technology to follow voice instructions.

3. Traffic prediction:

If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the
shortest route and predicts the traffic conditions.

It predicts whether traffic is clear, slow-moving, or heavily congested
in two ways:

• Real-time location of the vehicle from the Google Maps app and sensors
• Average time taken on past days at the same time of day

Everyone who uses Google Maps is helping to make the app better. It takes information
from the user and sends it back to its database to improve performance.

4. Product recommendations:

Machine learning is widely used by e-commerce and entertainment companies such
as Amazon and Netflix for product recommendation. Whenever we search for a
product on Amazon, we start getting advertisements for the same product while browsing the
internet in the same browser; this is because of machine learning.

5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars, where it plays a
significant role. Tesla, a well-known car manufacturer, is working on
self-driving cars. It reportedly uses an unsupervised learning method to train its models to detect people and
objects while driving.

6. Email Spam and Malware Filtering:

Whenever we receive a new email, it is filtered automatically as important, normal, or spam.
Important mail arrives in our inbox marked with the important symbol, and spam goes to the spam
box; the technology behind this is machine learning. Below are some spam filters used by Gmail:

• Content filter
• Header filter
• General blacklists filter
• Rules-based filters
• Permission filters

Machine learning algorithms such as the Multi-Layer Perceptron, Decision Tree, and Naïve Bayes
classifier are used for email spam filtering and malware detection.

7. Virtual Personal Assistant:

We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name
suggests, they help us find information using voice instructions. These assistants can help us
in various ways through voice instructions alone, such as playing music, calling someone, opening an email,
or scheduling an appointment.

Machine learning algorithms are an important part of these virtual assistants.

8. Online Fraud Detection:

Machine learning makes our online transactions safe and secure by detecting fraudulent transactions.
Whenever we perform an online transaction, fraud can take place in various ways,
such as fake accounts, fake IDs, and money stolen in the middle of a
transaction. To detect this, a feed-forward neural network can check whether a transaction is
genuine or fraudulent.

9. Stock Market trading:

Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups
and downs in share prices, so a long short-term memory (LSTM) neural network is used for
the prediction of stock market trends.

10. Medical Diagnosis:

In medical science, machine learning is used for disease diagnosis. With this, medical technology is
growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.

It helps in finding brain tumors and other brain-related diseases easily.

Decision Tree Classification Algorithm


Decision Tree is a supervised learning technique that can be used for both classification and regression
problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier
in which internal nodes represent the features of a dataset, branches represent the decision
rules, and each leaf node represents the outcome.

In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are
used to make a decision and have multiple branches, whereas leaf nodes are the output of those
decisions and do not contain any further branches.

The decisions or tests are performed on the basis of features of the given dataset.

It is a graphical representation for getting all the possible solutions to a problem/decision based on
given conditions.

It is called a decision tree because, similar to a tree, it starts with the root node, which expands into
further branches and constructs a tree-like structure.

In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree
algorithm.

A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into
subtrees.

Why use Decision Trees?

There are various algorithms in machine learning, so choosing the best algorithm for the given dataset
and problem is the main point to remember while creating a machine learning model. Below are two
reasons for using the decision tree:

• Decision trees usually mimic human thinking while making a decision, so they are easy to understand.
• The logic behind the decision tree can be easily understood because it shows a tree-like structure.

How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record, the algorithm starts from the root node
of the tree. It compares the value of the root attribute with the record's (real dataset) attribute
and, based on the comparison, follows the branch and jumps to the next node.

At the next node, the algorithm again compares the attribute value with the sub-nodes and moves
further. It continues the process until it reaches a leaf node of the tree.

The complete process can be better understood using the algorithm below:

Step-1: Begin the tree with the root node, say S, which contains the complete dataset.

Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).

Step-3: Divide S into subsets that contain the possible values of the best attribute.

Step-4: Generate the decision tree node, which contains the best attribute.

Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where the nodes cannot be classified further; the
final node is called a leaf node.
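
The five steps above can be sketched as follows. This sketch assumes Gini impurity as the attribute selection measure (the measure commonly associated with CART) and, as a simplification, splits on every value of an attribute rather than making CART's binary splits. The dataset and the attribute names `salary`, `distance`, and `accept` are invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, attribute, target):
    """Average impurity of the subsets created by splitting on `attribute`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


def best_attribute(rows, attributes, target):
    """Step-2: the attribute whose split gives the lowest impurity."""
    return min(attributes, key=lambda a: weighted_gini(rows, a, target))


def build_tree(rows, attributes, target):
    """Steps 1 to 5: returns a nested dict, or a class label for a leaf node."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf node
    best = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]  # Step-3
        tree[best][value] = build_tree(subset, remaining, target)  # Step-5
    return tree


rows = [
    {"salary": "high", "distance": "near", "accept": "yes"},
    {"salary": "high", "distance": "far", "accept": "yes"},
    {"salary": "low", "distance": "near", "accept": "no"},
    {"salary": "low", "distance": "far", "accept": "no"},
]
tree = build_tree(rows, ["salary", "distance"], "accept")
print(tree)
```

In this toy dataset `salary` separates the classes perfectly, so it becomes the root and both branches end in leaf nodes straight away.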

Advantages of the Decision Tree

• It is simple to understand because it follows the same process a human follows while making a
decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes for a problem.
• It requires less data cleaning than other algorithms.

Disadvantages of the Decision Tree

• A decision tree can contain many layers, which makes it complex.
• It may overfit, which can be mitigated using the Random Forest algorithm.
• With more class labels, the computational complexity of the decision tree may increase.

What is Statistical Machine Learning?


As its name suggests, statistical machine learning involves using statistical techniques
to develop models that can learn from data and make predictions or decisions.

You might have heard technical terms such as supervised, unsupervised, and semi-supervised learning;
they all rely on a solid statistical foundation.

The Role of Statistics in Machine Learning

Statistics constitutes the backbone of machine learning, providing the tools and techniques to analyze
and interpret data.

Essentially, statistics provides the theoretical framework upon which machine learning algorithms are
built.

Statistics is the science that allows us to collect, analyze, interpret, present, and organize data. It
provides a robust set of tools for understanding patterns and trends, and for making inferences and
predictions based on data. When we are dealing with large datasets, statistics helps us understand and
summarize the data, allowing us to make sense of complex phenomena.

Machine learning, on the other hand, is a powerful tool that allows computers to learn from and make
decisions or predictions based on data. The ultimate goal of machine learning is to create models that
can adapt and improve over time, as well as generalize from specific examples to broader cases.

Naïve Bayes Classifier Algorithm


The Naïve Bayes algorithm is a supervised learning algorithm that is based on Bayes' theorem and used for
solving classification problems.

It is mainly used in text classification, which involves high-dimensional training datasets.

The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in
building fast machine learning models that can make quick predictions.

It is a probabilistic classifier, which means it predicts on the basis of the probability of an object belonging to a class.

Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and
classifying articles.

Why is it called Naïve Bayes?


The name Naïve Bayes is made up of two words, Naïve and Bayes, which can be described as follows:

Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of
the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste,
then a red, spherical, and sweet fruit is recognized as an apple. Each feature individually
contributes to identifying it as an apple, without depending on the others.

Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.

Bayes' Theorem:
Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine the probability of
a hypothesis given prior knowledge. It depends on conditional probability.
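
Bayes' theorem can be written as P(A|B) = P(B|A) × P(A) / P(B), where P(A) is the prior probability of the hypothesis, P(B|A) the likelihood of the evidence, and P(A|B) the posterior. The sketch below applies it to the fruit example with the naïve independence assumption. The training samples, the use of add-one (Laplace) smoothing, and the function names `train` and `posterior` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(samples):
    """Count what is needed for prediction from (features, label) pairs."""
    label_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (position, label) -> counts of values
    distinct = defaultdict(set)          # position -> values seen at that position
    for features, label in samples:
        for position, value in enumerate(features):
            value_counts[(position, label)][value] += 1
            distinct[position].add(value)
    return label_counts, value_counts, distinct


def posterior(features, label_counts, value_counts, distinct):
    """P(label | features), treating the features as independent given the label."""
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = count / total  # prior P(label)
        for position, value in enumerate(features):
            # Likelihood P(value | label) with add-one smoothing, so an unseen
            # combination does not force the whole product to zero.
            seen = value_counts[(position, label)][value]
            score *= (seen + 1) / (count + len(distinct[position]))
        scores[label] = score
    evidence = sum(scores.values())  # P(features), the denominator in Bayes' theorem
    return {label: score / evidence for label, score in scores.items()}


# Features are (color, shape, taste); the samples are invented.
samples = [
    (("red", "round", "sweet"), "apple"),
    (("red", "round", "sweet"), "apple"),
    (("green", "round", "sweet"), "apple"),
    (("yellow", "long", "sweet"), "banana"),
    (("yellow", "long", "sweet"), "banana"),
    (("green", "long", "sweet"), "banana"),
]
model = train(samples)
result = posterior(("red", "round", "sweet"), *model)
print(result)
```

Each feature contributes its own factor to the product, which is exactly the "naïve" assumption: color, shape, and taste are never considered jointly.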

Advantages of Naïve Bayes Classifier:

• Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
• It can be used for binary as well as multi-class classification.
• It performs well in multi-class predictions compared to other algorithms.
• It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:

• Naïve Bayes assumes that all features are independent or unrelated, so it cannot learn the relationships
between features.

Applications of Naïve Bayes Classifier:

• It is used for credit scoring.
• It is used in medical data classification.
• It can be used in real-time predictions because the Naïve Bayes classifier is an eager learner.
• It is used in text classification such as spam filtering and sentiment analysis.

EM Algorithm in Machine Learning


The EM (Expectation-Maximization) algorithm is used with latent variable models to find local maximum
likelihood parameters of a statistical model. It was proposed by Arthur Dempster, Nan Laird, and Donald
Rubin in 1977. It is one of the most commonly used techniques in machine learning for
obtaining maximum likelihood estimates when some variables are observed and others are not.
The unobserved variables are called latent variables. It has various real-world
applications in statistics, including obtaining the mode of the posterior marginal distribution of
parameters in machine learning and data mining applications.

What is an EM algorithm?

The Expectation-Maximization (EM) algorithm is an iterative method, used mostly in unsupervised
machine learning, for determining local maximum likelihood estimates
(MLE) or maximum a posteriori estimates (MAP) of the parameters of statistical models that contain
unobservable variables. In other words, it is a technique for maximum likelihood estimation when latent
variables are present, which is why it is closely associated with latent variable models.

A latent variable model consists of both observable and unobservable variables: the observable variables can
be measured directly, while the unobserved ones are inferred from the observed variables. These unobservable
variables are known as latent variables.

EM Algorithm

The EM algorithm is closely related to unsupervised ML algorithms such as the k-means
clustering algorithm. Being an iterative approach, it consists of two modes. In the first mode, we
estimate the missing or latent variables; hence it is referred to as the Expectation/estimation step
(E-step). The other mode is used to optimize the parameters of the model so that it can explain
the data better. This second mode is known as the maximization step or M-step.

Expectation step (E-step): It involves the estimation (guess) of all missing values in the dataset so that
after completing this step, there is no missing value left.

Maximization step (M-step): This step uses the data estimated in the E-step to update
the parameters.

Repeat the E-step and M-step until the values converge.

The primary goal of the EM algorithm is to use the available observed data of the dataset to estimate
the missing data of the latent variables and then use that data to update the values of the parameters in
the M-step.
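
The alternation of E-step and M-step can be sketched for a mixture of two one-dimensional Gaussians, where the latent variable is which component generated each point. The data, the starting parameters, the fixed iteration count (used instead of a convergence check), and the function names are assumptions chosen for this illustration.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


def em_two_gaussians(data, means, variances, weights, iterations=50):
    """Fit a Gaussian mixture by EM; the parameter lists are updated in place."""
    for _ in range(iterations):
        # E-step: estimate the latent variable, i.e. how responsible each
        # component is for each data point under the current parameters.
        responsibilities = []
        for x in data:
            joint = [
                w * gaussian_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(joint)
            responsibilities.append([j / total for j in joint])
        # M-step: update the parameters using the estimated responsibilities.
        for k in range(len(means)):
            r_k = [r[k] for r in responsibilities]
            n_k = sum(r_k)
            means[k] = sum(r * x for r, x in zip(r_k, data)) / n_k
            spread = sum(r * (x - means[k]) ** 2 for r, x in zip(r_k, data)) / n_k
            variances[k] = max(spread, 1e-6)  # floor avoids a zero variance
            weights[k] = n_k / len(data)
    return means, variances, weights


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_two_gaussians(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(means, variances, weights)
```

With these well-separated points the means settle near the two group averages; with a poor starting guess EM can instead stop at a different local optimum.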

Applications of EM algorithm

The primary aim of the EM algorithm is to estimate the missing data in the latent variables through
the observed data in a dataset. The EM algorithm has a broad range of real-life
applications in machine learning.

These are as follows:

• The EM algorithm is applicable to data clustering in machine learning.
• It is often used in computer vision and NLP (natural language processing).
• It is used to estimate the values of the parameters in mixed models such as the Gaussian Mixture
Model and in quantitative genetics.
• It is also used in psychometrics for estimating item parameters and latent abilities of item
response theory models.
• It is also applicable in the medical and healthcare industry, such as in image reconstruction, and in
structural engineering.
• It is used to determine the Gaussian density of a function.

Advantages of EM algorithm

• The two basic steps of the EM algorithm, the E-step and the M-step, are very easy to implement
for many machine learning problems.
• The likelihood is guaranteed not to decrease after each iteration.
• The M-step often has a closed-form solution.

Disadvantages of EM algorithm

• The convergence of the EM algorithm can be very slow.
• It converges to a local optimum only.
• It takes both forward and backward probabilities into consideration, in contrast to
numerical optimization, which takes only forward probabilities.

What is Reinforcement Learning?


Reinforcement learning is a sub-branch of machine learning that trains a model to return an optimum
solution for a problem by taking a sequence of decisions by itself.

We model an environment after the problem statement. The model interacts with this environment and
comes up with solutions on its own, without human interference. To push it in the right direction, we
simply give it a positive reward if it performs an action that brings it closer to its goal, or a negative
reward if it moves away from its goal.

To understand reinforcement learning better, consider a dog that we have to house train. Here, the dog
is the agent and the house is the environment.

Supervised vs Unsupervised vs Reinforcement Learning

The below table shows the differences between the three main sub-branches of machine learning.

Table 1: Differences between Supervised, Unsupervised, and Reinforcement Learning

Important Terms in Reinforcement Learning


Agent: The model that is being trained via reinforcement learning.

Environment: The training situation that the model must optimize for.

Action: Any of the possible steps that can be taken by the model.

State: The current position/condition of the agent, as returned by the environment.

Reward: Points given to the model to appraise an action and help it move in the right
direction.

Policy: Determines how an agent will behave at any time. It acts as a mapping from the
present state to an action.
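
These terms can be tied together with a small tabular Q-learning sketch. Q-learning is one common reinforcement learning method and is used here only as an example. The five-state corridor environment, the reward of 1.0 at the goal, the hyperparameter values, and the names `step` and `train` are assumptions made up for illustration.

```python
import random

N_STATES = 5        # states 0..4; reaching state 4 ends the episode
ACTIONS = [-1, +1]  # action 0 moves left, action 1 moves right


def step(state, action):
    """Environment: returns (next_state, reward, done) for the chosen action."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learns a table q[state][action] of expected future reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: random action
            else:
                best = max(q[state])             # exploit; ties broken randomly
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate towards the reward plus the
            # discounted value of the best action in the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Policy: for each non-terminal state, pick the action with the highest value.
policy = [
    max(range(len(ACTIONS)), key=lambda i: q_table[s][i])
    for s in range(N_STATES - 1)
]
print(policy)
```

Here the learned policy is simply "always move right", because only the rightmost state gives a reward; the discount factor `gamma` makes states closer to the goal more valuable.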
