
Code: B20AD3201 | Category: PC | L: 3 | T: -- | P: -- | C: 3 | I.M: 30 | E.M: 70 | Exam: 3 Hrs.

MACHINE LEARNING
(For AI&DS)
Course Objectives:
1. Identify problems that are amenable to solution by ANN methods, and which ML methods may be suited to solving a given problem.
2. Formalize a given problem in the language/framework of different ANN methods (e.g., as a search problem, as a constraint satisfaction problem, as a planning problem, as a Markov decision process, etc.).

Course Outcomes: At the end of this course, the students will be able to
1. Explain the fundamental usage of the concept of a Machine Learning system. (Knowledge Level: K2)
2. Demonstrate various regression and classification techniques. (K3)
3. Analyze the Ensemble Learning Methods. (K4)
4. Illustrate the Clustering Techniques and Dimensionality Reduction Models in Machine Learning. (K3)
5. Discuss the Neural Network Models and fundamental concepts of Deep Learning. (K2)

SYLLABUS
UNIT-I (12 Hrs)
Introduction: Artificial Intelligence, Machine Learning, Deep Learning, Types of Machine Learning Systems, Main Challenges of Machine Learning.
Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss, Tradeoffs in Statistical Learning, Estimating Risk Statistics, Sampling Distribution of an Estimator, Empirical Risk Minimization.

UNIT-II (10 Hrs)
Supervised Learning (Regression/Classification): Basic Methods: Distance-based Methods, Nearest Neighbours, Decision Trees, Naive Bayes; Linear Models: Linear Regression, Logistic Regression, Generalized Linear Models, Support Vector Machines; Binary Classification: Multiclass/Structured Outputs, MNIST, Ranking.

UNIT-III (10 Hrs)
Ensemble Learning and Random Forests: Introduction, Voting Classifiers, Bagging and Pasting, Random Forests, Boosting, Stacking.
Support Vector Machine: Linear SVM Classification, Nonlinear SVM Classification, SVM Regression, Naïve Bayes Classifiers.

UNIT-IV (8 Hrs)
Unsupervised Learning Techniques: Clustering, K-Means, Limits of K-Means, Using Clustering for Image Segmentation, Using Clustering for Preprocessing, Using Clustering for Semi-Supervised Learning, DBSCAN, Gaussian Mixtures.
Dimensionality Reduction: The Curse of Dimensionality, Main Approaches for Dimensionality Reduction, PCA, Using Scikit-Learn, Randomized PCA, Kernel PCA.

UNIT-V (10 Hrs)
Neural Networks and Deep Learning: Introduction to Artificial Neural Networks with Keras, Implementing MLPs with Keras, Installing TensorFlow 2, Loading and Preprocessing Data with TensorFlow.

Text Books:
1. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, Aurélien Géron, O'Reilly Media, 2019.
2. Data Science and Machine Learning: Mathematical and Statistical Methods, Dirk P. Kroese, Zdravko I. Botev, Thomas Taimre, Radislav Vaisman, 25th November 2020.
Reference Books:
1. Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.

UNIT-I
Artificial Intelligence
Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are
programmed to think and act like humans. It involves the development of algorithms and computer
programs that can perform tasks that typically require human intelligence such as visual perception,
speech recognition, decision-making, and language translation. AI has the potential to revolutionize
many industries and has a wide range of applications, from virtual personal assistants to self-driving
cars. Before turning to the meaning of artificial intelligence, let us first understand the meaning of intelligence. Intelligence is the ability to learn and solve problems; this definition is taken from Webster's Dictionary.
Uses of Artificial Intelligence :
• Healthcare: AI is used for medical diagnosis, drug discovery, and predictive analysis of diseases.
• Finance: AI helps in credit scoring, fraud detection, and financial forecasting.
• Retail: AI is used for product recommendations, price optimization, and supply chain management.
• Manufacturing: AI helps in quality control, predictive maintenance, and production optimization.
• Transportation: AI is used for autonomous vehicles, traffic prediction, and route optimization.
• Customer service: AI-powered chatbots are used for customer support, answering frequently asked
questions, and handling simple requests.
• Security: AI is used for facial recognition, intrusion detection, and cybersecurity threat analysis.
• Marketing: AI is used for targeted advertising, customer segmentation, and sentiment analysis.
• Education: AI is used for personalized learning, adaptive testing, and intelligent tutoring systems.

Machine Learning

✓ Machine Learning refers to algorithms that have the ability to learn from past experience.
✓ Machine learning combines data with statistical tools to predict an output. This output is then used by organizations to derive actionable insights.
✓ Machine learning is closely related to data mining and Bayesian predictive modeling. The machine receives data as input and uses an algorithm to formulate answers.
✓ A typical machine learning task is to provide a recommendation. For those who have a Netflix account, all recommendations of movies or series are based on the user's historical data.
✓ Machine learning is also used for a variety of tasks such as fraud detection, predictive maintenance, portfolio optimization, and so on.
✓ Machine learning is only one piece of functionality in a larger application; it can be combined with different programs.

Applications of machine learning


• Image Recognition
• Speech Recognition
• Traffic prediction
• Product recommendations
• Email Spam
• Online Fraud Detection
• Stock Market trading
• Medical Diagnosis

Types of Machine Learning Systems

Supervised learning
Supervised learning is a type of machine learning in which machines are trained using well-labelled training data and then predict the output. Labelled data means input data that is already tagged with the correct output.

Types of Supervised learning

Classification
✓ Classification is a supervised learning technique.
✓ Classification predicts a categorical output variable.
✓ It helps you divide your data into different classes; the algorithm which implements the classification on a dataset is known as a classifier.
✓ There are two types of classifications:
1) Binary classification: the classification problem has only two possible classes (e.g., True/False, Yes/No, 0/1).
2) Multi-class classification: the classification problem has more than two classes (e.g., classifying items into categories such as movies and music).

Types of Classification Algorithms
✓ Knn
✓ Naïve bayes
✓ Decision tree
✓ Logistic regression
✓ Support vector machine
Regression
✓ A regression algorithm is used when there is a relationship between a dependent (output) variable and an independent (input) variable.
✓ Regression is used for the prediction of continuous variables, such as in weather forecasting, market trend analysis, etc.

Types of Regression Algorithms

✓ Linear regression
✓ Logistic Regression
✓ Polynomial Regression
Unsupervised Learning

Unsupervised learning is a type of algorithm that learns patterns from untagged data. It mainly deals with unlabelled data. Unsupervised learning algorithms allow users to perform more complex processing tasks compared to supervised learning.

Clustering

Clustering is an unsupervised learning technique: there is no label for any instance of the data. Clustering is alternatively called grouping. Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.

Types of clustering algorithms

✓ Exclusive (partitioning) clustering
✓ Overlapping clustering
✓ Hierarchical clustering
Reinforcement Learning

Reinforcement learning is an important type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the results. In its early stages, reinforcement learning essentially learns from mistakes. It sits between supervised and unsupervised learning.

Deep learning
Deep learning is a branch of machine learning which is based on artificial neural networks. It is capable
of learning complex patterns and relationships within data. In deep learning, we don’t need to explicitly
program everything. It has become increasingly popular in recent years due to the advances in
processing power and the availability of large datasets. Deep learning is based on artificial neural networks
(ANNs), also known as deep neural networks (DNNs). These neural networks are inspired by the
structure and function of the human brain’s biological neurons, and they are designed to learn from
large amounts of data.
Deep Learning is a subfield of Machine Learning that involves the use of neural networks to model and
solve complex problems. Neural networks are modeled after the structure and function of the human
brain and consist of layers of interconnected nodes that process and transform data.
The key characteristic of Deep Learning is the use of deep neural networks, which have multiple layers
of interconnected nodes. These networks can learn complex representations of data by discovering
hierarchical patterns and features in the data. Deep Learning algorithms can automatically learn and
improve from data without the need for manual feature engineering.
Deep Learning has achieved significant success in various fields, including image recognition, natural
language processing, speech recognition, and recommendation systems. Some of the popular Deep
Learning architectures include Convolutional Neural Networks (CNNs), Recurrent Neural Networks
(RNNs), and Deep Belief Networks (DBNs).
Training deep neural networks typically requires a large amount of data and computational resources.
However, the availability of cloud computing and the development of specialized hardware, such as
Graphics Processing Units (GPUs), has made it easier to train deep neural networks.

Main Challenges of Machine Learning.
In Machine Learning, there occurs a process of analyzing data for building or training models. Machine learning is just about everywhere; from Amazon product recommendations to self-driving cars, it holds great value throughout. As per the latest research, the global machine learning market is expected to grow by 43% by 2024. This revolution has enhanced the demand for machine learning professionals to a great extent. AI and machine learning jobs have observed a significant growth rate of 75% in the past four years, and the industry is growing continuously. A career in the machine learning domain offers job satisfaction, excellent growth, and a very high salary, but it is a complex and challenging process.

1. Poor Quality of Data

Data plays a significant role in the machine learning process. One of the significant issues that
machine learning professionals face is the absence of good quality data. Unclean and noisy data can
make the whole process extremely exhausting. We don’t want our algorithm to make inaccurate or
faulty predictions. Hence the quality of data is essential to enhance the output. Therefore, we need to
ensure that the process of data preprocessing which includes removing outliers, filtering missing
values, and removing unwanted features, is done with the utmost level of perfection.

2. Underfitting of Training Data

Underfitting occurs when the model is unable to establish an accurate relationship between the input and output variables. It is like trying to fit into undersized jeans: the model is too simple to capture the precise relationship in the data. To overcome this issue:

• Increase the training time of the model
• Enhance the complexity of the model
• Add more features to the data
• Reduce the regularization parameters

3. Overfitting of Training Data

Overfitting occurs when a machine learning model fits its training data too closely, capturing noise and bias along with the underlying patterns, which negatively affects its performance on new data. It is like trying to fit into oversized jeans. Unfortunately, this is one of the significant issues faced by machine learning professionals. It typically means the algorithm was trained with noisy or biased data, which affects its overall performance. Let's understand this with the
help of an example. Let’s consider a model trained to differentiate between a cat, a rabbit, a dog, and
a tiger. The training data contains 1000 cats, 1000 dogs, 1000 tigers, and 4000 Rabbits. Then there is
a considerable probability that it will identify the cat as a rabbit. In this example, we had a vast amount
of data, but it was biased; hence the prediction was negatively affected.

We can tackle this issue by:

• Analyzing the data with the utmost level of care
• Using data augmentation techniques
• Removing outliers from the training set
• Selecting a model with fewer features

4. Machine Learning is a Complex Process

The machine learning industry is young and is continuously changing. Rapid hit and trial experiments
are being carried on. The process is transforming, and hence there are high chances of error which
makes the learning complex. It includes analyzing the data, removing data bias, training data,
applying complex mathematical calculations, and a lot more. Hence it is a really complicated process
which is another big challenge for Machine learning professionals.

5. Lack of Training Data

The most important task you need to do in the machine learning process is to train the data to achieve
an accurate output. Too little training data will produce inaccurate or overly biased predictions. Let
us understand this with the help of an example. Consider a machine learning algorithm similar to
training a child. One day you decided to explain to a child how to distinguish between an apple and
a watermelon. You will take an apple and a watermelon and show him the difference between both
based on their color, shape, and taste. In this way, soon, he will attain perfection in differentiating
between the two. But on the other hand, a machine-learning algorithm needs a lot of data to
distinguish. For complex problems, it may even require millions of data to be trained. Therefore we
need to ensure that Machine learning algorithms are trained with sufficient amounts of data.

6. Slow Implementation
This is one of the common issues faced by machine learning professionals. The machine learning
models are highly efficient in providing accurate results, but it takes a tremendous amount of time.
Slow programs, data overload, and excessive requirements usually take a lot of time to provide
accurate results. Further, it requires constant monitoring and maintenance to deliver the best output.

7. Imperfections in the Algorithm When Data Grows

So you have found quality data, trained it amazingly, and the predictions are really concise and
accurate. Yay, you have learned how to create a machine learning algorithm!! But wait, there is a
twist; the model may become useless in the future as data grows. The best model of the present may
become inaccurate in the future and require further adjustment. So you need regular
monitoring and maintenance to keep the algorithm working. This is one of the most exhausting issues
faced by machine learning professionals.

Statistical Learning: Introduction
Statistical learning is a branch of machine learning that focuses on using statistical methods to extract
knowledge from data and build predictive models. It's all about learning from past observations to make
accurate forecasts about the future.

Key concepts:

• Supervised learning: Here, the data comes with labels (e.g., customer bought a shirt, didn't buy a
shirt). You train the model to learn the relationship between features (e.g., age, browsing history) and
labels, enabling it to predict future labels for unseen data.
• Unsupervised learning: No labels? No problem! This method identifies inherent patterns in unlabeled
data. Imagine analyzing customer reviews to uncover hidden segments or group products with similar
features.
• Regularization: Prevents overfitting, where the model memorizes the training data but fails to
generalize to new situations. Think of it as adding training wheels to your model to prevent it from
going too wild.
• Model selection: With various models at your disposal, how do you choose the best one? This
involves comparing their performance on unseen data and picking the champion.
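The regularization and model-selection ideas above can be made concrete with a short sketch. The snippet below is illustrative only: it uses a small synthetic dataset (an assumption, not data from these notes) and scikit-learn's Ridge model with cross-validation to compare several regularization strengths and pick the best one.

# Minimal sketch (synthetic data assumed): choosing a regularization strength
# by comparing cross-validated performance of several candidate models.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                   # 100 samples, 5 features
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 0.1, 1.0, 10.0]:                            # candidate regularization strengths
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2")
    print(f"alpha={alpha:5.2f}  mean cross-validated R^2 = {scores.mean():.3f}")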

Applications:

Statistical learning is everywhere! From personalized search results and targeted advertising to medical
diagnosis and financial forecasting, it's transforming countless industries. Here are some specific
examples:

• Recommender systems: Suggesting movies you'll love, recommending books you can't put down, and
even predicting what you'll buy next at the grocery store.
• Spam filtering: Keeping your inbox clean by identifying and eliminating unwanted emails.
• Fraud detection: Analyzing financial transactions to catch suspicious activity and protect your hard-
earned money.
• Medical diagnosis: Identifying patterns in medical images and data to help doctors diagnose diseases
more accurately.
• Climate prediction: Analyzing historical data and complex models to forecast future weather patterns
and climate change.

Supervised and Unsupervised Learning

Supervised learning
Supervised learning is a type of machine learning in which machines are trained using well-labelled training data and then predict the output. Labelled data means input data that is already tagged with the correct output.

Types of Supervised learning

Classification
✓ Classification is a supervised learning technique.
✓ Classification predicts a categorical output variable.
✓ It helps you divide your data into different classes; the algorithm which implements the classification on a dataset is known as a classifier.
✓ There are two types of classifications:

1) Binary classification: the classification problem has only two possible classes (e.g., True/False, Yes/No, 0/1).
2) Multi-class classification: the classification problem has more than two classes (e.g., classifying items into categories such as movies and music).

Types of Classification Algorithms


✓ Knn
✓ Naïve bayes
✓ Decision tree
✓ Logistic regression
✓ Support vector machine
Regression

✓ A regression algorithm is used when there is a relationship between a dependent (output) variable and an independent (input) variable.
✓ Regression is used for the prediction of continuous variables, such as in weather forecasting, market trend analysis, etc.

Types of Regression Algorithms

✓ Linear regression
✓ Logistic Regression
✓ Polynomial Regression

Unsupervised Learning

Unsupervised learning is a type of algorithm that learns patterns from untagged data. It mainly deals with unlabelled data. Unsupervised learning algorithms allow users to perform more complex processing tasks compared to supervised learning.

Clustering

Clustering is an unsupervised learning technique: there is no label for any instance of the data. Clustering is alternatively called grouping. Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.

Types of clustering algorithms

Exclusive (partitioning) clustering
Overlapping clustering
Hierarchical clustering

Exclusive (partitioning)
In this clustering method, data are grouped in such a way that one data point can belong to only one cluster.
Example: K-means

Agglomerative
In this clustering technique, every data point starts as its own cluster. Iterative unions between the two nearest clusters reduce the number of clusters.

Example: Hierarchical clustering

Overlapping

In this technique, fuzzy sets are used to cluster data. Each point may belong to two or more
clusters with separate degrees of membership.

Training and Test Loss


Training and test loss are two essential metrics in machine learning that provide valuable insights
into the performance of your model. Let's break them down:
Training Loss:
• Imagine you're training a dog to fetch a ball. Every time the dog brings you the ball, you give it a
treat (positive reinforcement). Similarly, in machine learning, the training loss is a measure of how
well your model performs on the training data.
• As the model learns from the data and adjusts its parameters (e.g., the dog figuring out the best
way to catch the ball), the training loss should decrease. This indicates that the model is getting
better at making accurate predictions on the data it has already seen.

Test Loss:
• Now, imagine taking your trained dog to a park with lots of distractions – squirrels, frisbees, other
dogs. Will it still fetch your ball? The test loss is like taking your model to this "unseen" park (test
data). It assesses how well your model performs on data it hasn't seen before.
• Ideally, the test loss should be similar to, or even lower than, the training loss. This suggests that

your model hasn't just memorized the training data, but has truly learned the underlying patterns
and can generalize well to new situations.

Tradeoffs in Statistical Learning


In the pursuit of effective and efficient machine learning models, we encounter various trade-offs
that require careful consideration. These trade-offs often involve balancing competing objectives or
limitations inherent in the data and learning algorithms. Understanding and managing these trade-
offs is crucial for building robust and generalizable models. Here are some key trade-offs in
statistical learning and machine learning:
1. Bias-Variance Tradeoff:
• Bias: This refers to the systematic error between the model's predictions and the true target
values. A high bias model tends to underfit the data, meaning it cannot capture the underlying
patterns effectively.
• Variance: This measures the variability of the model's predictions across different samples of
data. A high variance model tends to overfit the data, meaning it captures the noise in the data along
with the actual patterns.
• Trade-off: The goal is to find a balance between bias and variance. A model with too low bias
might overfit and have poor generalization performance, while a model with too high bias might
underfit and miss important patterns.

2. Model Complexity vs. Generalizability:
• Model complexity: More complex models with a larger number of parameters can potentially
capture more complex patterns in the data.
• Generalizability: However, complex models are more prone to overfitting and may not generalize
well to unseen data.
• Trade-off: There is a trade-off between model complexity and generalizability. The ideal model
should be sufficiently complex to capture the relevant patterns in the data but not so complex that it
overfits and loses its ability to generalize.
3. Regularization vs. Flexibility:
• Regularization: This is a technique used to penalize complex models and encourage simpler
models that are less prone to overfitting.
• Flexibility: Regularization can also reduce the model's ability to capture complex patterns in the
data.
• Trade-off: There is a trade-off between regularization and flexibility. The chosen regularization
strength should be balanced to achieve the desired level of bias-variance trade-off and minimize
overfitting.
4. Computational Efficiency vs. Accuracy:
• Computational efficiency: Some learning algorithms are computationally expensive and may
require significant resources to train.
• Accuracy: More complex and sophisticated algorithms might achieve higher accuracy but at the
cost of increased computational demands.
• Trade-off: Depending on the available resources and the specific task, there may be a trade-off
between computational efficiency and accuracy. Sometimes, a simpler model with slightly lower
accuracy might be more practical due to its computational efficiency.
5. Data Quantity vs. Model Performance:
• Data quantity: More data can potentially lead to better model performance as the learning
algorithm has more information to learn from.
• Limited data: However, acquiring and managing large amounts of data can be expensive and time-
consuming.
• Trade-off: There is a trade-off between data quantity and model performance. In some
cases, techniques like data augmentation or transfer learning can be used to mitigate the need for
large datasets.
6. Interpretability vs. Black Box Models:
• Interpretability: Some models are more easily interpretable than others, meaning it is easier to
understand how they reach their predictions.
• Black box models: Highly complex models can be difficult to interpret, making it challenging to
understand their decision-making process.
• Trade-off: There is a trade-off between interpretability and model performance. While
interpretable models offer more transparency, they might not always achieve the best
performance. The choice between interpretability and performance depends on the specific
application and the importance of understanding the model's rationale.
ESTIMATING RISK STATISTICS
Estimating risk statistics in machine learning involves calculating various metrics that provide
insights into the performance of a model and its ability to generalize to unseen data. These metrics
are crucial for evaluating the effectiveness and robustness of a model, allowing researchers and
practitioners to make informed decisions about model selection, training procedures, and
deployment.
Here are some key risk statistics commonly used in machine learning:
1. Generalization Error:
• This measures the average error of a model on unseen data.
• It is the true risk we ultimately want to minimize, but it can never be directly observed.
• Common estimators for generalization error include:
Test error: The average error on a held-out test set.
Cross-validation error: The average error across multiple rounds of model training and evaluation
using different data splits.
2. Bias and Variance:
• Bias: This is the systematic difference between the average prediction of a model and the true
target value.
• Variance: This measures the variability of the model's predictions across different samples of data.
• The ideal model would have both low bias and low variance.
• Bias can be estimated using techniques like cross-validation or comparing to a benchmark model.
• Variance can be estimated by looking at the variation in predictions across different data splits or
by using resampling methods like bootstrapping.
3. Loss Function:
• This function measures the error between the model's predictions and the true target values.
• Different loss functions are used for different types of tasks (e.g., mean squared error for
regression, cross-entropy for classification).
• The average loss on a held-out test set or cross-validation folds provides an estimate of the
generalization error.
4. Confidence Intervals:
• These intervals provide a range within which the true value of a statistic (e.g., generalization error)
is likely to lie with a certain degree of confidence.
• Confidence intervals can be calculated using various methods, including bootstrapping and
asymptotic approximations.
5. AUC (Area Under the ROC Curve):
• This metric is commonly used for evaluating binary classification models.
• It measures the ability of the model to distinguish between positive and negative examples.
• Higher AUC values indicate better performance.
6. Precision and Recall:
• These metrics are also used for evaluating binary classification models.
• Precision measures the proportion of positive predictions that are actually correct.
• Recall measures the proportion of actual positive examples that the model correctly classifies.
• Depending on the specific task, one metric might be more important than the other.
7. Calibration:
• This refers to how well the model's predicted probabilities correspond to the actual class
probabilities.
• A well-calibrated model's predictions can be used to accurately estimate the true probabilities of
events.
• Calibration curves and calibration error metrics can be used to assess calibration.
8. Stability:
• This measures how sensitive the model's predictions are to small changes in the data.
• A stable model is less likely to be affected by noise and outliers in the data.
• Stability can be assessed by analyzing the model's performance on perturbed versions of the data
or by comparing its predictions across different data splits.
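Two of the statistics above, cross-validation error and AUC, can be estimated in a few lines. The sketch below is a minimal illustration that assumes scikit-learn and its built-in breast cancer dataset; it is not part of the original notes.

# Sketch (scikit-learn and its breast cancer dataset assumed): estimating
# cross-validation error and AUC for a simple binary classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation error: 1 - mean accuracy across 5 folds
acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"Cross-validation error: {1 - acc.mean():.3f}")

# AUC estimated the same way, with a different scoring function
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"Mean AUC: {auc.mean():.3f}")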

Sampling distribution of an estimator
In machine learning, we often use estimators to approximate unknown population parameters based
on a sample of data. However, these estimators are not guaranteed to be exactly equal to the true
parameter value. Instead, they will vary depending on the specific sample we choose. The sampling
distribution of an estimator describes the probability distribution of all possible values the estimator
can take across different random samples of the same size from the population.
Example:
• Mean: The sampling distribution of the sample mean is approximately normal for large samples, even
if the population distribution is non-normal. This is due to the Central Limit Theorem.
• Proportion: The sampling distribution of the sample proportion can be approximated by a binomial
distribution, especially for large samples with moderate success probabilities.
• Model parameters: The sampling distribution of the estimated parameters of a machine learning model
(e.g., regression coefficients in linear regression) will depend on the specific model and the estimation
method used.
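A common way to approximate the sampling distribution of an estimator from a single sample is the bootstrap. The sketch below uses synthetic, non-normal data (an assumption made for illustration) to approximate the sampling distribution of the sample mean.

# Sketch (synthetic data assumed): approximating the sampling distribution of the
# sample mean by drawing many bootstrap resamples from one observed sample.
import numpy as np

rng = np.random.default_rng(7)
sample = rng.exponential(scale=2.0, size=200)       # one observed (non-normal) sample

boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(5000)
])

print(f"Sample mean: {sample.mean():.3f}")
print(f"Bootstrap standard error of the mean: {boot_means.std():.3f}")
# By the Central Limit Theorem, the distribution of boot_means is approximately
# normal even though the underlying data are exponential.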
EMPIRICAL RISK MINIMIZATION.
The Empirical Risk Minimization (ERM) principle is a learning paradigm which consists in
selecting the model with minimal average error over the training set. This so-called training error
can be seen as an estimate of the risk (due to the law of large numbers), hence the alternative name
of empirical risk.
By minimizing the empirical risk, we hope to obtain a model with a low value of the risk. The larger
the training set size is, the closer to the true risk the empirical risk is.
If we were to apply the ERM principle without more care, we would end up learning by heart, which
we know is bad. This issue is more generally related to the overfitting phenomenon, which can be
avoided by restricting the space of possible models when searching for the one with minimal error.
The most severe and yet common restriction is encountered in the contexts of linear
classification or linear regression. Another approach consists in controlling the complexity of the
model by regularization.
Example:
Data: We have a dataset of 100 points, each represented by a feature (x) and a target value (y). We
want to find a linear function (model) that best fits this data.
Loss Function: We choose the squared error loss function, which measures the average squared
difference between the predicted and actual target values.
ERM Process:
Start with a family of models: In linear regression, this family consists of all possible linear functions
of the form y = mx + b, where m and b are the slope and intercept, respectively.
For each model in the family:
Calculate the predicted target value for each data point using the model's function.
Calculate the squared error for each data point.
Calculate the average squared error over all data points (this is the empirical risk of the model).
Choose the model with the smallest empirical risk: This model is our best estimate of the true
underlying relationship between x and y.
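The ERM procedure described above can be sketched directly: search a family of linear models y = m*x + b and keep the one whose average squared error on the training set (the empirical risk) is smallest. The data and the coarse grid of candidate models below are assumptions made purely for illustration.

# Sketch of ERM for linear regression (synthetic data and a simple grid of models assumed).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=100)    # underlying relationship plus noise

best = None
for m in np.linspace(-5, 5, 101):                      # candidate slopes
    for b in np.linspace(-5, 5, 101):                  # candidate intercepts
        empirical_risk = np.mean((m * x + b - y) ** 2) # average squared error on the training set
        if best is None or empirical_risk < best[0]:
            best = (empirical_risk, m, b)

risk, m, b = best
print(f"Chosen model: y = {m:.2f}x + {b:.2f}, empirical risk = {risk:.3f}")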

UNIT-II

SUPERVISED LEARNING (REGRESSION/CLASSIFICATION)


• In supervised learning, the algorithm is trained on a labelled dataset, where the input data is paired with
corresponding output labels. The goal is for the algorithm to learn the mapping between inputs and outputs,
allowing it to make predictions or decisions on new, unseen data.
• When the task involves making predictions or classifying data into predefined categories, supervised learning
is widely used. It is applicable in scenarios where there is a clear relationship between input features and the
desired output. Common use cases include regression tasks (predicting numerical values) and classification
tasks (assigning labels to instances).
• Supervised learning finds applications in various fields, such as finance, healthcare, marketing, and natural
language processing. Supervised learning is applicable in finance for predicting stock prices. In healthcare, it
may aid in diagnosing diseases based on patient data. In marketing, it can help predict customer preferences. It
is used for sentiment analysis and language translation in natural language processing.
• When training the algorithm, supervised learning utilizes historical data with known outcomes. During the
model development phase, supervised learning learns the underlying patterns in the data and makes informed
predictions on new data during the prediction phase.
• The algorithm receives a labelled training dataset in supervised learning, comprising input features and their
corresponding output labels. The algorithm learns the relationship between inputs and outputs by adjusting its
internal parameters during training. Once trained, the model can generalise its knowledge to make predictions
on new, unseen data.

Types of Supervised learning

Classification:
• Classification is a supervised learning technique in machine learning that deals with the
categorization of data into predefined classes or labels. It involves training a model on a dataset
with known categories and then using that trained model to predict the class of new, unseen
instances.
• Classification is crucial for scenarios where the goal is to assign data points to specific categories
or classes. It is widely used in various applications, such as spam filtering, sentiment analysis, image
recognition, and medical diagnosis. The primary objective is to build a model that can make accurate
predictions on new data based on patterns learned from the training data.
• Classification finds applications in diverse fields. For instance, in finance, it's used for credit
scoring; in healthcare, it aids in disease diagnosis; in image recognition, it identifies objects in
images; and in natural language processing, it classifies text sentiments. The versatility of
classification makes it applicable in numerous domains.
• During the model development phase, the algorithm is trained on a labeled dataset to perform
classification. The trained model can classify new instances. When the task involves sorting data
into distinct classes or categories based on certain features, the model is employed.
• In classification, the algorithm learns patterns from labelled training data to make predictions on
new, unseen data. Various classification algorithms, such as logistic regression, decision trees,
support vector machines, and neural networks, implement different strategies to identify decision
boundaries and classify data points into distinct classes.

1. Binary Classification: This type of classification involves two possible classes, such as true/false,
yes/no, or 0/1. Examples include spam detection, fraud detection, and medical diagnosis, where the
outcome is binary.
2. Multi-class Classification: In multi-class classification, the task involves more than two classes. For
example, classifying emails into categories like "work," "personal," or "promotions." Multi-class
classification is prevalent in scenarios with multiple possible outcomes.

Types of Classification Algorithms


✓ KNN
✓ Naïve bayes
✓ Decision tree
✓ Random forest
✓ Logistic regression
✓ Support vector machine
Regression
• Regression is a supervised learning technique in machine learning that predicts a continuous outcome by
establishing a relationship between a dependent variable (output) and one or more independent variables (input).
The goal of regression is to predict a continuous outcome, making it suitable for scenarios where the target
variable is numerical or quantitative.
• Regression is used when the task requires understanding the relationship between variables and making
predictions based on that relationship. Regression models are widely used for forecasting and making
predictions in scenarios such as weather forecasting, market trends, and predicting stock prices. Regression
models help in understanding how changes in independent variables impact the dependent variable.

• Regression finds applications in various fields, including economics, finance, biology, and engineering. In
economics, regression enables the prediction of the impact of factors such as inflation and interest rates on the
gross domestic product (GDP). Regression can be used in healthcare to predict patient outcomes based on
various medical parameters.
• Regression models are used when variables are expected to have a continuous relationship, resulting in
numerical output. During the model development phase, the algorithm applies regression to train on historical
data and learn the patterns and relationships between variables. Once trained, the model can make predictions
based on new data.
• In regression, the algorithm fits a mathematical model to the data, typically a straight line or a curve, that
represents the relationship between the independent and dependent variables. During prediction, the model
utilizes the learned relationship to predict the continuous output variable, after being trained on a labeled dataset.
Types of Regression Algorithms
✓ Linear regression
✓ Logistic Regression
✓ Polynomial Regression
BASIC METHODS: DISTANCE BASED METHODS
Distance Based Models
Distance-based models are the second class of Geometric models. Like Linear models, distance-based models
are based on the geometry of data. As the name implies, distance-based models work on the concept of distance.
In the context of Machine learning, the concept of distance is not based on merely the physical distance
between two points. Instead, we could think of the distance between two points considering the mode of
transport between two points. Travelling between two cities by plane covers less distance physically than by train
because a plane is unrestricted. Similarly, in chess, the concept of distance depends on the piece used – for

example, a Bishop can move diagonally. Thus, depending on the entity and the mode of travel, the concept of
distance can be experienced differently. The distance metrics commonly used are
Euclidean, Minkowski, Manhattan, and Mahalanobis.

Distance is applied through the concept of neighbours and exemplars. Neighbours are points in proximity with
respect to the distance measure expressed through exemplars. Exemplars are either centroids that find a centre
of mass according to a chosen distance metric or medoids that find the most centrally located data point. The most
commonly used centroid is the arithmetic mean, which minimises squared Euclidean distance to all other points.
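The distance metrics named above can be computed directly. The sketch below is illustrative only; it assumes SciPy is available and uses two arbitrary example points (the Mahalanobis distance additionally needs an inverse covariance matrix, so it is only mentioned in a comment).

# Sketch: computing common distance metrics for two example points (SciPy assumed).
import numpy as np
from scipy.spatial import distance

a = np.array([2.0, 3.0])
b = np.array([5.0, 7.0])

print("Euclidean:", distance.euclidean(a, b))        # sqrt((5-2)^2 + (7-3)^2) = 5.0
print("Manhattan:", distance.cityblock(a, b))        # |5-2| + |7-3| = 7.0
print("Minkowski (p=3):", distance.minkowski(a, b, p=3))
# Mahalanobis distance would use distance.mahalanobis(a, b, VI), where VI is the
# inverse covariance matrix of the data.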
NEAREST NEIGHBOURS:
K-Nearest Neighbor Algorithm (K-NN)
• K-Nearest Neighbours (K-NN) is a simple and versatile machine learning algorithm based on supervised
learning. However, it is most commonly used for classification tasks, although it can also be applied to
both regression and classification problems. K-NN is a non-parametric, instance-based learning
algorithm that makes predictions based on the similarity of a new data point to its k-nearest neighbours
in the training dataset.
• When the task involves making predictions based on the similarity of data points, K-NN is chosen. It is
particularly useful when the underlying structure of the data is complex and not easily captured by a
mathematical model. K-NN is robust and does not assume any specific distribution of the data, making
it suitable for various types of datasets.
• K-NN finds applications in a wide range of fields, including image recognition, recommendation
systems, medical diagnosis, and pattern recognition. In image recognition, K-NN identifies the class of
a new image by comparing its similarity to previously labeled images.
• During the model development phase, the algorithm is trained on a labeled dataset using K-NN. When
applying K-NN, the prediction task involves finding the class or value of a new data point based on the
majority class or average of its k-nearest neighbors.

• In K-NN, the algorithm classifies a new data point by examining the k-nearest neighbours in the training
dataset. Distance metrics, such as Euclidean distance or Manhattan distance, typically measure the
similarity between data points. The class or value of the new data point is determined by a majority vote
or by averaging the values of its k-nearest neighbours.

Euclidean Distance: the distance between two points A(x1, y1) and B(x2, y2) is
d(A, B) = √((x2 − x1)² + (y2 − y1)²)
K-NN Algorithm working Step by Step process


1) Select the number K of neighbours.
2) Calculate the Euclidean distance from the new data point to every point in the training data.
3) Take the K nearest neighbours as per the Euclidean distance.
4) Among these K nearest neighbours, count the number of data points in each category.
5) Assign the new data point to the category for which the number of neighbours is maximum.
6) Finally, our KNN model is ready.
Example:
Perform KNN classification algorithm on following dataset and predict the class for x(P1=3 and
P2=7) K=3.
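The dataset table for this example is not reproduced in these notes, so the sketch below uses a small hypothetical dataset (an assumption made for illustration) to show the K=3 procedure for the query point (P1=3, P2=7).

# Sketch with a hypothetical dataset: classify the query point (3, 7) using K=3.
import numpy as np
from collections import Counter

X = np.array([[7, 7], [7, 4], [3, 4], [1, 4], [4, 6], [2, 8]])   # hypothetical (P1, P2) points
y = np.array(["Bad", "Bad", "Good", "Good", "Good", "Good"])     # hypothetical class labels
query = np.array([3, 7])
K = 3

distances = np.linalg.norm(X - query, axis=1)     # Euclidean distance to each training point
nearest = np.argsort(distances)[:K]               # indices of the K closest points
votes = Counter(y[nearest])                       # majority vote among the K neighbours
print("Predicted class:", votes.most_common(1)[0][0])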

Application
• Used in classification
• Used for imputing missing values
• Used in pattern recognition

DECISION TREES
✓ Decision Tree is a supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches represent the
decision rules and each leaf node represents the outcome.
✓ In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision
nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the
output of those decisions and do not contain any further branches.
✓ The decisions or the test are performed on the basis of features of the given dataset.
✓ It is a graphical representation for getting all the possible solutions to a problem/decision based
on given conditions.
✓ It is called a decision tree because, similar to a tree, it starts with the root node, which expands
on further branches and constructs a tree-like structure.
✓ In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
✓ A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.

Algorithm

1. Start with a training data set which we‘ll call S. It should have attributes and classification.

2. Determine the best attribute in the dataset. (We will go over the definition of best attribute)

3. Split S into subset that contains the possible values for the best attribute.

4. Make decision tree node that contains the best attribute.

5. Recursively generate new decision trees by using the subset of data created from step 3 until a
stage is reached where you cannot classify the data further. Represent the class as leaf node.
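The steps above are automated by library implementations. The sketch below is illustrative only; it assumes scikit-learn and its built-in iris dataset, and uses criterion="entropy" so that splits are chosen by the same information-gain idea used in the worked example that follows.

# Sketch (scikit-learn and the iris dataset assumed): building a classification tree.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, tree.predict(X_test)))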
Example: the worked example below uses the weather ("Play Tennis") dataset, with attributes Outlook, Temperature, Humidity, and Wind and the target variable Play.

Formulas
Entropy(S) = − Σ pi · log2(pi), where pi is the proportion of examples of class i in S
Gain(S, A) = Entropy(S) − Σv (|Sv| / |S|) · Entropy(Sv), summed over the values v of attribute A
Information Gain

Calculate the entropy of each value of Outlook:
I(Outlook, Sunny) = 0.971
I(Outlook, Overcast) = 0
I(Outlook, Rain) = 0.971
Total (weighted) entropy for Outlook = 0.694
Gain(Outlook) = Entropy(S) − Total Entropy = 0.940 − 0.694 = 0.246
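The numbers above can be reproduced with a short computation. The class counts used below are taken from the standard weather ("Play Tennis") dataset and are an assumption, since the dataset table itself is not reproduced in these notes: 9 Yes / 5 No overall; Outlook = Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No), Rain (3 Yes, 2 No).

# Sketch reproducing Entropy(S) = 0.940, weighted entropy = 0.694, Gain(Outlook) = 0.246
# (class counts assumed from the standard "Play Tennis" dataset).
import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

S = entropy([9, 5])                                                    # ~0.940
subsets = {"Sunny": [2, 3], "Overcast": [4, 0], "Rain": [3, 2]}
weighted = sum(sum(c) / 14 * entropy(c) for c in subsets.values())     # ~0.694
print(f"Entropy(S) = {S:.3f}")
print(f"Weighted entropy for Outlook = {weighted:.3f}")
print(f"Gain(Outlook) = {S - weighted:.3f}")                           # ~0.246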

Gain of each attribute


Attribute      Gain
Outlook        0.246   (first splitting point)
Temperature    0.029
Humidity       0.151
Wind           0.048

(i) Repeat the entire process for Outlook = Sunny

Outlook   Temperature   Humidity   Wind     Play?
Sunny     Hot           High       Weak     No
Sunny     Hot           High       Strong   No
Sunny     Mild          High       Weak     No
Sunny     Cool          Normal     Weak     Yes
Sunny     Mild          Normal     Strong   Yes
Gain:     0.571         0.971      0.020

Humidity has the highest gain, so it is the second splitting point.

(ii) Repeat the entire process for Outlook = Rain

Outlook   Temperature   Humidity   Wind     Play?
Rain      Mild          High       Weak     Yes
Rain      Cool          Normal     Weak     Yes
Rain      Cool          Normal     Strong   No
Rain      Mild          Normal     Weak     Yes
Rain      Mild          High       Strong   No
Gain:     0.019         x          0.97

Wind has the highest gain, so it is the splitting point for the Rain branch.

Advantage
• Easy to use and understand.
• Can handle both categorical and numerical data.
• Resistant to outliers, hence require little data preprocessing.

Disadvantages
• The decision tree contains lots of layers, which makes it complex.
• It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
• For more class labels, the computational complexity of the decision tree may increase.

Application
Decision tree has been used to develop models for prediction and classification in different domains some of
which are
• Business management
• Customer relationship management
• Fraudulent statement detection
• Engineering, Energy consumption
• Fault diagnosis
• Healthcare Management
• Agriculture
NAIVE BAYES

✓ Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and used for
solving classification problems.
✓ It is mainly used in text classification that includes a high-dimensional training dataset.
✓ Naïve Bayes Classifier is one of the simple and most effective Classification algorithms which helps in
building the fast machine learning models that can make quick predictions.
✓ It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
✓ Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental analysis, and
classifying articles.

According to Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B)

Where,
• P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
• P(B|A) is Likelihood probability: Probability of the evidence given that the probability of a hypothesis is
true.
• P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
• P(B) is Marginal Probability: Probability of Evidence.
Working of Naïve Bayes' Classifier
✓ Working of Naïve Bayes' Classifier can be understood with the help of the below example:
✓ Suppose we have a dataset of weather conditions and corresponding target variable "Play". So using this
dataset we need to decide that whether we should play or not on a particular day according to the weather
conditions. So to solve this problem, we need to follow the below steps:
✓ Convert the given dataset into frequency tables.
✓ Generate Likelihood table by finding the probabilities of given features.
✓ Now, use Bayes theorem to calculate the posterior probability.
✓ Problem: If the weather is sunny, then the Player should play or not?
✓ Solution: To solve this, first consider the below dataset:

Frequency table for the Weather Conditions

Applying Bayes' theorem
P(Yes|Sunny)= P(Sunny|Yes)*P(Yes) / P(Sunny)
P(Sunny|Yes)= 3/10= 0.3
P(Sunny)= 0.35
P(Yes)=0.71
So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60
P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)
P(Sunny|NO)= 2/4=0.5
P(No)= 0.29
P(Sunny)= 0.35
So P(No|Sunny) = 0.5*0.29/0.35 = 0.41
So, as we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny); hence, on a sunny day, the prediction is that the player should play.
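The hand calculation above can be reproduced with a few lines. The sketch below simply plugs in the same probabilities read from the frequency/likelihood tables; nothing new is computed.

# Sketch reproducing the Naive Bayes posterior calculation above.
p_sunny_given_yes = 0.3    # P(Sunny | Yes)
p_sunny_given_no = 0.5     # P(Sunny | No)
p_yes, p_no = 0.71, 0.29   # prior probabilities
p_sunny = 0.35             # marginal probability of Sunny

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny
print(f"P(Yes | Sunny) = {p_yes_given_sunny:.2f}")   # ~0.60
print(f"P(No  | Sunny) = {p_no_given_sunny:.2f}")    # ~0.41
print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")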
Applications:
✓ It is used for Credit Scoring.
✓ It is used in medical data classification.
✓ It is used in Text classification such as Spam filtering and Sentiment analysis.
LINEAR MODELS
Linear Regression
• Linear regression is a simple and easy algorithm.
• Linear regression is a statistical approach used for predictive analysis.
• Linear regression is used to solve regression problems.
• Linear regression predicts a continuous output variable.
• It models the relationship between a dependent variable and an independent variable.
• The relationship can be either positive or negative; the best-fit line is a straight line:
Y = b0 + b1*X
Where:
Y = dependent variable
X = independent variable
b0 = intercept
b1 = coefficient of the relationship between X and Y
Linear Regression Line
1. Positive Regression: If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is termed a POSITIVE regression.

Y=b0+b1*X
2. Negative Regression: If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is called a NEGATIVE regression.

Y = b0 - b1*X
Example: Linear Regression using the Least Squares Method

X (independent)   Y (dependent)   X − X̄    Y − Ȳ    (X − X̄)²   (X − X̄)(Y − Ȳ)
1                 2               −2        −2        4           4
2                 4               −1         0        1           0
3                 5                0         1        0           0
4                 4                1         0        1           0
5                 5                2         1        4           2
X̄ = 3            Ȳ = 4                                Σ = 10      Σ = 6

b1 = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 6 / 10 = 0.6
b0 = Ȳ − b1·X̄ = 4 − 0.6 × 3 = 2.2
Best-fit line: Y = 2.2 + 0.6·X
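As a quick, hedged check of the least-squares result above, the snippet below (NumPy assumed) fits the same five points; np.polyfit returns the slope and intercept of the best-fit line.

# Sketch verifying the least-squares example above with NumPy.
import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 5, 4, 5])

b1, b0 = np.polyfit(X, Y, deg=1)          # slope first, then intercept
print(f"b1 = {b1:.2f}, b0 = {b0:.2f}")    # b1 = 0.60, b0 = 2.20, so Y = 2.2 + 0.6*X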

LOGISTIC REGRESSION: Logistic regression is a statistical method commonly employed
in machine learning and statistics for binary classification tasks with a categorical outcome
having two possible classes (e.g., 0 or 1, yes or no, true or false). It's a type of regression
analysis commonly employed in machine learning and statistics.
Unlike linear regression, which predicts continuous outcomes, logistic regression models the
probability that a given input belongs to a particular class. It does this by applying a logistic
(or sigmoid) function to a linear combination of the input features. The logistic function
constrains the output of the regression model between 0 and 1, representing probabilities.
The logistic regression model can be mathematically represented as:

P(Y=1 | X) = 1 / (1 + e^-(β0 + β1·X1 + β2·X2 + ... + βn·Xn))
Where:
• P(Y=1∣X) is the probability of the target variable being 1 given the input features X.
• e is the base of the natural logarithm.
• β0, β1, β2, ..., βn are the model's coefficients learned during training.
• X1, X2, ..., Xn are the input features.
During training, the model learns the optimal values for the coefficients (weights) that
minimise a chosen loss function, typically the log loss or cross-entropy loss. These coefficients
determine the relationship between the input features and the log-odds of the target variable.
Various fields like healthcare, finance, marketing, and social sciences widely use logistic
regression for tasks such as predicting whether an email is spam or not, diagnosing diseases,
determining customer churn, etc. It's also a fundamental building block for more complex
machine learning algorithms and techniques.

Example:

import numpy as np

import matplotlib.pyplot as plt

from sklearn.linear_model import LogisticRegression

X = np.array([[2.5], [3.5], [4.5], [5.5], [6.5], [7.5], [8.5], [9.5]])

y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression()

model.fit(X, y)

new_X = np.array([[3.0], [4.0], [6.0], [8.0]])

predicted_probs = model.predict_proba(new_X)[:, 1]

plt.scatter(X[y == 0], np.zeros_like(X[y == 0]), color='blue', label='Class 0')

plt.scatter(X[y == 1], np.ones_like(X[y == 1]), color='red', label='Class 1')

plt.plot(new_X, predicted_probs, color='green', linestyle='--', marker='o', label='Predicted probability')

plt.xlabel('Feature')

plt.ylabel('Probability of Class 1')

plt.title('Logistic Regression')

plt.legend()

plt.grid(True)

plt.show()

Output:

GENERALIZED LINEAR MODELS: Generalized Linear Models (GLMs) are a class of


statistical models that extend traditional linear regression to accommodate response variables
with non-normal error distributions, employing a systematic component linking predictors to
the response through a linear predictor function and a specified link function. GLMs offer a
flexible framework for modeling diverse data types, including binary outcomes, count data,
and continuous data with non-normal distributions, with advantages such as interpretability,
flexibility in handling different response types, and accommodation of non-linear relationships
between predictors and responses. They have become a fundamental tool in various fields,
providing a versatile approach to modelling complex data structures.

Example:
import numpy as np
import statsmodels.api as sm
# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 3, 5]) # Number of events
# Fit Poisson regression model
poisson_model = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
# Print model summary
print(poisson_model.summary())
Output:

SUPPORT VECTOR MACHINES


• SVM is a supervised learning algorithm.
• It can be used for both classification and regression problems.
• It is mostly used for classification problems.
• Each data point is plotted as a point in N-dimensional space, where N is the number of features.
• In SVM, each coordinate of a point is the value of one feature of the individual observation.
• We perform classification by finding the hyperplane that best separates the two different classes.

The followings are important concepts in SVM −


• Support Vectors − Data points that are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
• Hyperplane − As we can see in the above diagram, it is a decision plane or space which is divided between a set of objects having different classes.
• Margin − It may be defined as the gap between the two lines on the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.
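A linear SVM classifier can be trained in a few lines. The sketch below is illustrative only; it assumes scikit-learn and its built-in iris dataset rather than any dataset from these notes.

# Sketch (scikit-learn and the iris dataset assumed): a linear SVM classifier that
# finds the separating hyperplane with the largest margin.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

svm = SVC(kernel="linear", C=1.0)
svm.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, svm.predict(X_test)))
print("Support vectors per class:", svm.n_support_)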
Advantages
• Works very well with limited dataset.
• Good Accuracy
Disadvantages
• Doesn‘t work well with large Dataset
Applications
• Image Classification
• Face Detection
• Handwriting recognition
• Text Categorization
BINARY CLASSIFICATION: MULTICLASS/STRUCTURED OUTPUT: Binary
classification refers to a machine learning task where the goal is to classify data into one of two
classes or categories. Examples include spam detection (spam or not spam), disease diagnosis
(diseased or healthy), and sentiment analysis (positive or negative sentiment).
On the other hand, multiclass classification involves classifying data into three or more classes.
Each instance in the dataset belongs to one and only one class. Examples include classifying
emails into multiple categories (e.g., spam, promotional, social), classifying images of animals
into different species, or recognising handwritten digits from 0 to 9.
Structured output classification refers to the task of predicting structured objects as outputs,
where the output is not just a single label but rather a structured object. This can include tasks
like sequence labelling (e.g., part-of-speech tagging, named entity recognition), semantic
segmentation (assigning a label to each pixel in an image), or parsing (e.g., parsing a sentence
into its syntactic structure).
The model in multiclass or structured output classification predicts multiple classes or a
structured object, rather than being limited to a single binary decision.
Algorithms such as logistic regression, decision trees, random forests, support vector machines,
and neural networks are applicable for multiclass classification. The extension to multiple
classes usually involves techniques such as one-vs-all (OvA) or one-vs-one (OvO) strategies,
where binary classifiers are trained for each class or pair of classes.
For structured output classification, more complex models like conditional random fields
(CRFs), recurrent neural networks (RNNs), convolutional neural networks (CNNs), or graph
neural networks (GNNs) are often used. These models are capable of handling structured
outputs directly and are commonly used in tasks like sequence labeling, semantic segmentation,
and parsing.
Example:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train a logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
# Display classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
Output:

MNIST: The MNIST dataset is a widely used benchmark dataset in the fields of machine
learning and computer vision. It stands for "Modified National Institute of Standards and
Technology" and consists of a large collection of grayscale images of handwritten digits from
0 to 9. Each image is 28 pixels in height and 28 pixels in width, resulting in a total of 784 pixels
per image. Researchers commonly use the MNIST dataset to train and test algorithms in image
classification, especially for tasks involving handwritten digit recognition. It has become a
standard dataset for evaluating the performance of various machine learning algorithms,
including neural networks, support vector machines, decision trees, and more. The dataset is
divided into two main parts: a training set and a test set. The training set contains 60,000 images,
while the test set contains 10,000 images. Each image includes a label indicating the digit it
represents (0 through 9). Due to its simplicity, standardisation, and accessibility, the MNIST
dataset has played a crucial role in advancing research and development in the field of machine
learning, particularly in the early stages of deep learning and convolutional neural networks
(CNNs). It has also served as a benchmark for comparing the performance of different
algorithms and techniques.
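As a quick illustration (a minimal sketch, assuming internet access and scikit-learn's fetch_openml helper), the dataset can be loaded and split in the conventional way:

from sklearn.datasets import fetch_openml

# Download MNIST from OpenML: 70,000 flattened 28x28 grayscale digit images
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target
print(X.shape)   # (70000, 784)
print(y[:10])    # labels are the digit characters '0'..'9'

# Conventional split: first 60,000 images for training, last 10,000 for testing
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]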

RANKING: Ranking in machine learning involves the task of organising a set of items based
on their relevance or importance within a given context, which is crucial for applications like
information retrieval and recommendation systems. This task encompasses approaches such as
pointwise, pairwise, and listwise ranking, each addressing different aspects of the ranking
problem. Pointwise ranking treats each item independently, predicting its relevance score,
while pairwise ranking aims to learn the preference between pairs of items. Listwise ranking,
on the other hand, considers the entire list of items as a single instance and directly optimises
the ranking of the entire list. Learning to Rank (LTR) frameworks encompass these approaches,
training models on labelled data to rank items for new queries, thereby facilitating tasks like
search engine optimisation and personalised recommendations.
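A minimal pointwise-ranking sketch is given below; the document features, relevance grades, and candidate documents are made up purely for illustration. A regressor is trained to predict a relevance score per document, and candidate documents are then sorted by their predicted scores.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training data: each row holds document features (e.g., keyword
# match score, document length, freshness) with a human-assigned relevance
# grade (0 = irrelevant ... 3 = highly relevant)
X_train = np.array([[0.9, 0.2, 0.8], [0.1, 0.9, 0.3], [0.7, 0.5, 0.6],
                    [0.2, 0.1, 0.9], [0.8, 0.8, 0.1], [0.3, 0.4, 0.4]])
y_train = np.array([3, 0, 2, 1, 3, 1])

# Pointwise ranking treats relevance prediction as an ordinary regression task
ranker = GradientBoostingRegressor(random_state=42).fit(X_train, y_train)

# Score candidate documents for a new query and sort them by predicted relevance
candidates = np.array([[0.6, 0.3, 0.7], [0.95, 0.7, 0.2], [0.1, 0.2, 0.5]])
scores = ranker.predict(candidates)
ranking = np.argsort(scores)[::-1]
print("Ranked document indices:", ranking, "scores:", scores[ranking])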

UNIT-III
Ensemble Learning:
Ensemble methods in machine learning combine the insights obtained from multiple
learning models to facilitate accurate and improved decisions. In learning models, noise,
variance, and bias are the major sources of error. The ensemble methods in machine learning
help minimise these error-causing factors, thereby ensuring the accuracy and stability of
machine learning (ML) algorithms.
Ensemble learning works by training multiple base learners on the same dataset, but using
different algorithms or subsets of the data. These base learners could be decision trees, neural
networks, support vector machines, or any other machine learning algorithm. After training the
base learners, they combine their predictions in some way to produce the final prediction.
In ensemble learning, there are several methods for combining base learners' predictions,
including:
1. Voting/averaging: each base learner makes a prediction, and the final prediction is
determined by a majority vote (for classification tasks) or by averaging (for regression
tasks) the individual predictions.
2. Weighted combination: each base learner's prediction is weighted based on its performance
on a validation set or another criterion, and these weighted predictions are combined to
create the final prediction.
3. Stacking: a meta-learner is trained on the predictions of the base learners to make the
final prediction. This allows the meta-learner to learn how to best combine the predictions
of the base learners.

Ensemble methods can be divided into two groups:
• Sequential ensemble methods where the base learners are generated sequentially (e.g.
AdaBoost). The basic motivation of sequential methods is to exploit the dependence
between the base learners. The overall performance can be boosted by weighing
previously mislabeled examples with higher weight.
• Parallel ensemble methods where the base learners are generated in parallel (e.g.
Random Forest). The basic motivation of parallel methods is to exploit independence
between the base learners since the error can be reduced dramatically by averaging.

Bagging and random forests


Bootstrap Aggregation (or Bagging for short), is a simple and very powerful ensemble method.
Bagging is the application of the Bootstrap procedure to a high-variance machine learning
algorithm, typically decision trees.
1. Suppose there are N observations and M features. A sample of observations is selected
randomly with replacement (bootstrapping).
2. A subset of features is selected to create a model with the sample of observations and
the subset of features.
3. The feature from the subset that gives the best split on the training data is selected
(see the Decision Tree notes for details on choosing the best split).
4. This is repeated to create many models, and every model is trained in parallel.
5. The prediction is given based on the aggregation of predictions from all the models.
When bagging with decision trees, we are less concerned about individual trees overfitting the
training data. For this reason, and for efficiency, the individual decision trees are grown deep
(e.g. few training samples at each leaf node) and are not pruned. Such trees have high variance
and low bias, which are the important characteristics of sub-models when combining predictions
using bagging. The only parameter when bagging decision trees is the number of samples and hence
the number of trees to include. This can be chosen by increasing the number of trees run after
run until the accuracy stops improving.

Figure: Bagging
Advantages
• Efficient on large datasets
• More accurate than decision trees
• Averaging results of many trees reduces variance
Disadvantages
• More difficult to interpret than decision trees
• Less clear which variables are of greatest importance for predicting the response
• More computationally intensive than forming a single decision tree
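The procedure described above can be sketched with scikit-learn's BaggingClassifier. This is a minimal example on synthetic data; by default the base learner is a decision tree, and oob_score gives an out-of-bag estimate of accuracy.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic classification data standing in for a real dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 decision trees (the default base learner), each trained on a bootstrap sample
bag_clf = BaggingClassifier(n_estimators=100, bootstrap=True, oob_score=True, random_state=42)
bag_clf.fit(X_train, y_train)

print("Out-of-bag estimate:", bag_clf.oob_score_)
print("Test accuracy      :", accuracy_score(y_test, bag_clf.predict(X_test)))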
Applications:
1. Classification and Regression Tasks:
o Example: predicting customer churn in a telecom company. Bagging with
decision trees enables training multiple models to predict customer churn based
on features like usage patterns, customer demographics, and service
subscription details.

2. Medical Diagnosis:
o Example: predicting the likelihood of a patient having a particular disease based
on symptoms, medical history, and test results. Bagging with decision trees can
help create an ensemble model that combines predictions from individual
decision trees trained on various patient data subsets, enhancing diagnostic
accuracy.
3. Finance and Risk Management:
o Example: credit risk assessment for loan approval. Bagging with decision trees
can help create a strong predictive model that assesses the creditworthiness of
loan applicants based on their financial history, employment status, and other
relevant factors. This can help financial institutions make informed decisions
about loan approvals while managing risk effectively.
4. Marketing and customer segmentation:
o Example: segmenting customers based on their purchasing behaviour and
preferences. Bagging with decision trees can help identify distinct customer
segments and tailor marketing strategies accordingly. For instance, an e-
commerce company can use this approach to personalise product
recommendations and promotional offers for different customer segments,
thereby improving customer engagement and sales.
5. Image and speech recognition:
o Example: Handwritten digit recognition in optical character recognition (OCR)
systems. Bagging with decision trees can be utilized to develop an ensemble
model that accurately classifies handwritten digits by analyzing pixel intensities
and spatial features. This can be beneficial for digitising documents and
automating data entry tasks.
6. Environmental Monitoring:
o Example: predicting air quality levels based on meteorological data, pollution
levels, and geographic factors. Bagging with decision trees can be employed to
build a predictive model that forecasts air quality indices, helping local
authorities and environmental agencies take proactive measures to mitigate
pollution and protect public health.

Random forests

Random forest is a supervised learning algorithm which is used for both classification and
regression. However, it is mainly used for classification problems. Just as a forest is made up
of trees, and more trees make a more robust forest, the random forest algorithm creates decision
trees on data samples, gets the prediction from each of them, and finally selects the best
solution by means of voting. It is an ensemble method which is better than a single decision tree
because it reduces over-fitting by averaging the results.

Working of Random Forest Algorithm

We can understand the working of Random Forest algorithm with the help of following steps

• Step 1 − First, start with the selection of random samples from a given dataset.

• Step 2 − Next, this algorithm will construct a decision tree for every sample. Then it
will get the prediction result from every decision tree.

• Step 3 − In this step, voting will be performed for every predicted result.

• Step 4 − At last, select the most voted prediction result as the final prediction result.

The following diagram will illustrate its working −

Advantages

1. Powerful and accurate


2. Good performance on many problems including non linear.
Disadvantages

1. No interpretability
2. Overfitting can easily occur
3. Need to choose the number of trees
Random forest is therefore used when high predictive performance matters more than
interpretability.

Application:

1. Medical Diagnosis:

o Example: predicting the presence of a particular disease (e.g., diabetes, cancer)


based on patient characteristics such as age, gender, medical history, and
laboratory test results. Healthcare professionals can use random forests trained
on a dataset of patient records to classify individuals as either having the disease
or not, aiding in making informed diagnostic decisions.

2. Financial Fraud Detection:

o Example: identifying fraudulent transactions in banking and credit card systems.


Random forests can analyse transaction data, including transaction amount,
location, time, and user behaviour patterns, to detect suspicious activities and
flag potential instances of fraud for further investigation.

3. Customer Churn Prediction:

o Example: predicting whether customers are likely to churn (i.e., stop using a
service or cancel a subscription) based on their interactions with a product or
service. Random forests can analyse customer data, such as usage patterns,
feedback, and demographic information, to identify at-risk customers and
develop targeted retention strategies.

4. Image Classification:

o Example: classifying images into different categories (e.g., animals, objects,


scenes) for applications such as object recognition, content-based image
retrieval, and medical imaging. One can train random forests on labeled image
datasets to automatically classify new images based on their visual features and
characteristics.

5. Ecological Modelling:

o Example: predicting species distribution and habitat suitability for conservation


and ecological research. Random forests can analyse environmental variables
such as temperature, precipitation, soil type, and vegetation cover to model the
distribution of plant and animal species across different landscapes, helping
guide conservation efforts and land management decisions.

6. Retail sales forecast:

o Example: Forecasting future sales and demand for retail products based on
historical sales data, promotional activities, seasonal trends, and economic
factors. Random forests can analyse large volumes of transactional data to
identify patterns and relationships that influence sales performance, enabling
retailers to optimise inventory management and pricing strategies.

Boosting: Boosting is a technique that combines weak learners and converts them into strong
ones with the help of machine learning algorithms. It uses ensemble learning to boost the
accuracy of a model. Of the two types of ensemble learning introduced above (sequential and
parallel), boosting is a sequential technique: the outputs of the individual weak learners are
combined sequentially during the training phase, and the performance of the model is boosted by
assigning higher weights to the samples that are incorrectly classified. The AdaBoost algorithm,
discussed below, is an example of such sequential learning.

Boosting Algorithms

Boosting builds a strong model by considering the predictions of the majority of weak learners.
It helps increase the predictive power of the machine learning model and is done by training a
series of weak models.

Below are the steps that show the mechanism of the boosting algorithm:

1. Reading data

2. Assigning weights to observations

3. Identification of misclassified observations (false predictions)

4. Assigning the false prediction, along with a higher weightage, to the next learner

5. Finally, iterating Step 2 until we get the correctly classified output
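A minimal AdaBoost sketch with scikit-learn follows; the breast cancer dataset is used only for illustration, and the default weak learner is a one-level decision tree (a stump).

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 sequential weak learners; each round re-weights the training samples
# that the previous learners misclassified
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)
ada.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, ada.predict(X_test)))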

Advantages of Boosting:

1. Improved Accuracy: Boosting algorithms typically produce highly accurate
predictions by combining multiple weak learners, each focusing on different aspects of
the data.
2. Reduced Bias and Variance: Boosting reduces both bias and variance, leading to
models that generalise well to unseen data and are less prone to overfitting.
3. Feature Importance: Boosting algorithms provide insights into feature importance,
allowing users to identify the most influential predictors in their models.
4. Versatility: Boosting algorithms extend to various machine learning tasks, such as
classification, regression, and ranking, making them essential tools in the data scientist's
toolbox.
Disadvantages of Boosting:
1. Sensitivity to Noise: Boosting algorithms are sensitive to noisy data and outliers, which
can negatively affect model performance if not properly handled.
2. Computationally Intensive: Training boosting models can be computationally
intensive, especially when dealing with large datasets or complex models with many
iterations.
3. Risk of Overfitting: Boosting algorithms are less prone to overfitting than individual weak
learners, but they can still overfit if the model complexity is not properly controlled.
4. Interpretability: Boosting models can be challenging to interpret due to their ensemble
nature and the complexity of the underlying algorithms.
Examples and applications:
1. Face Recognition:
o Boosting algorithms can help detect and recognize faces in images or videos by
combining multiple weak classifiers for face recognition tasks.
o AdaBoost can build a face recognition system that combines simple facial
features (e.g., eyes, nose, mouth) to identify individuals in photos or
surveillance footage.
2. Customer Churn Prediction:
o Boosting algorithms commonly identify customers likely to leave a service or
product subscription in customer churn prediction.
o GBM can analyze customer data (e.g., usage patterns, demographics,
satisfaction scores) and predict churn, enabling companies to proactively
address customer retention.
3. Credit Risk Assessment:
o Boosting algorithms in credit risk assessment evaluate the likelihood of default
or delinquency for loan applicants.
o XGBoost can analyze financial data (e.g., credit history, income, debt-to-
income ratio) and classify loan applicants into low, medium, or high-risk
categories, aiding lenders in making informed decisions about loan approvals
and interest rates.
4. Click-through rate (CTR) prediction:
o Boosting algorithms in online advertising predict click-through rates (CTR) and
optimize ad placement strategies.
o LightGBM enables advertisers to target their campaigns more effectively by
analyzing user behavior data (e.g., browsing history, search queries, device
type) and predicting the likelihood of users clicking on specific ads.
Difference Between Bagging, Boosting and Stacking
Aspect                                | Bagging           | Boosting                                      | Stacking
Partitioning of the data into subsets | Random            | Misclassified samples given higher preference | Various
Goal to achieve                       | Minimize variance | Increase predictive force                     | Both
Methods where this is used            | Random subspace   | Gradient descent                              | Blending
Function to combine single models     | Weighted average  | Weighted majority                             | Logistic regression

VOTING CLASSIFIERS
A voting classifier is a type of ensemble learning method in which multiple base classifiers are
trained on the same dataset, and their individual predictions are combined to make a final
prediction. The idea behind a voting classifier is to aggregate the predictions of multiple
classifiers and use a majority vote (for classification tasks) or averaging (for regression tasks)
to determine the final prediction.
There are two main types of voting classifiers:
1. Hard Voting: In hard voting, each base classifier predicts the class label for a given
input, and the class that receives the most votes is chosen as the final prediction. This
approach works well when the base classifiers are diverse and the class labels are well-
defined.
2. Soft Voting: In soft voting, instead of simply counting the votes for each class label,
the classifiers' predicted probabilities for each class are averaged, and the class with the
highest average probability is chosen as the final prediction. Soft voting tends to be
more effective when the base classifiers can output probability estimates, as it takes
into account the confidence of each classifier's predictions.
Voting classifiers can be constructed using different types of base classifiers, such as decision
trees, support vector machines, logistic regression, or any other classification algorithm. The
key idea is to leverage the diversity of individual classifiers to improve overall prediction
accuracy and generalisation performance.

Voting classifiers are commonly used in practice because they are simple to implement and
often yield robust performance, especially when the base classifiers are diverse and
complementary to each other. They are particularly useful in situations where no single
classifier performs consistently well across all parts of the input space.

Example:

# Import necessary libraries
from collections import Counter

# Define three classifiers' predictions
classifier1_prediction = 'AIDS'
classifier2_prediction = 'CSE'
classifier3_prediction = 'AIDS'

# Combine the predictions using voting
votes = [classifier1_prediction, classifier2_prediction, classifier3_prediction]
majority_vote = Counter(votes).most_common(1)[0][0]

# Final prediction
print("Majority Vote Prediction:", majority_vote)

Output:
Majority Vote Prediction: AIDS
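In practice, scikit-learn's VotingClassifier wraps this idea. The sketch below uses the Iris data purely for illustration; voting='hard' counts class votes, while voting='soft' averages the predicted probabilities of the base classifiers.

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three diverse base classifiers combined by soft voting
voting_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('dt', DecisionTreeClassifier(random_state=42)),
                ('svc', SVC(probability=True, random_state=42))],
    voting='soft')

voting_clf.fit(X_train, y_train)
print("Voting accuracy:", accuracy_score(y_test, voting_clf.predict(X_test)))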

Bagging and Pasting


Bagging (Bootstrap Aggregating) and Pasting are ensemble learning techniques that involve
training multiple instances of a base learning algorithm on different subsets of the training data,
then combining their predictions.
• Bagging (Bootstrap Aggregating): In bagging, multiple subsets of the training data
are randomly sampled with replacements (bootstrap samples). Each subset trains a base
learner independently. After training, combine the predictions from all base learners
through averaging (for regression) or voting (for classification). Bagging helps to
reduce overfitting by training each base learner on a slightly different subset of the data,
which introduces diversity among the models. The most common example of bagging
is the Random Forest algorithm, which constructs multiple decision trees trained on
different subsets of the data and averages their predictions.

• Pasting: Pasting is similar to bagging, but instead of sampling with replacement, it
samples without replacement. This means that each instance of the training data can
only be sampled once for each subset. Pasting is useful when you have a large dataset
and want to avoid repeatedly sampling the same instances. While bagging is more
commonly used, pasting can sometimes lead to a slightly lower variance in the
final model because each instance is only used once in each base learner's training
process.
Both bagging and pasting are effective techniques for improving the generalisation
performance of machine learning models, especially when the base learners are unstable
(sensitive to small changes in the training data) or when the dataset is noisy. They are
commonly used in combination with decision trees or other simple classifiers to create more
robust and accurate ensemble models.
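In scikit-learn, the only difference between the two is the bootstrap flag of BaggingClassifier. A minimal sketch on synthetic data (the dataset and parameters are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: each subset is sampled WITH replacement
bagging = BaggingClassifier(n_estimators=50, bootstrap=True,
                            max_samples=0.8, random_state=0).fit(X_train, y_train)

# Pasting: each subset is sampled WITHOUT replacement
pasting = BaggingClassifier(n_estimators=50, bootstrap=False,
                            max_samples=0.8, random_state=0).fit(X_train, y_train)

print("Bagging accuracy:", bagging.score(X_test, y_test))
print("Pasting accuracy:", pasting.score(X_test, y_test))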
STACKING
Stacking, also known as stacked generalisation, is an ensemble learning technique that
combines multiple base models with a meta-model to make predictions. Unlike traditional
ensemble methods like bagging and boosting, where models are combined through averaging
or voting, stacking involves training a meta-model to learn how to best combine the predictions
of the base models.

Base Models: Trainers train multiple diverse base models on the same dataset using different
algorithms or subsets of the data. Each base model makes predictions on the same set of
instances.
Meta-Model: The meta-model or blender trains using the predictions generated by the base
models as features. The meta-model learns to combine the predictions of the base models in a
way that optimises predictive performance on a validation set. Typically, the meta-model is a
simple model like linear regression, logistic regression, or a shallow neural network.
Prediction: When making predictions on new data, the base models first make predictions on
the new instances. The meta-model uses these predictions as features to make the final
prediction.

Stacking enables the base models to interact more complexly and capture patterns in the data
that individual models may miss. By training a meta-model to learn how to best combine the
predictions of the base models, stacking can often achieve higher predictive accuracy compared
to any single model alone.
One important consideration in stacking is how to prevent overfitting. To address this, you can
use cross-validation to generate predictions for the meta-model or utilize hold-out sets for
training and validation.
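A minimal stacking sketch with scikit-learn's StackingClassifier, using the Iris data for illustration; cv=5 means the meta-model is trained on out-of-fold predictions, which addresses the overfitting concern noted above.

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base models: a random forest and an SVM; meta-model: logistic regression
stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(random_state=42)),
                ('svc', SVC(random_state=42))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)

stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))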
RANDOM FORESTS
Random Forest is a versatile ensemble learning algorithm widely used for both classification
and regression tasks. It constructs multiple decision trees during training by randomly sampling
data with replacement and selecting a subset of features at each node split, introducing diversity
and reducing overfitting. Through a voting mechanism in classification or averaging in
regression, the final prediction is made, leveraging the collective wisdom of the individual
trees. Renowned for their robustness, Random Forests excel in various data scenarios,
providing estimates of feature importance and performing well in high-dimensional spaces.
While computationally intensive for large datasets, its simplicity, interpretability, and
effectiveness across diverse applications make it a preferred choice for building reliable
predictive models.

How does Random Forest algorithm work?

Random Forest works in two-phase first is to create the random forest by combining N decision
tree, and second is to make predictions for each tree created in the first phase.

The Working process can be explained in the below steps and diagram:

Step-1: Select random K data points from the training set.

Step-2: Build the decision trees associated with the selected data points (Subsets).

Step-3: Choose the number N for decision trees that you want to build.

Step-4: Repeat Step 1 & 2.

Step-5: For new data points, find the predictions of each decision tree, and assign the new data
points to the category that wins the majority votes.

The working of the algorithm can be better understood by the below example:

Example: Suppose there is a dataset that contains multiple fruit images. So, this dataset is
given to the Random forest classifier. The dataset is divided into subsets and given to each
decision tree. During the training phase, each decision tree produces a prediction result, and
when a new data point occurs, then based on the majority of results, the Random Forest
classifier predicts the final decision. Consider the below image:
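A minimal scikit-learn sketch of these steps follows; the Iris data stands in for the fruit-image example above.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 100 decision trees, each grown on a bootstrap sample with a random feature subset;
# the final class is decided by majority vote across the trees
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

print("Test accuracy      :", accuracy_score(y_test, rf.predict(X_test)))
print("Feature importances:", rf.feature_importances_)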

Applications of Random Forest

There are mainly four sectors where Random forest is mostly used:

1. Banking: Banking sector mostly uses this algorithm for the identification of loan risk.

2. Medicine: With the help of this algorithm, disease trends and risks of the disease can
be identified.

3. Land Use: We can identify the areas of similar land use by this algorithm.

4. Marketing: Marketing trends can be identified using this algorithm.

Advantages of Random Forest

o Random Forest is capable of performing both Classification and Regression tasks.

o It is capable of handling large datasets with high dimensionality.


o It enhances the accuracy of the model and prevents the overfitting issue.

Disadvantages of Random Forest

o Although random forest can be used for both classification and regression tasks, it is
less suitable for regression tasks.

LINEAR SVM CLASSIFICATION:


• For classification and regression tasks, the Support Vector Machine (SVM) is a
supervised learning algorithm.
• In linear SVM classification, the algorithm aims to find the hyperplane that best
separates the classes in the feature space.
• We determine the hyperplane to maximize the margin between the nearest data points
(called support vectors) from different classes.
• When the classes are linearly separable—that is, when a straight line or hyperplane can
separate them—linear SVM is effective.
• The optimisation problem in linear SVM involves finding the weights and biases that
define the hyperplane, usually solved using optimisation techniques such as gradient
descent.

How Linear SVM Classification Works: Linear Support Vector Machine (SVM)
classification is a supervised learning algorithm used for classifying data points into different
classes. Here's how it works:

• Separating Hyperplane: Linear SVM aims to find the hyperplane that best separates
the classes in the feature space. The hyperplane is a decision boundary that divides the
feature space into regions associated with different classes.
• Maximizing Margin: The objective is to determine the hyperplane that maximizes the
margin between the nearest data points (support vectors) from different classes. This
margin represents the distance between the hyperplane and the closest data points.
• Optimization Problem: Linear SVM solves an optimization problem to find the
weights and biases that define the hyperplane. This problem is typically solved using
optimization techniques such as gradient descent.
Features of Linear SVM Classification:
• Effective for Linearly Separable Data: Linear SVM works well when the classes are
linearly separable, meaning a straight line or hyperplane can be drawn to separate them.
• Margin Maximization: It maximizes the margin between classes, which often leads to
better generalization and improved performance on unseen data.
• Robustness to Overfitting: SVMs are less prone to overfitting, especially in high-
dimensional spaces, compared to other algorithms like decision trees.
• Global Solution: Linear SVM typically finds the global optimum solution, meaning it
converges to the best possible hyperplane.

Advantages:
• Works well in High-Dimensional Spaces: Linear SVM performs well even in cases
where the number of dimensions is greater than the number of samples.
• Effective with Limited Data: It can handle datasets with a small number of samples
effectively.
• Robust to Noise: SVMs are relatively robust to noise in the data, thanks to the margin
maximization objective.
Disadvantages:
• Computationally Intensive for Large Datasets: Training time can be significant,
especially for large datasets, due to the computational complexity of solving the
optimization problem.

• Less Effective with Non-Linear Data: Linear SVM may not perform well when the
data is not linearly separable. In such cases, nonlinear SVM variants or kernel tricks
may be more appropriate.
Applications:
• Text Classification: Linear SVM is commonly used in text classification tasks, such
as spam detection, sentiment analysis, and document categorization.
• Image Classification: It's also applied in image classification problems, like object
recognition and image segmentation.
• Bioinformatics: Linear SVM finds applications in bioinformatics for tasks such as
protein classification and gene expression analysis.
Example: Consider a binary classification problem where we have two classes, represented as
red and blue points in a two-dimensional feature space. Linear SVM aims to find the optimal
hyperplane (a line in this case) that separates these two classes with the maximum margin. This
hyperplane ensures that the distance between the closest points (support vectors) from each
class is maximized.
For instance, in a spam email classification scenario, linear SVM can be trained on a dataset
containing features extracted from emails (e.g., word frequencies) along with corresponding
labels (spam or not spam). It learns a hyperplane to distinguish between spam and legitimate
emails, enabling accurate classification of new, unseen emails.
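A minimal sketch of a linear SVM on synthetic, linearly separable data follows; the two blobs stand in for the two classes, and C controls how soft the margin is.

from sklearn.datasets import make_blobs
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Two well-separated blobs standing in for two classes
X, y = make_blobs(n_samples=200, centers=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

svm = SVC(kernel='linear', C=1.0)   # linear kernel: a straight-line decision boundary
svm.fit(X_train, y_train)

print("Test accuracy  :", accuracy_score(y_test, svm.predict(X_test)))
print("Support vectors:", svm.support_vectors_.shape[0])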

NONLINEAR SVM CLASSIFICATION:


• In many real-world scenarios, classes may not be linearly separable. Nonlinear SVM
classification addresses this by using kernel functions.
• Kernel functions allow SVM to implicitly map the input data into a higher-dimensional
feature space where the classes might be linearly separable.
• Common kernel functions include polynomial kernels, Gaussian radial basis function
(RBF) kernels, and sigmoid kernels.
• SVM can classify non-linearly separable data by transforming the data into a higher-
dimensional space, which captures nonlinear relationships between features.

How Nonlinear SVM Classification Works: Nonlinear Support Vector Machine (SVM)
classification addresses scenarios where classes are not linearly separable. Here's how it works:
• Kernel Functions: Nonlinear SVM utilizes kernel functions to map the input data into
a higher-dimensional feature space. These kernel functions allow SVM to implicitly
transform the data, potentially making the classes linearly separable in the transformed
space.
• Mapping to Higher Dimension: By applying kernel functions, SVM can effectively
map the input data into a higher-dimensional feature space where the classes might
become linearly separable.
• Classification in Higher Dimension: In this higher-dimensional space, SVM aims to
find the optimal hyperplane that separates the classes with the maximum margin,
similar to linear SVM in the original feature space.

Features of Nonlinear SVM Classification:

• Ability to Handle Nonlinear Data: Nonlinear SVM can handle data that is not linearly
separable by transforming it into a higher-dimensional space where linear separation
may be possible.
• Flexibility with Kernel Functions: Various kernel functions can be employed based
on the nature of the data and the problem, providing flexibility in capturing nonlinear
relationships.
• Efficient in High-Dimensional Spaces: Despite the transformation to a higher-
dimensional space, SVM remains efficient in terms of computation, especially
compared to explicit feature expansion methods.
Advantages:
• Versatility: Nonlinear SVM is versatile and can be applied to a wide range of problems
where linear separation is not feasible.
• Capturing Complex Relationships: By mapping data into a higher-dimensional space,
nonlinear SVM can capture complex relationships between features, leading to
improved classification accuracy.
• Kernel Flexibility: The choice of kernel functions allows customization according to
the specific characteristics of the data, potentially enhancing performance.
Disadvantages:
• Selection of Kernel Parameters: Choosing the appropriate kernel and its parameters
can be challenging and may require careful tuning, leading to potential overfitting if
not done properly.
• Computational Complexity: Nonlinear SVM can be computationally intensive,
especially with large datasets or complex kernel functions, which may increase training
time and resource requirements.
Applications:
• Image Recognition: Nonlinear SVM is widely used in image recognition tasks such as
object detection and facial recognition, where the relationships between image features
may be nonlinear.
• Bioinformatics: It finds applications in bioinformatics for tasks such as protein
structure prediction and gene expression analysis, where the underlying relationships
between biological features can be complex.

• Finance: In finance, nonlinear SVM can be used for tasks like stock market prediction
and credit risk assessment, where data often exhibit nonlinear patterns.
Example: Consider a dataset with two classes, represented as concentric circles in a two-
dimensional feature space. Linear SVM would struggle to separate these classes effectively.
However, by using a Gaussian radial basis function (RBF) kernel, SVM can map the data into
a higher-dimensional space where the classes become linearly separable. This allows SVM to
find a hyperplane that effectively separates the classes, enabling accurate classification even
for nonlinear data distributions.
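The concentric-circles example can be sketched with scikit-learn's make_circles: the linear kernel struggles while the RBF kernel separates the classes (exact accuracies vary with the random seed).

from sklearn.datasets import make_circles
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

linear_svm = SVC(kernel='linear').fit(X_train, y_train)
rbf_svm = SVC(kernel='rbf', gamma='scale', C=1.0).fit(X_train, y_train)

print("Linear kernel accuracy:", accuracy_score(y_test, linear_svm.predict(X_test)))
print("RBF kernel accuracy   :", accuracy_score(y_test, rbf_svm.predict(X_test)))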
DIFFERENCE BETWEEN LINEAR SVM AND NON-LINEAR SVM

Aspect                | Linear SVM                                           | Nonlinear SVM
Approach              | Finds a linear decision boundary in the input space. | Maps data into a higher-dimensional feature space using kernel functions to potentially achieve linear separability.
Decision Boundary     | Straight line (or hyperplane) separating classes.    | May be nonlinear in the original feature space but linear in the higher-dimensional space.
Kernel Usage          | Does not use kernel functions.                       | Relies on kernel functions to implicitly map data.
Handling Nonlinearity | Assumes linear separability.                         | Addresses nonlinearity by transforming data using kernels.
Applicability         | Suitable for linearly separable data.                | Versatile, suitable for both linearly and nonlinearly separable data.

SVM REGRESSION:
• Support Vector Regression (SVR), a type of regression task, also uses SVM.
• SVR aims to find a function that predicts the continuous target variable while
maximising the margin of tolerance (ε) around the predicted value.
• Similar to classification, SVR also uses kernel functions to handle nonlinear
relationships between features.

• The objective of SVR is to find a function that stays within the margin of tolerance for
as many training instances as possible while maximising the margin.

How Support Vector Machine (SVM) Regression Works: Support Vector Machine (SVM)
Regression, also known as Support Vector Regression (SVR), is a type of regression task that
utilises the principles of SVM. Here's how it works:
• Objective: SVR aims to find a function that predicts the continuous target variable
while maximising the margin of tolerance (ε) around the predicted value. The margin
of tolerance allows some deviation from the actual target value.
• Kernel Functions: Similar to classification tasks, SVR also employs kernel functions
to handle nonlinear relationships between features. These kernel functions help map the
input data into a higher-dimensional space where a linear relationship may be
established.
• Margin Optimisation: The objective of SVR is to find a function that not only predicts
the target variable accurately but also stays within the margin of tolerance for as many
training instances as possible. This is achieved by maximizing the margin between the
predicted values and the margin boundaries.

Features of SVM Regression:


• Flexibility: SVR can capture complex relationships between features, making it
suitable for datasets with nonlinear dependencies.
• Robustness: SVR is less prone to overfitting, especially when appropriate
regularization techniques are applied.

• Ability to Handle Outliers: SVR can effectively handle outliers by minimising their
impact on the model through the use of a margin of tolerance.
Advantages:
• Effective in High-Dimensional Spaces: SVR performs well even in high-dimensional
feature spaces, making it suitable for datasets with many features.
• Handles Nonlinear Relationships: By utilising kernel functions, SVR can model
nonlinear relationships between features, providing greater flexibility in capturing
complex patterns in the data.
• Resistant to Overfitting: SVR is less susceptible to overfitting, thanks to the margin
of tolerance and regularisation techniques, such as parameter C in the optimisation
objective.
Disadvantages:
• Sensitivity to Kernel Parameters: The performance of SVR can be sensitive to the
choice of kernel function and its parameters. Careful tuning is often required to achieve
optimal results.
• Computational Complexity: SVR can be computationally intensive, especially when
dealing with large datasets or complex kernel functions.
Applications:
• Stock Price Prediction: SVR can be used to predict stock prices based on historical
data, considering various factors such as market trends, trading volume, and economic
indicators.
• Energy Load Forecasting: SVR can forecast energy consumption or load demand,
helping utility companies optimise resource allocation and manage energy production
efficiently.
• Medical Diagnosis: SVR can assist in medical diagnosis tasks by predicting clinical
outcomes or disease progression based on patient data, such as demographics,
symptoms, and medical history.
Example: Consider a dataset containing information about houses, including features like
square footage, number of bedrooms, and location. SVR can be employed to predict the selling
price of houses based on these features. By training an SVR model on historical data, it learns
to predict the selling price while considering a margin of tolerance around the actual selling

price. This allows the model to make accurate predictions while accounting for variations and
uncertainties in the data.
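A minimal SVR sketch follows; the house features and prices are made-up illustrative values, and epsilon is the margin of tolerance described above. Features are scaled first because SVR is sensitive to feature magnitudes.

import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical housing data: [square footage, number of bedrooms] -> price
X = np.array([[1000, 2], [1500, 3], [1800, 3], [2400, 4], [3000, 4],
              [1200, 2], [2000, 3], [2800, 4]])
y = np.array([50, 75, 85, 120, 150, 58, 95, 140])

# epsilon defines the margin of tolerance around the predicted value
svr = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=100, epsilon=5))
svr.fit(X, y)

print(svr.predict([[1600, 3], [2600, 4]]))   # predicted prices for two new houses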
NAÏVE BAYES CLASSIFIERS:
• Naïve Bayes classifiers are probabilistic models based on Bayes' theorem with a strong
assumption of feature independence.
• Despite the "naïve" assumption of feature independence, Naïve Bayes classifiers often
perform well in practice, especially for text classification tasks.
• Naïve Bayes classifiers are simple and computationally efficient, making them
particularly suitable for large datasets.
• Common variants of Naïve Bayes classifiers include Gaussian Naïve Bayes (for
continuous features), Multinomial Naïve Bayes (for discrete features with counts), and
Bernoulli Naïve Bayes (for binary features).
• Naïve Bayes classifiers calculate the posterior probability of each class given the input
features and then select the class with the highest probability as the predicted class.

Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)

Where,
• P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
• P(B|A) is Likelihood probability: Probability of the evidence given that the
probability of a hypothesis is true.
• P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
• P(B) is Marginal Probability: Probability of Evidence.
Working of Naïve Bayes' Classifier
✓ Working of Naïve Bayes' Classifier can be understood with the help of the below
example:
✓ Suppose we have a dataset of weather conditions and a corresponding target variable
"Play". Using this dataset, we need to decide whether we should play or not on
a particular day according to the weather conditions. To solve this problem, we need
to follow the steps below:
✓ Convert the given dataset into frequency tables.
✓ Generate Likelihood table by finding the probabilities of given features.
✓ Now, use Bayes theorem to calculate the posterior probability.

✓ Problem: If the weather is sunny, then the Player should play or not?
✓ Solution: To solve this, first consider the below dataset:

Frequency table for the Weather Conditions

Applying Bayes' theorem:
P(Yes|Sunny)= P(Sunny|Yes)*P(Yes) / P(Sunny)
P(Sunny|Yes)= 3/10= 0.3
P(Sunny)= 0.35
P(Yes)=0.71
So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60
P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)
P(Sunny|NO)= 2/4=0.5
P(No)= 0.29
P(Sunny)= 0.35
So P(No|Sunny)= 0.5*0.29/0.35 = 0.41
Since P(Yes|Sunny) > P(No|Sunny), the prediction is that the player should play on a sunny day.
Applications:
• It is used for Credit Scoring.
• It is used in medical data classification.
• It is used in Text classification such as Spam filtering and Sentiment analysis.
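A minimal sketch of the Gaussian variant mentioned above, using scikit-learn's GaussianNB on the Iris data (chosen here only for illustration):

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gaussian Naive Bayes assumes each continuous feature is normally distributed
# within each class and that the features are conditionally independent
gnb = GaussianNB()
gnb.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, gnb.predict(X_test)))
print("Class posteriors for first test sample:", gnb.predict_proba(X_test[:1]))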

Image Segmentation with Clustering

Image segmentation is the process of dividing an image into distinct regions based on
the properties of the pixels. Clustering algorithms excel at grouping similar data points
together. In image segmentation, this translates to grouping pixels with similar
characteristics, like color intensity, texture, or spatial location.

Types of Clustering for Segmentation

There are two main categories of clustering used for image segmentation:

1. K-Means Clustering: This is a popular and straightforward technique. Here's


the process:
o Define the number of clusters (K), which represents the number of
segments you expect in the image (e.g., foreground, background,
object).
o Select a feature for each pixel, typically its color value (RGB) or intensity
level in grayscale images.
o The algorithm randomly positions K centroids (cluster centers) within the
image feature space.
o Each pixel is assigned to the closest centroid, forming initial clusters.
o The centroids are then repositioned to the center of their assigned pixels,
effectively moving them towards denser regions of the feature space.
o The assignment and update steps are repeated until the centroids stabilize,
indicating convergence.
2. Hierarchical Clustering: This approach takes a bottom-up or top-down
strategy:
o Bottom-up (agglomerative): Initially, each pixel is considered a separate
cluster. Then, pixels with the most similarity are merged iteratively until
the desired number of clusters is reached.
o Top-down (divisive): Here, all pixels start in one cluster. This cluster is
then recursively divided based on dissimilarities until the desired number
of clusters is achieved.
Example: Segmenting a Simple Image
Imagine a basic image with a red flower in the center and a green background.

 K-Means Clustering: Here, you might set K=2 (one for the flower and one for
the background). The algorithm would group pixels with similar red hues into
one cluster (flower), and those with green tones into another (background).
 Hierarchical Clustering: In the bottom-up approach, individual pixels with
similar red colors would first merge, followed by merging with other close red
pixels, forming the flower segment. Similarly, green pixels would progressively
merge into the background segment.
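A minimal sketch of the K-Means case; the small "flower" image below is synthesized purely for illustration, and each pixel colour is replaced by its cluster centroid colour to produce the segmentation.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy image: green background with a red square "flower" in the centre
img = np.zeros((40, 40, 3))
img[:, :] = [0.1, 0.8, 0.1]          # green background
img[15:25, 15:25] = [0.9, 0.1, 0.1]  # red object

# Reshape pixels to (n_pixels, 3) and cluster by colour with K = 2
pixels = img.reshape(-1, 3)
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(pixels)

# Replace every pixel with its centroid colour -> segmented image
segmented = kmeans.cluster_centers_[kmeans.labels_].reshape(img.shape)
print(np.unique(kmeans.labels_, return_counts=True))  # pixel counts per segment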
Limitations of Clustering for Segmentation

While clustering offers a compelling approach, it has limitations:

 Choosing K: In K-Means, selecting the optimal number of clusters (K) can be


tricky. The appropriate K value depends on the image complexity.
 Complexities: Clustering might struggle with images containing objects with
irregular shapes, blurred boundaries, or varying lighting conditions.
 Pre-processing: Images often require pre-processing to reduce noise or
adjust color spaces for better clustering results.
Conclusion

Clustering provides a valuable method for image segmentation, especially for simpler
images or as a pre-processing step for more advanced techniques. Its ease of
implementation and efficiency for specific scenarios make it a practical tool in image
analysis.

Using Clustering for Preprocessing

Clustering can be a valuable tool in data preprocessing for various machine learning
tasks.

What is Clustering?

Clustering is an unsupervised learning technique that groups similar data points


together. Imagine you have a basket of fruits. Clustering would help you separate the
apples, oranges, and bananas based on their inherent similarities.
How Does it Help in Preprocessing?

There are two main ways clustering aids in data preprocessing:

1. Dimensionality Reduction: Datasets can have many features (dimensions).


Clustering can group similar data points, allowing you to replace each group
with a representative point (centroid). This effectively reduces the number of
data points you need to analyze, making subsequent tasks like classification or
regression more efficient.
2. Feature Engineering: Clustering can help identify hidden patterns or
subgroups within your data. These subgroups might not have been initially
apparent but can be crucial for your machine learning model.
Example: Customer Segmentation

Imagine you have data on customer purchases at a retail store, including:

 Customer ID
 Product purchased
 Amount spent
 Demographics (age, location)

You can use clustering to group customers based on their buying habits. For instance,
one cluster might represent customers who frequently buy electronics, another might
be for those who purchase groceries regularly. This information can be used for
targeted marketing campaigns or product recommendations.
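A minimal sketch of the dimensionality-reduction idea on the digits data: each image is replaced by its distances to 50 cluster centroids (KMeans.transform) inside a pipeline, and a classifier is trained in that reduced space. The cluster count and exact scores are illustrative.

from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Baseline: logistic regression on the raw 64 pixel features
baseline = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Clustering as preprocessing: distances to 50 centroids become the new features
pipeline = make_pipeline(
    KMeans(n_clusters=50, random_state=42, n_init=10),
    LogisticRegression(max_iter=5000))
pipeline.fit(X_train, y_train)

print("Raw features    :", baseline.score(X_test, y_test))
print("Cluster features:", pipeline.score(X_test, y_test))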

Benefits of using Clustering for Preprocessing


 Improved Model Performance: By reducing dimensions and identifying
relevant subgroups, clustering can lead to more accurate and efficient machine
learning models.
 Data Understanding: Clustering can reveal hidden patterns and relationships
within your data, providing valuable insights for further analysis.
Things to Consider
 Choosing the Right Clustering Algorithm: Different clustering algorithms
work better for various data types and purposes. K-means is a popular choice
for numerical data, while hierarchical clustering is suitable for exploring
hierarchical relationships.
 Data Quality: Clustering works best with clean and preprocessed data.
Outliers and inconsistencies can affect the clustering results.

By effectively using clustering for data preprocessing, you can prepare your data for
machine learning tasks, leading to better model performance and valuable insights.

Using Clustering for Semi-Supervised Learning


Clustering can be a powerful tool in the realm of semi-supervised learning. Here's how
it works along with some examples to illustrate the concept:

What is Semi-Supervised Learning?

Regular supervised learning relies on data with labelled examples. Semi-supervised


learning, however, leverages the power of both labelled and unlabelled data. This is
beneficial because labelling data can be expensive and time-consuming. By
incorporating unlabelled data along with the precious labelled data points, semi-
supervised learning algorithms can enhance the model's performance.

How Clustering Helps

Clustering algorithms group data points together based on their similarities. In semi-
supervised learning, clustering can be used in two main ways:

1. Cluster-then-Label Approach:
o Here, clustering is used to identify inherent structures within the
unlabelled data. The data is divided into clusters that are likely to
represent different classes.
o Example: Imagine classifying handwritten digits (0-9). We have a small
set of labelled digits and a large set of unlabelled ones. Clustering can
group the unlabelled digits based on their shape and features, creating
clusters that likely correspond to specific digits.
o Once the data is clustered, a supervised learning model can be used to
analyse the labelled data and assign class labels (0-9) to each cluster
based on the representative points within the cluster.
2. Self-Supervised Clustering for Representation Learning:
o This approach utilizes clustering techniques to learn meaningful
representations for the data, even with limited labelled data.
o Imagine training a model to classify different dog breeds. We can use a
self-supervised clustering method to group unlabelled dog images based
on visual similarities. This step helps the model learn features that
differentiate dog breeds without explicit labels for each breed.
o Then, with the learned representations, a supervised model can be
trained on the labelled data to classify specific dog breeds.
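A minimal cluster-then-label sketch on the digits data: here we pretend we can afford to hand-label only the one image closest to each of the 10 centroids, then propagate those labels to every image in the same cluster.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)

# Cluster all (mostly unlabelled) digit images into 10 groups
kmeans = KMeans(n_clusters=10, random_state=42, n_init=10)
distances = kmeans.fit_transform(X)      # distance of each sample to each centroid

# "Manually" label only the sample closest to each centroid
rep_idx = np.argmin(distances, axis=0)   # index of the closest sample per cluster
cluster_labels = y[rep_idx]

# Propagate each representative's label to every sample in its cluster
y_pred = cluster_labels[kmeans.labels_]
print("Accuracy with only 10 labelled images:", accuracy_score(y, y_pred))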
Benefits of using Clustering:

 Improved Efficiency: By leveraging unlabelled data, clustering in semi-


supervised learning helps us learn from a larger pool of information, potentially
improving model performance.
 Reduced Labelling Costs: Labelling data can be a bottleneck. Semi-
supervised learning with clustering reduces our dependence on extensive
labelled data.

Remember:

 The effectiveness of clustering in semi-supervised learning depends on the


quality of the clustering itself and the inherent structure within the data.
 Different clustering algorithms may be suitable depending on the specific
problem and data characteristics.
UNIT-V
NEURAL NETWORKS: Neural networks are a class of machine learning models inspired by
the structure and functioning of the human brain. They consist of interconnected nodes
(neurons) organized into layers, typically including an input layer, one or more hidden layers,
and an output layer. Each connection between neurons has an associated weight that adjusts
during training to optimize the network's performance. Backpropagation, a process where
errors propagate back through the network to update the weights, enables neural networks to
learn complex patterns and relationships from data. We use them for tasks like image and
speech recognition, natural language processing, and data-driven prediction.
The learning process of a neural network involves the following steps:
1. The neural network is stimulated by an environment.
2. The free parameters of the neural network are changed as a result of this
stimulation.
3. The neural network then responds in a new way to the environment
because of the changes in its free parameters.

Working of a Neural Network


Neural networks are complex systems that mimic some features of the functioning
of the human brain. A network is composed of an input layer, one or more hidden
layers, and an output layer, each made up of interconnected artificial neurons.
The two stages of the basic training process are forward propagation and
backpropagation.
Forward Propagation
• Input Layer: Each feature in the input layer is represented by a node
on the network, which receives input data.
• Weights and Connections: The weight of each neuronal connection
indicates how strong the connection is. Throughout training, these
weights are changed.
• Hidden Layers: Each hidden layer neuron processes inputs by
multiplying them by weights, adding them up, and then passing them
through an activation function. By doing this, non-linearity is introduced,
enabling the network to recognize intricate patterns.
• Output: The result is produced by repeating the process until the
output layer is reached.
Backpropagation
• Loss Calculation: The network’s output is evaluated against the real
goal values, and a loss function is used to compute the difference. For a
regression problem, the Mean Squared Error (MSE) is commonly used
as the cost function.

Loss Function (MSE): MSE = (1/n) Σ (y_i − ŷ_i)², where y_i is the actual value,
ŷ_i is the predicted value, and n is the number of training samples.
• Gradient Descent: Gradient descent is then used by the network to
reduce the loss. To lower the inaccuracy, weights are changed based on
the derivative of the loss with respect to each weight.

• Adjusting weights: The weights are adjusted at each connection by
applying this iterative process, or backpropagation, backward across the
network.
• Training: During training with different data samples, the entire
process of forward propagation, loss calculation, and backpropagation is
done iteratively, enabling the network to adapt and learn patterns from
the data.
• Activation Functions: Model non-linearity is introduced by activation
functions such as the rectified linear unit (ReLU) or sigmoid. Whether a neuron
"fires" is determined by applying the activation function to its total weighted input.
Applications:
Neural networks find extensive applications across various domains.
1. Image and Speech Recognition: They power state-of-the-art systems for image
classification, object detection, speech recognition, and natural language
understanding.
2. Natural Language Processing: Machine translation, sentiment analysis, text
generation, and chatbots all use neural networks.
3. Predictive Analytics: Neural networks are used for financial forecasting, risk assessment,
recommendation systems, and predictive maintenance.
4. Healthcare: Neural networks contribute to disease diagnosis, medical image analysis,
drug discovery, and personalized medicine.
Example:
An illustrative example of neural network application is autonomous driving. Here, we use
neural networks for real-time object detection from camera feeds, lane detection, decision-
making based on sensor inputs, and predictive modeling for trajectory planning. By training on
large datasets of driving scenarios, neural networks can learn to navigate complex
environments, making them pivotal in the development of self-driving technology.
DEEP LEARNING: Deep learning is a subfield of machine learning that focuses on the
development and application of neural networks with multiple layers (hence "deep"), enabling
the learning of intricate patterns and representations from large amounts of data. Unlike
traditional machine learning algorithms, deep learning architectures automatically learn
hierarchical representations of data through successive layers of abstraction. This approach
allows deep learning models to effectively handle complex tasks such as image recognition,
speech recognition, natural language processing, and more, often achieving state-of-the-art
performance. Deep learning has revolutionized various industries by driving advancements in
artificial intelligence, enabling systems to autonomously learn and adapt from vast and diverse
datasets without explicitly programming task-specific rules.
Working process:
Deep learning's working process involves several key steps that enable neural networks with
multiple layers to learn complex patterns and representations from data:
• Data Collection and Preparation: To train deep learning models, large amounts of
labeled or unlabeled data are required. Data collection involves gathering relevant
datasets and preprocessing them to ensure they are suitable for training.
• Model Architecture Design: The next step involves designing the deep learning model's
architecture. This includes determining the number of layers, the type of layers (e.g.,
convolutional, or recurrent), the activation functions, and other hyperparameters.
• Forward Propagation: The neural network passes input data through its layers during
the training process, a process known as forward propagation. Each layer performs
computations (e.g., matrix multiplications, applying activation functions) to transform
the input data into meaningful representations.
• Loss Calculation: After propagating the data through the network and making a
prediction, a loss function (e.g., mean squared error for regression, cross-entropy for
classification) compares the output to the ground truth (actual) values. The loss function
quantifies how far off the predictions are from the true values.
• Backpropagation: Deep learning models use backpropagation as the key training algorithm. It entails
calculating the loss function's gradient with respect to each parameter (weight and bias)
in the network. This gradient guides the adjustment of parameters to minimize the loss.
• Gradient Descent Optimization: An optimization algorithm, such as stochastic gradient
descent or Adam, uses the gradients computed during backpropagation to update the
neural network's parameters in a direction that minimizes the loss function. This
iterative process helps the model learn the optimal parameters for making accurate
predictions.
• Iterative Training: The training process is typically performed iteratively over multiple
epochs (passes through the entire dataset). Each epoch consists of forward propagation, loss
calculation, backpropagation, and parameter updates. The goal is to minimize the loss
on the training data while avoiding overfitting on the validation data.
• Model Evaluation and Testing: After training, we assess the model's performance on
unseen data using a separate test dataset. We use performance metrics like accuracy,
precision, recall, or F1-score to assess the model's ability to generalize to new data.
• Deployment and Inference: Following successful training and evaluation, we can
deploy the trained deep learning model for inference on fresh data. In deployment, the
model takes input data, performs forward propagation, and generates predictions or
classifications based on the learned patterns and representations.
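The steps above can be illustrated with a short TensorFlow sketch of a single training iteration (forward propagation, loss calculation, backpropagation with tf.GradientTape, and a gradient-descent update). The data, layer sizes, and learning rate below are invented purely for illustration:
import tensorflow as tf

# Invented data: 32 samples, 4 features, binary labels
X = tf.random.normal((32, 4))
y = tf.cast(tf.random.uniform((32, 1)) > 0.5, tf.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

with tf.GradientTape() as tape:
    y_pred = model(X, training=True)   # forward propagation
    loss = loss_fn(y, y_pred)          # loss calculation

# Backpropagation: gradients of the loss w.r.t. every weight and bias
grads = tape.gradient(loss, model.trainable_variables)
# Gradient-descent update of the parameters
optimizer.apply_gradients(zip(grads, model.trainable_variables))
print('Loss after one step:', float(loss))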
Applications:
• Image Classification and Object Detection: Deep learning, especially convolutional
neural networks (CNNs), is used to identify and classify objects in images, vital for
tasks like autonomous driving and medical imaging.
• Natural Language Processing (NLP): Deep learning powers language translation,
sentiment analysis, and virtual assistants, enhancing communication and understanding
across languages.
• Speech Recognition and Synthesis: Deep learning enables accurate speech
recognition (e.g., in virtual assistants) and natural-sounding text-to-speech synthesis for
human-like voices.
• Healthcare Imaging and Diagnosis: Deep learning automates medical image analysis,
aiding in diagnosing diseases from X-rays, MRIs, and other scans with high accuracy.
• Autonomous Driving and Robotics: Deep learning algorithms process sensor data to
enable autonomous vehicles to perceive and navigate environments and empower
robots with object recognition and manipulation capabilities.
Difference between Machine Learning, Deep Learning, and Neural Networks
• Definition:
o Machine Learning: Subset of AI; algorithms learn patterns from data without explicit programming.
o Deep Learning: Subset of machine learning; uses deep neural networks with multiple layers.
o Neural Networks: Building blocks of deep learning; comprised of interconnected nodes (neurons).
• Complexity of Models:
o Machine Learning: Can include various algorithms like decision trees, SVM, k-NN, etc.
o Deep Learning: Focuses on deep neural networks with multiple hidden layers.
o Neural Networks: Building blocks used in deep learning architectures.
• Representation Learning:
o Machine Learning: Learns features and patterns from data through statistical methods.
o Deep Learning: Automatically learns hierarchical representations of data.
o Neural Networks: Process information through weighted connections and activation functions.
• Training Process:
o Machine Learning: Supervised, unsupervised, or semi-supervised learning using labeled or unlabeled data.
o Deep Learning: Typically uses large labeled datasets for supervised learning; utilizes backpropagation for training.
o Neural Networks: Trained via backpropagation; weights are adjusted to minimize prediction errors.
• Feature Engineering:
o Machine Learning: Requires manual feature extraction and selection in some cases.
o Deep Learning: Automatically learns features from raw data, reducing the need for manual feature engineering.
o Neural Networks: Process raw input data through layers to extract features.
• Application Examples:
o Machine Learning: Image classification, regression, clustering, reinforcement learning, etc.
o Deep Learning: Image recognition, speech recognition, natural language processing, autonomous driving, etc.
o Neural Networks: Used in deep learning architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS WITH KERAS
Artificial neural networks (ANNs) are computational models inspired by the structure and
functioning of the human brain. They consist of interconnected nodes (neurons) organized into
layers, with each connection between neurons having an associated weight. ANNs are powerful
tools for learning complex patterns and making predictions from data.
Keras is a high-level neural network API written in Python that allows for easy and fast
experimentation with deep learning models. It provides a user-friendly interface to build, train,
and deploy neural networks, making it popular among both beginners and experienced
researchers.
1. Neurons and Layers:
o Neurons are the basic units of ANNs, which receive inputs, apply a
transformation (activation function), and produce an output.
o Layers in ANNs organize neurons into functional groups. Common types of
layers include:
▪ Input Layer: Receives input data.
▪ Hidden Layers: Intermediate layers between the input and output layers
are where most computation occurs.
▪ Output Layer: Produces the network's final output.
2. Weights and Activation Functions:
o Each connection between neurons is associated with a weight, which determines
the strength of the influence between neurons.
o Activation functions (e.g., sigmoid, ReLU) introduce non-linearity to the
network, enabling it to learn complex patterns.
3. Feedforward and backpropagation:
o Feedforward: the process of passing input data through the network to generate
an output.
o Backpropagation: An algorithm used to train ANNs by iteratively adjusting
weights based on prediction errors, aiming to minimize a loss function.
Building Neural Networks with Keras:
Keras streamlines the process of building a neural network into a concise workflow. First,
import the necessary modules from Keras to define the model architecture. Initialize a
sequential model and sequentially add layers using Dense for fully connected layers, specifying
the number of units (neurons), activation functions, and input dimensions. After defining the
model, compile it by specifying the optimizer (e.g., 'adam'), the loss function (e.g.,
'binary_crossentropy' for binary classification), and metrics (e.g., 'accuracy'). Next, train the
model using the fit method with training data (X_train, y_train), setting the number of epochs,
batch size, and validation split. Finally, evaluate the model's performance using test data
(X_test, y_test) to obtain loss and accuracy metrics, and use the trained model to make
predictions on new data (X_new_data). This streamlined process demonstrates the ease and
efficiency of building and utilizing neural networks with Keras, making deep learning
accessible to both beginners and experienced practitioners.
Import Keras and Define Model:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()                                              # Initialize sequential model
model.add(Dense(units=64, activation='relu', input_dim=10))      # Add input layer
model.add(Dense(units=32, activation='relu'))                    # Add hidden layer
model.add(Dense(units=1, activation='sigmoid'))                  # Add output layer
Compile the Model:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Train the Model:
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
Evaluate and Use the Model:
loss, accuracy = model.evaluate(X_test, y_test)
predictions = model.predict(X_new_data)
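Putting the snippets above together, the following is a minimal end-to-end sketch that runs on randomly generated data. The synthetic dataset, its 10 features, and the 80/20 split are assumptions made purely for illustration; tensorflow.keras is used here, which is interchangeable with the standalone keras import in most installations:
import numpy as np
from tensorflow import keras

# Purely illustrative synthetic dataset: 1000 samples, 10 features, binary labels
X = np.random.rand(1000, 10).astype('float32')
y = (X.sum(axis=1) > 5.0).astype('float32')
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2, verbose=0)

loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f'Test loss: {loss:.3f}, test accuracy: {accuracy:.3f}')
predictions = model.predict(X_test[:5])   # predictions on previously unseen samples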
Application:
• Image Recognition and Computer Vision: We can develop applications like image
classification, object detection, and facial recognition using ANNs implemented with
Keras. This is vital for tasks in autonomous vehicles, security systems, and medical
imaging.
• Natural Language Processing (NLP): ANNs facilitate sentiment analysis, machine
translation, and chatbot development. Keras-based models can power virtual assistants,
improve language translation services, and automate text summarization.
• Speech Recognition and Synthesis: Leveraging ANNs with Keras enables accurate
speech recognition for voice-controlled devices and systems. Additionally, Keras
facilitates text-to-speech synthesis, generating human-like voices for diverse
applications.
• Healthcare Imaging and Diagnostics: We apply Keras-based ANNs in medical image
analysis to aid in disease diagnosis from X-rays, MRIs, and CT scans. This technology
supports radiologists in making informed clinical decisions.
• Financial Services and Predictive Analytics: Financial forecasting, fraud detection,
and risk assessment employ ANNs developed with Keras. They are essential for tasks
like credit scoring and algorithmic trading.
IMPLEMENTING MLPS WITH KERAS
Multi-Layer Perceptron: A multi-layer perceptron (MLP) is one of the fundamental architectures of artificial neural networks. It is formed from multiple layers of perceptrons: an input layer, one or more hidden layers, and an output layer of interconnected neurons.
(Diagram: a multi-layer perceptron, showing inputs flowing through hidden layers of perceptrons to the output layer.)
MLP networks are usually used in a supervised learning setting. A typical learning algorithm for MLP networks is the backpropagation algorithm.
Implementation:
To implement Multi-Layer Perceptrons (MLPs) using Keras, you can start by importing the
necessary modules: TensorFlow and Keras. Begin by defining your model using Sequential()
and adding layers with Dense(). Specify the input shape for the first layer and the number of
units for subsequent layers, along with activation functions like 'relu' or 'sigmoid'. After
defining the layers, compile the model using model.compile() with an appropriate loss
function ('binary_crossentropy' for binary classification or 'categorical_crossentropy' for
multi-class classification), optimizer ('adam', 'sgd', etc.), and metrics (['accuracy']). Then, fit
the model to the training data using model.fit() with the specified number of epochs and batch
size. Evaluate the model with model.evaluate() on the test data to gauge its performance.
Finally, use the trained model to make predictions with model.predict(). This structured
approach leverages Keras's simplicity and flexibility for building and training MLPs efficiently.
Implementation Steps with Keras:
1. Importing Libraries: Begin by importing the required libraries—tensorflow and
keras—to utilize Keras's high-level neural network API built on top of TensorFlow.
import tensorflow as tf
from tensorflow import keras
2. Defining the Model Architecture: Use the Sequential model class to create a linear
stack of layers. Define the model architecture by adding layers sequentially using Dense
layers.
model = keras.Sequential([
    keras.layers.Dense(units=64, activation='relu', input_shape=(input_size,)),
    keras.layers.Dense(units=32, activation='relu'),
    keras.layers.Dense(units=num_classes, activation='softmax')
])
o Input Layer: The first Dense layer specifies the input shape (input_size) and
applies the ReLU activation function.
o Hidden Layers: Additional Dense layers define the hidden layers with
specified numbers of units (neurons) and activation functions.
o Output Layer: The last Dense layer specifies the number of output classes
(num_classes) and uses the softmax activation function for multi-class
classification.
3. Compiling the Model: Configure the model for training using compile(), where you
specify the loss function, optimizer, and metrics to be used during training.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
o Loss Function: Use 'categorical_crossentropy' for multi-class classification
tasks or 'binary_crossentropy' for binary classification.
o Optimizer: Select an optimizer like 'adam', 'sgd', or others to optimize the
model's weights during training.
o Metrics: Specify evaluation metrics such as 'accuracy' to monitor the model's
performance during training and validation.
4. Training the Model: Train the compiled model on training data using fit(), where you
specify the training data, number of epochs, batch size, and validation data.
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
o Epochs: Number of times the model will iterate over the entire training dataset
during training.
o Batch Size: Number of samples used per gradient update.
5. Evaluating the Model: Evaluate the trained model's performance on the test dataset
using evaluate() to compute the loss and any specified metrics.
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_accuracy}')
6. Making Predictions: Use the trained model to make predictions on new data using
predict(), which returns the predicted output for the input data.
predictions = model.predict(X_new)
Application:
• Image Classification: Recognizing objects or patterns in images.
• Text Classification: categorizing text into different classes.
• Regression Tasks: Predicting continuous values based on input features.
• Anomaly Detection: Identifying unusual patterns or outliers in data.
• Pattern Recognition: Recognizing complex patterns in data.
INSTALLING TENSOR FLOW 2
To install TensorFlow 2, you can use pip, the Python package installer, which is the
recommended method for most users. Here's how you can install TensorFlow 2, depending on
your Python environment:
1. Install TensorFlow 2 using pip (for Python environments):
If you have a Python environment set up, follow these steps:
For the CPU-only version:
pip install tensorflow
For the GPU version (which requires CUDA and cuDNN):
pip install tensorflow-gpu
2. Verify TensorFlow Installation:
After installation, you can verify that TensorFlow 2 is installed correctly by importing it in a
Python environment and checking the version:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
Additional Notes:
• Virtual Environments (recommended): It's a good practice to use virtual
environments (e.g., venv, conda) to manage your Python projects. Activate your virtual
environment before installing TensorFlow.
• GPU Installation (optional): If you have an NVIDIA GPU and want to utilize GPU
acceleration, make sure you have installed compatible versions of CUDA and cuDNN
before installing tensorflow-gpu. (Note: in recent TensorFlow 2 releases the standard
tensorflow package already includes GPU support, and the separate tensorflow-gpu
package has been deprecated; check the official installation guide for your version.)
• Compatibility: Check TensorFlow's official installation guide for detailed instructions
and compatibility information based on your operating system and Python version.
Example Installation:
Here is an example of how you can install TensorFlow 2 using pip in a terminal (assuming you
have Python and pip installed):
# Install the TensorFlow CPU version
pip install tensorflow
# Install the TensorFlow GPU version (assuming compatible CUDA and cuDNN are installed)
pip install tensorflow-gpu
Make sure to replace pip with pip3 if you are using Python 3 and have multiple Python versions
installed.
LOADING AND PREPROCESSING DATA WITH TENSOR FLOW.
Loading Data
Loading data involves reading and importing datasets into your machine learning application.
In TensorFlow, data can be loaded from various sources:
• NumPy Arrays or Tensors: TensorFlow can directly work with NumPy arrays or
TensorFlow tensors. You can create datasets using tf.data.Dataset.from_tensor_slices().
• Files (e.g., CSV, TFRecord): TensorFlow provides utilities to load data from files like
CSV (using tf.data.experimental.make_csv_dataset) or TFRecord (using
tf.data.TFRecordDataset). This is useful for handling large datasets that don't fit into
memory.
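A brief sketch of both loading paths is shown below. The array contents are random, and 'data.csv' with a 'label' column is a hypothetical file used only for illustration:
import numpy as np
import tensorflow as tf

# From in-memory NumPy arrays (random values used only for illustration)
features = np.random.rand(100, 3).astype('float32')
labels = np.random.randint(0, 2, size=(100,))
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# From a CSV file; 'data.csv' and its 'label' column are hypothetical
csv_dataset = tf.data.experimental.make_csv_dataset(
    'data.csv', batch_size=32, label_name='label', num_epochs=1)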
Preprocessing Data
Data preprocessing is essential to prepare the data for training and ensure that it's in a suitable
format for machine learning algorithms:
• Normalization: Scaling features to a similar range (e.g., mean normalization or min-
max scaling) helps in improving convergence and performance of machine learning
models.
• Data Augmentation: Commonly used for image data, data augmentation involves
creating new training examples by applying random transformations like rotations,
flips, and shifts. This helps in increasing the diversity of the training data and improving
model generalization.
• Feature Engineering: Transforming raw data into a format that is more suitable for
the model. This may involve encoding categorical variables, handling missing values,
or extracting relevant features.
• Batching and Shuffling: Data is often processed in batches during training to improve
efficiency. Shuffling the data ensures that the model sees different samples in each
epoch and prevents it from memorizing the order of the data.
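A short sketch of how normalization and a simple image augmentation can be applied with dataset.map(); the dataset, the normalization constants, and the augmentation choice are illustrative assumptions:
import numpy as np
import tensorflow as tf

# Illustrative dataset of (features, label) pairs
features = np.random.rand(100, 3).astype('float32')
labels = np.random.randint(0, 2, size=(100,))
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Normalization constants would normally be computed from the training set;
# the values below are placeholders for illustration
FEATURE_MEAN, FEATURE_STD = 0.5, 0.25

def normalize(x, y):
    return (x - FEATURE_MEAN) / FEATURE_STD, y

dataset = dataset.map(normalize)   # apply normalization to every element

# For image data, a simple augmentation such as a random horizontal flip
def augment(image, label):
    return tf.image.random_flip_left_right(image), label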
Building Data Pipelines
Data pipelines in TensorFlow are used to efficiently process and feed data into machine
learning models:
• Iterating over Datasets: TensorFlow datasets (tf.data.Dataset) provide an abstraction
for handling large amounts of data. You can iterate over datasets using methods like for
batch in dataset: to extract batches of data during training.
• Prefetching: Prefetching data allows the model to fetch batches of data in parallel with
model training, improving overall training performance by reducing idle time.
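Putting these ideas together, a typical input pipeline might look like the following sketch; the buffer size and batch size are arbitrary example values (on older TensorFlow 2 versions, tf.data.experimental.AUTOTUNE replaces tf.data.AUTOTUNE):
import numpy as np
import tensorflow as tf

# Illustrative in-memory data
features = np.random.rand(1000, 3).astype('float32')
labels = np.random.randint(0, 2, size=(1000,))
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

pipeline = (dataset
            .shuffle(buffer_size=1000)       # randomize sample order each epoch
            .batch(32)                       # group samples into mini-batches
            .prefetch(tf.data.AUTOTUNE))     # fetch the next batch while training runs

for batch_features, batch_labels in pipeline:
    pass  # a training step would consume this batch here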
Importance of Data Handling
Effective data loading and preprocessing are crucial for successful machine learning model
training:
• Data Quality: Proper preprocessing ensures that the data is clean, standardized, and
suitable for the chosen machine learning algorithm.
• Model Performance: Well-preprocessed data can significantly impact model
performance, leading to faster convergence and better generalization.
• Scalability: Efficient data pipelines are essential for handling large datasets that cannot
fit into memory, enabling scalable and distributed training.