ML Complete Notes-AIDS
MACHINE LEARNING (For AI&DS)
Course Code: B20AD3201 | Category: PC | L-T-P: 3 - - | Credits: 3 | Internal Marks: 30 | External Marks: 70 | Exam Duration: 3 Hrs.
Course Objectives:
1. Identify problems that are amenable to solution by ANN methods, and which ML methods may be suited to solving a given problem.
2. Formalize a given problem in the language/framework of different ANN methods (e.g., as a search problem, as a constraint satisfaction problem, as a planning problem, as a Markov decision process, etc.).
Course Outcomes: At the end of this course, the students will be able to
1. Explain the fundamental usage of the concept of a Machine Learning system. (Knowledge Level: K2)
2. Demonstrate various regression and classification techniques. (K3)
3. Analyze the Ensemble Learning Methods. (K4)
4. Illustrate the Clustering Techniques and Dimensionality Reduction Models in Machine Learning. (K3)
5. Discuss the Neural Network Models and fundamental concepts of Deep Learning. (K2)
SYLLABUS
UNIT-I (12 Hrs)
Introduction: Artificial Intelligence, Machine Learning, Deep Learning, Types of Machine Learning Systems, Main Challenges of Machine Learning.
Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss, Tradeoffs in Statistical Learning, Estimating Risk Statistics, Sampling Distribution of an Estimator, Empirical Risk Minimization.
UNIT-IV (8 Hrs)
Unsupervised Learning Techniques: Clustering, K-Means, Limits of K-Means, Using Clustering for Image Segmentation, Using Clustering for Preprocessing, Using Clustering for Semi-Supervised Learning, DBSCAN, Gaussian Mixtures.
Dimensionality Reduction: The Curse of Dimensionality, Main Approaches for Dimensionality Reduction, PCA, Using Scikit-Learn, Randomized PCA, Kernel PCA.
Text Books:
1. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, Aurélien Géron, O'Reilly Media, 2019.
2. Data Science and Machine Learning: Mathematical and Statistical Methods, Dirk P. Kroese, Zdravko I. Botev, Thomas Taimre, Radislav Vaisman, 25th November 2020.
Reference Books:
1. Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
UNIT-I
Artificial Intelligence
Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are
programmed to think and act like humans. It involves the development of algorithms and computer
programs that can perform tasks that typically require human intelligence such as visual perception,
speech recognition, decision-making, and language translation. AI has the potential to revolutionize
many industries and has a wide range of applications, from virtual personal assistants to self-driving
cars. Before turning to the meaning of artificial intelligence, let us first understand the meaning of intelligence. Intelligence is the ability to learn and solve problems; this definition is taken from Webster's Dictionary.
Uses of Artificial Intelligence :
• Healthcare: AI is used for medical diagnosis, drug discovery, and predictive analysis of diseases.
• Finance: AI helps in credit scoring, fraud detection, and financial forecasting.
• Retail: AI is used for product recommendations, price optimization, and supply chain management.
• Manufacturing: AI helps in quality control, predictive maintenance, and production optimization.
• Transportation: AI is used for autonomous vehicles, traffic prediction, and route optimization.
• Customer service: AI-powered chatbots are used for customer support, answering frequently asked
questions, and handling simple requests.
• Security: AI is used for facial recognition, intrusion detection, and cybersecurity threat analysis.
• Marketing: AI is used for targeted advertising, customer segmentation, and sentiment analysis.
• Education: AI is used for personalized learning, adaptive testing, and intelligent tutoring systems.
Machine Learning
✓ Machine learning refers to algorithms that have the ability to learn from past experience (data).
✓ Machine learning combines data with statistical tools to predict an output. This output is then used by corporations to derive actionable insights.
✓ Machine learning is closely related to data mining and Bayesian predictive modeling. The machine receives data as input and uses an algorithm to formulate answers.
✓ A typical machine learning task is to provide a recommendation. For those who have a Netflix account, all recommendations of movies or series are based on the user's historical data.
✓ Machine learning is also used for a variety of tasks such as fraud detection, predictive maintenance, portfolio optimization, and so on.
✓ Machine learning provides a single, general-purpose way of learning behaviour from data, rather than writing a different program for every task.
Types of Machine Learning Systems
Supervised learning
Supervised learning is a type of machine learning in which machines are trained using well-labeled training data, and the machine then predicts the output. Labeled data means input data that is already tagged with the correct output.
Types of Supervised learning
Classification
✓ Classification is a supervised learning technique.
✓ Classification predicts a categorical variable.
✓ It helps you divide your data into different classes; the algorithm which implements classification on a dataset is known as a classifier.
✓ There are two types of classification:
1) Binary classification: if the classification problem has only two possible classes, it is called binary classification (T/F, Y/N, 0/1).
2) Multi-class classification: if the classification problem has more than two classes, it is called multi-class classification (e.g., movies, music).
Types of Classification Algorithms
✓ Knn
✓ Naïve bayes
✓ Decision tree
✓ Logistic regression
✓ Support vector machine
Regression
✓ A regression algorithm is used when there is a relationship between the dependent and independent variables (the input and output variables).
✓ Regression is used for the prediction of continuous variables such as weather forecasting, market trends, etc.
Types of regression algorithms:
✓ Linear regression
✓ Logistic Regression
✓ Polynomial Regression
Unsupervised Learning
Unsupervised learning is a type of algorithm that learns patterns from untagged data. It mainly deals with unlabelled data. Unsupervised learning algorithms allow users to perform more complex processing tasks compared to supervised learning.
Clustering
Clustering is an unsupervised learning technique: there is no label for any instance of the data. Clustering is alternatively called grouping. Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Types of clustering:
✓ Exclusive cluster
✓ Overlap cluster
✓ Hierarchical
Reinforcement Learning
Reinforcement learning is an important type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the results. In its early stages, reinforcement learning is essentially learning from mistakes. Reinforcement learning sits between supervised and unsupervised learning.
Deep learning
Deep learning is a branch of machine learning which is based on artificial neural networks. It is capable
of learning complex patterns and relationships within data. In deep learning, we don’t need to explicitly
program everything. It has become increasingly popular in recent years due to the advances in
processing power and the availability of large datasets. Because it is based on artificial neural networks
(ANNs) also known as deep neural networks (DNNs). These neural networks are inspired by the
structure and function of the human brain’s biological neurons, and they are designed to learn from
large amounts of data.
Deep Learning is a subfield of Machine Learning that involves the use of neural networks to model and
solve complex problems. Neural networks are modeled after the structure and function of the human
brain and consist of layers of interconnected nodes that process and transform data.
The key characteristic of Deep Learning is the use of deep neural networks, which have multiple layers
of interconnected nodes. These networks can learn complex representations of data by discovering
hierarchical patterns and features in the data. Deep Learning algorithms can automatically learn and
improve from data without the need for manual feature engineering.
Deep Learning has achieved significant success in various fields, including image recognition, natural
language processing, speech recognition, and recommendation systems. Some of the popular Deep
Learning architectures include Convolutional Neural Networks (CNNs), Recurrent Neural Networks
(RNNs), and Deep Belief Networks (DBNs).
Training deep neural networks typically requires a large amount of data and computational resources.
However, the availability of cloud computing and the development of specialized hardware, such as
Graphics Processing Units (GPUs), has made it easier to train deep neural networks
Main Challenges of Machine Learning.
In Machine Learning, there occurs a process of analyzing data for building or training models. It is
just everywhere; from Amazon product recommendations to self-driven cars, it beholds great value
throughout. As per the latest research, the global machine learning market is expected to grow by
43% by 2024. This revolution has enhanced the demand for machine learning professionals to a great
extent. AI and machine learning jobs have observed a significant growth rate of 75% in the past four
years, and the industry is growing continuously. A career in the Machine learning domain offers job
satisfaction, excellent growth, insanely high salary, but it is a complex and challenging process.
1. Poor Quality of Data
Data plays a significant role in the machine learning process. One of the significant issues that
machine learning professionals face is the absence of good quality data. Unclean and noisy data can
make the whole process extremely exhausting. We don’t want our algorithm to make inaccurate or
faulty predictions. Hence the quality of data is essential to enhance the output. Therefore, we need to
ensure that the process of data preprocessing which includes removing outliers, filtering missing
values, and removing unwanted features, is done with the utmost level of perfection.
2. Underfitting of Training Data
Underfitting occurs when the model is unable to establish an accurate relationship between the input and output variables. It is like trying to fit into undersized jeans: the model is too simple to capture the precise relationship. To overcome this issue, the model's complexity can be increased, more relevant features can be added, or the training time can be extended.
3. Overfitting of Training Data
Overfitting refers to a machine learning model being trained with a massive amount of data in a way that negatively affects its performance. It is like trying to fit into oversized jeans. Unfortunately, this is one of the significant issues faced by machine learning professionals. It means that the algorithm is trained with noisy and biased data, which will affect its overall performance. Let's understand this with the help of an example. Consider a model trained to differentiate between a cat, a rabbit, a dog, and a tiger. The training data contains 1000 cats, 1000 dogs, 1000 tigers, and 4000 rabbits. Then there is a considerable probability that the model will identify a cat as a rabbit. In this example, we had a vast amount of data, but it was biased; hence the prediction was negatively affected.
4. Machine Learning Is a Complex Process
The machine learning industry is young and continuously changing. Rapid hit-and-trial experiments are being carried out. The process is still transforming, and hence there are high chances of error, which makes learning complex. It includes analyzing the data, removing data bias, training the data, applying complex mathematical calculations, and a lot more. Hence it is a really complicated process, which is another big challenge for machine learning professionals.
5. Lack of Training Data
The most important task in the machine learning process is to train the data to achieve an accurate output. Too little training data will produce inaccurate or overly biased predictions. Let us understand this with the help of an example. Consider training a machine learning algorithm the way you would teach a child. One day you decide to explain to a child how to distinguish between an apple and a watermelon. You take an apple and a watermelon and show him the difference between both
based on their color, shape, and taste. In this way, soon, he will attain perfection in differentiating
between the two. But on the other hand, a machine-learning algorithm needs a lot of data to
distinguish. For complex problems, it may even require millions of data to be trained. Therefore we
need to ensure that Machine learning algorithms are trained with sufficient amounts of data.
6. Slow Implementation
This is one of the common issues faced by machine learning professionals. Machine learning models are highly efficient at providing accurate results, but producing those results can take a tremendous amount of time. Slow programs, data overload, and excessive requirements usually mean a lot of time is needed to provide accurate results. Further, the models require constant monitoring and maintenance to deliver the best output.
7. Keeping the Model Up to Date as Data Grows
So you have found quality data, trained it amazingly, and the predictions are really concise and accurate. Yay, you have learned how to create a machine learning algorithm!! But wait, there is a twist; the model may become useless in the future as the data grows. The best model of the present may become inaccurate in the future and require further rearrangement. So you need regular monitoring and maintenance to keep the algorithm working. This is one of the most exhausting issues faced by machine learning professionals.
Statistical Learning: Introduction
Statistical learning is a branch of machine learning that focuses on using statistical methods to extract
knowledge from data and build predictive models. It's all about learning from past observations to make
accurate forecasts about the future.
Key concepts:
• Supervised learning: Here, the data comes with labels (e.g., customer bought a shirt, didn't buy a
shirt). You train the model to learn the relationship between features (e.g., age, browsing history) and
labels, enabling it to predict future labels for unseen data.
• Unsupervised learning: No labels? No problem! This method identifies inherent patterns in unlabeled
data. Imagine analyzing customer reviews to uncover hidden segments or group products with similar
features.
• Regularization: Prevents overfitting, where the model memorizes the training data but fails to
generalize to new situations. Think of it as adding training wheels to your model to prevent it from
going too wild.
• Model selection: With various models at your disposal, how do you choose the best one? This
involves comparing their performance on unseen data and picking the champion.
Applications:
Statistical learning is everywhere! From personalized search results and targeted advertising to medical
diagnosis and financial forecasting, it's transforming countless industries. Here are some specific
examples:
• Recommender systems: Suggesting movies you'll love, recommending books you can't put down, and
even predicting what you'll buy next at the grocery store.
• Spam filtering: Keeping your inbox clean by identifying and eliminating unwanted emails.
• Fraud detection: Analyzing financial transactions to catch suspicious activity and protect your hard-
earned money.
• Medical diagnosis: Identifying patterns in medical images and data to help doctors diagnose diseases
more accurately.
• Climate prediction: Analyzing historical data and complex models to forecast future weather patterns
and climate change.
Supervised and Unsupervised Learning
Supervised learning
Supervised learning is a type of machine learning in which machines are trained using well-labeled training data, and the machine then predicts the output. Labeled data means input data that is already tagged with the correct output.
Classification
✓ Classification is a supervised learning technique.
✓ Classification predicts a categorical variable.
✓ It helps you divide your data into different classes; the algorithm which implements classification on a dataset is known as a classifier.
✓ There are two types of classification:
1) Binary classification: if the classification problem has only two possible classes, it is called binary classification (T/F, Y/N, 0/1).
2) Multi-class classification: if the classification problem has more than two classes, it is called multi-class classification (e.g., movies, music).
Regression algorithms (used when the output is a continuous variable) include:
✓ Linear regression
✓ Logistic Regression
✓ Polynomial Regression
Unsupervised Learning
Unsupervised learning is a type of algorithm that learns patterns from untagged data.
It mainly deals with unlabelled data.
Unsupervised learning algorithms allow users to perform more complex processing tasks compared to supervised learning.
Clustering
Types of clustering algorithms
Exclusive cluster
Overlap cluster
Hierarchical
Exclusive (partitioning)
In this clustering method, data are grouped in such a way that each data point can belong to only one cluster.
Example: K-means
Agglomerative
In this clustering technique, every data point starts as its own cluster. Iterative unions between the two nearest clusters reduce the number of clusters.
Overlapping
In this technique, fuzzy sets are used to cluster data. Each point may belong to two or more
clusters with separate degrees of membership.
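The following short sketch is not part of the original notes; it assumes scikit-learn and a tiny synthetic dataset, and contrasts exclusive (K-means) and agglomerative clustering:

import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Exclusive (partitioning): each point is assigned to exactly one of the 2 clusters
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("K-means labels:", kmeans.labels_)

# Agglomerative: every point starts as its own cluster and the nearest clusters are merged
agg = AgglomerativeClustering(n_clusters=2).fit(X)
print("Agglomerative labels:", agg.labels_)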
Test Loss:
• Now, imagine taking your trained dog to a park with lots of distractions – squirrels, frisbees, other
dogs. Will it still fetch your ball? The test loss is like taking your model to this "unseen" park (test
data). It assesses how well your model performs on data it hasn't seen before.
• Ideally, the test loss should be close to the training loss (in practice it is usually slightly higher). This suggests that
your model hasn't just memorized the training data, but has truly learned the underlying patterns
and can generalize well to new situations.
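A minimal sketch of this idea, assuming scikit-learn and synthetic data (not from the notes), comparing the training loss and the test loss of a simple regression model:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.rand(200, 1) * 10
y = 2.5 * X.ravel() + rng.randn(200)      # noisy linear relationship

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)

train_loss = mean_squared_error(y_train, model.predict(X_train))
test_loss = mean_squared_error(y_test, model.predict(X_test))
print("Training loss:", train_loss, "Test loss:", test_loss)
# If the test loss stays close to the training loss, the model generalizes well.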
Tradeoffs in Statistical Learning
2. Model Complexity vs. Generalizability:
• Model complexity: More complex models with a larger number of parameters can potentially
capture more complex patterns in the data.
• Generalizability: However, complex models are more prone to overfitting and may not generalize
well to unseen data.
• Trade-off: There is a trade-off between model complexity and generalizability. The ideal model
should be sufficiently complex to capture the relevant patterns in the data but not so complex that it
overfits and loses its ability to generalize.
3. Regularization vs. Flexibility:
• Regularization: This is a technique used to penalize complex models and encourage simpler
models that are less prone to overfitting.
• Flexibility: Regularization can also reduce the model's ability to capture complex patterns in the data (the sketch at the end of this list illustrates this trade-off).
• Trade-off: There is a trade-off between regularization and flexibility. The chosen regularization
strength should be balanced to achieve the desired level of bias-variance trade-off and minimize
overfitting.
4. Computational Efficiency vs. Accuracy:
• Computational efficiency: Some learning algorithms are computationally expensive and may
require significant resources to train.
• Accuracy: More complex and sophisticated algorithms might achieve higher accuracy but at the
cost of increased computational demands.
• Trade-off: Depending on the available resources and the specific task, there may be a trade-off
between computational efficiency and accuracy. Sometimes, a simpler model with slightly lower
accuracy might be more practical due to its computational efficiency.
5. Data Quantity vs. Model Performance:
• Data quantity: More data can potentially lead to better model performance as the learning
algorithm has more information to learn from.
• Limited data: However, acquiring and managing large amounts of data can be expensive and time-
consuming.
• Trade-off: There is a trade-off between data quantity and model performance. In some
cases, techniques like data augmentation or transfer learning can be used to mitigate the need for
large datasets.
6. Interpretability vs. Black Box Models:
• Interpretability: Some models are more easily interpretable than others, meaning it is easier to
understand how they reach their predictions.
• Black box models: Highly complex models can be difficult to interpret, making it challenging to
understand their decision-making process.
• Trade-off: There is a trade-off between interpretability and model performance. While
interpretable models offer more transparency, they might not always achieve the best
performance. The choice between interpretability and performance depends on the specific
application and the importance of understanding the model's rationale.
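A minimal sketch of the regularization trade-off mentioned in point 3 above, assuming scikit-learn, a degree-10 polynomial model, and synthetic data (all illustrative choices, not from the notes); a stronger penalty (larger alpha) yields a simpler, less flexible model:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.randn(40) * 0.2

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for alpha in [1e-6, 0.001, 1.0, 100.0]:   # increasing regularization strength
    model = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    print("alpha =", alpha,
          "| train R^2 =", round(model.score(X_train, y_train), 3),
          "| test R^2 =", round(model.score(X_test, y_test), 3))
# Typically a very small alpha fits the training set best but generalizes worse,
# while a very large alpha underfits both sets.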
ESTIMATING RISK STATISTICS
Estimating risk statistics in machine learning involves calculating various metrics that provide
insights into the performance of a model and its ability to generalize to unseen data. These metrics
are crucial for evaluating the effectiveness and robustness of a model, allowing researchers and
practitioners to make informed decisions about model selection, training procedures, and
deployment.
Here are some key risk statistics commonly used in machine learning:
1. Generalization Error:
• This measures the average error of a model on unseen data.
• It is the true risk we ultimately want to minimize, but it can never be directly observed.
• Common estimators for generalization error include:
Test error: The average error on a held-out test set.
Cross-validation error: The average error across multiple rounds of model training and evaluation using different data splits (see the sketch at the end of this list).
2. Bias and Variance:
• Bias: This is the systematic difference between the average prediction of a model and the true
target value.
• Variance: This measures the variability of the model's predictions across different samples of data.
• The ideal model would have both low bias and low variance.
• Bias can be estimated using techniques like cross-validation or comparing to a benchmark model.
• Variance can be estimated by looking at the variation in predictions across different data splits or
by using resampling methods like bootstrapping.
3. Loss Function:
• This function measures the error between the model's predictions and the true target values.
• Different loss functions are used for different types of tasks (e.g., mean squared error for
regression, cross-entropy for classification).
• The average loss on a held-out test set or cross-validation folds provides an estimate of the
generalization error.
4. Confidence Intervals:
• These intervals provide a range within which the true value of a statistic (e.g., generalization error)
is likely to lie with a certain degree of confidence.
• Confidence intervals can be calculated using various methods, including bootstrapping and
asymptotic approximations.
5. AUC (Area Under the ROC Curve):
• This metric is commonly used for evaluating binary classification models.
• It measures the ability of the model to distinguish between positive and negative examples.
• Higher AUC values indicate better performance.
6. Precision and Recall:
• These metrics are also used for evaluating binary classification models.
• Precision measures the proportion of positive predictions that are actually correct.
• Recall measures the proportion of actual positive examples that the model correctly classifies.
• Depending on the specific task, one metric might be more important than the other.
7. Calibration:
• This refers to how well the model's predicted probabilities correspond to the actual class
probabilities.
• A well-calibrated model's predictions can be used to accurately estimate the true probabilities of
events.
• Calibration curves and calibration error metrics can be used to assess calibration.
8. Stability:
• This measures how sensitive the model's predictions are to small changes in the data.
• A stable model is less likely to be affected by noise and outliers in the data.
• Stability can be assessed by analyzing the model's performance on perturbed versions of the data
or by comparing its predictions across different data splits.
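A minimal sketch of the two generalization-error estimators mentioned in point 1 of this list, assuming scikit-learn and its built-in iris dataset (illustrative choices, not from the notes):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_error = 1 - clf.score(X_test, y_test)                 # held-out test-set estimate

cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
cv_error = 1 - cv_scores.mean()                            # 5-fold cross-validation estimate
print("Test error:", test_error, "CV error:", cv_error)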
Sampling distribution of an estimator
In machine learning, we often use estimators to approximate unknown population parameters based
on a sample of data. However, these estimators are not guaranteed to be exactly equal to the true
parameter value. Instead, they will vary depending on the specific sample we choose. The sampling
distribution of an estimator describes the probability distribution of all possible values the estimator
can take across different random samples of the same size from the population.
Example:
• Mean: The sampling distribution of the sample mean is approximately normal for large samples, even
if the population distribution is non-normal. This is due to the Central Limit Theorem.
• Proportion: The sampling distribution of the sample proportion can be approximated by a binomial
distribution, especially for large samples with moderate success probabilities.
• Model parameters: The sampling distribution of the estimated parameters of a machine learning model
(e.g., regression coefficients in linear regression) will depend on the specific model and the estimation
method used.
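A small simulation sketch (synthetic data, not from the notes) that makes the sampling distribution of the sample mean visible:

import numpy as np

rng = np.random.RandomState(0)
population = rng.exponential(scale=2.0, size=100_000)   # a skewed, non-normal population

# Draw many samples of size 50 and record each sample mean
sample_means = [rng.choice(population, size=50, replace=False).mean() for _ in range(2000)]
print("Mean of the sample means:", np.mean(sample_means))   # close to the population mean (about 2.0)
print("Std. of the sample means:", np.std(sample_means))    # roughly population std / sqrt(50)
# A histogram of sample_means would look approximately normal (Central Limit Theorem).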
EMPIRICAL RISK MINIMIZATION.
The Empirical Risk Minimization (ERM) principle is a learning paradigm which consists in
selecting the model with minimal average error over the training set. This so-called training error
can be seen as an estimate of the risk (due to the law of large numbers), hence the alternative name
of empirical risk.
By minimizing the empirical risk, we hope to obtain a model with a low value of the risk. The larger
the training set size is, the closer to the true risk the empirical risk is.
If we were to apply the ERM principle without more care, we would end up learning by heart, which
we know is bad. This issue is more generally related to the overfitting phenomenon, which can be
avoided by restricting the space of possible models when searching for the one with minimal error.
The most severe and yet common restriction is encountered in the contexts of linear
classification or linear regression. Another approach consists in controlling the complexity of the
model by regularization.
Example:
Data: We have a dataset of 100 points, each represented by a feature (x) and a target value (y). We
want to find a linear function (model) that best fits this data.
Loss Function: We choose the squared error loss function, which measures the average squared
difference between the predicted and actual target values.
ERM Process:
Start with a family of models: In linear regression, this family consists of all possible linear functions
of the form y = mx + b, where m and b are the slope and intercept, respectively.
For each model in the family:
Calculate the predicted target value for each data point using the model's function.
Calculate the squared error for each data point.
Calculate the average squared error over all data points (this is the empirical risk of the model).
Choose the model with the smallest empirical risk: This model is our best estimate of the true
underlying relationship between x and y.
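A minimal sketch of this ERM procedure, using synthetic data in place of the 100 points above and a simple grid of candidate (m, b) values (both are illustrative assumptions):

import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.randn(100)        # unknown "true" relationship plus noise

best = None
for m in np.linspace(-5, 5, 101):         # the family of candidate linear models y = m*x + b
    for b in np.linspace(-5, 5, 101):
        empirical_risk = np.mean((y - (m * x + b)) ** 2)   # average squared error on the data
        if best is None or empirical_risk < best[0]:
            best = (empirical_risk, m, b)

print("Lowest empirical risk:", round(best[0], 3), "with m =", best[1], "and b =", best[2])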
UNIT-II
Types of Supervised learning
Classification:
• Classification is a supervised learning technique in machine learning that deals with the
categorization of data into predefined classes or labels. It involves training a model on a dataset
with known categories and then using that trained model to predict the class of new, unseen
instances.
• Classification is crucial for scenarios where the goal is to assign data points to specific categories
or classes. It is widely used in various applications, such as spam filtering, sentiment analysis, image
recognition, and medical diagnosis. The primary objective is to build a model that can make accurate
predictions on new data based on patterns learned from the training data.
• Classification finds applications in diverse fields. For instance, in finance, it's used for credit
scoring; in healthcare, it aids in disease diagnosis; in image recognition, it identifies objects in
images; and in natural language processing, it classifies text sentiments. The versatility of
classification makes it applicable in numerous domains.
• During the model development phase, the algorithm is trained on a labeled dataset to perform
classification. The trained model can classify new instances. When the task involves sorting data
into distinct classes or categories based on certain features, the model is employed.
• In classification, the algorithm learns patterns from labelled training data to make predictions on
new, unseen data. Various classification algorithms, such as logistic regression, decision trees,
support vector machines, and neural networks, implement different strategies to identify decision
boundaries and classify data points into distinct classes.
1. Binary Classification: This type of classification involves two possible classes, such as true/false,
yes/no, or 0/1. Examples include spam detection, fraud detection, and medical diagnosis, where the
outcome is binary.
2. Multi-class Classification: In multi-class classification, the task involves more than two classes. For
example, classifying emails into categories like "work," "personal," or "promotions." Multi-class
classification is prevalent in scenarios with multiple possible outcomes.
Regression:
• Regression finds applications in various fields, including economics, finance, biology, and engineering. In
economics, regression enables the prediction of the impact of factors such as inflation and interest rates on the
gross domestic product (GDP). Regression can be used in healthcare to predict patient outcomes based on
various medical parameters.
• Regression models are used when variables are expected to have a continuous relationship, resulting in
numerical output. During the model development phase, the algorithm applies regression to train on historical
data and learn the patterns and relationships between variables. Once trained, the model can make predictions
based on new data.
• In regression, the algorithm fits a mathematical model to the data, typically a straight line or a curve, that
represents the relationship between the independent and dependent variables. During prediction, the model
utilizes the learned relationship to predict the continuous output variable, after being trained on a labeled dataset.
Types of Regression Algorithms
✓ Linear regression
✓ Logistic Regression
✓ Polynomial Regression
BASIC METHODS: DISTANCE BASED METHODS
Distance Based Models
Distance-based models are the second class of Geometric models. Like Linear models, distance-based models
are based on the geometry of data. As the name implies, distance-based models work on the concept of distance.
In the context of machine learning, the concept of distance is not based merely on the physical distance between two points. Instead, we could think of the distance between two points considering the mode of
transport between two points. Travelling between two cities by plane covers less distance physically than by train
because a plane is unrestricted. Similarly, in chess, the concept of distance depends on the piece used – for
example, a Bishop can move diagonally. Thus, depending on the entity and the mode of travel, the concept of
distance can be experienced differently. The distance metrics commonly used are
Euclidean, Minkowski, Manhattan, and Mahalanobis.
Distance is applied through the concept of neighbours and exemplars. Neighbours are points in proximity with
respect to the distance measure expressed through exemplars. Exemplars are either centroids that find a centre
of mass according to a chosen distance metric or medoids that find the most centrally located data point. The most
commonly used centroid is the arithmetic mean, which minimises squared Euclidean distance to all other points.
NEAREST NEIGHBOURS:
K-Nearest Neighbor Algorithm (K-NN)
• K-Nearest Neighbours (K-NN) is a simple and versatile machine learning algorithm based on supervised learning. It is most commonly used for classification tasks, although it can also be applied to regression problems. K-NN is a non-parametric, instance-based learning algorithm that makes predictions based on the similarity of a new data point to its k-nearest neighbours in the training dataset.
• When the task involves making predictions based on the similarity of data points, K-NN is chosen. It is
particularly useful when the underlying structure of the data is complex and not easily captured by a
mathematical model. K-NN is robust and does not assume any specific distribution of the data, making
it suitable for various types of datasets.
• K-NN finds applications in a wide range of fields, including image recognition, recommendation
systems, medical diagnosis, and pattern recognition. In image recognition, K-NN identifies the class of
a new image by comparing its similarity to previously labeled images.
• During the model development phase, the algorithm is trained on a labeled dataset using K-NN. When
applying K-NN, the prediction task involves finding the class or value of a new data point based on the
majority class or average of its k-nearest neighbors.
• In K-NN, the algorithm classifies a new data point by examining the k-nearest neighbours in the training
dataset. Distance metrics, such as Euclidean distance or Manhattan distance, typically measure the
similarity between data points. The class or value of the new data point is determined by a majority vote
or by averaging the values of its k-nearest neighbours.
Euclidean Distance: for two points x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn),
d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2)
Application
• Used in classification
• Used to impute (fill in) missing values
• Used in pattern recognition
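A minimal K-NN sketch, assuming scikit-learn's KNeighborsClassifier and a toy two-feature dataset (not from the notes):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Each row is a data point with two features; the labels are the known classes
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
knn.fit(X_train, y_train)

# A new point is assigned the majority class of its 3 nearest neighbours
print(knn.predict([[2, 2], [6, 5]]))      # -> [0 1]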
DECISION TREES
✓ Decision Tree is a supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches represent the
decision rules and each leaf node represents the outcome.
✓ In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision
nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the
output of those decisions and do not contain any further branches.
✓ The decisions or the test are performed on the basis of features of the given dataset.
✓ It is a graphical representation for getting all the possible solutions to a problem/decision based
on given conditions.
✓ It is called a decision tree because, similar to a tree, it starts with the root node, which expands
on further branches and constructs a tree-like structure.
✓ In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
✓ A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.
Algorithm
1. Start with a training data set, which we'll call S. It should have attributes and a classification.
2. Determine the best attribute in the dataset (we will go over the definition of the best attribute).
3. Split S into subsets that contain the possible values for the best attribute.
4. Make a decision tree node containing that best attribute.
5. Recursively generate new decision trees using the subsets of data created in step 3, until a stage is reached where you cannot classify the data further. Represent the class as a leaf node.
Example (the classic play-tennis / weather dataset)
Formulas
Entropy(S) = - Σ pᵢ log₂ pᵢ, summed over the classes
Gain(S, A) = Entropy(S) - Σ (|Sᵥ| / |S|) · Entropy(Sᵥ), summed over the values v of attribute A
Information gain for the Outlook attribute:
Entropy of the full dataset: Entropy(S) = 0.940
I(Outlook, Sunny) = 0.971
I(Outlook, Overcast) = 0
I(Outlook, Rain) = 0.971
Weighted entropy for Outlook = 0.694
Gain(Outlook) = Entropy(S) - weighted entropy = 0.940 - 0.694 = 0.246
(Example dataset columns: Outlook, Temperature, Humidity, Wind, Play? The data table itself is not reproduced here.)
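A short sketch that reproduces the entropy and information-gain numbers above, assuming the classic play-tennis counts for Outlook (Sunny: 2 yes / 3 no, Overcast: 4 / 0, Rain: 3 / 2; 9 yes and 5 no overall):

import math

def entropy(pos, neg):
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            e -= p * math.log2(p)
    return e

total_entropy = entropy(9, 5)                        # about 0.940 for 9 yes / 5 no
splits = {'Sunny': (2, 3), 'Overcast': (4, 0), 'Rain': (3, 2)}
weighted = sum((p + n) / 14 * entropy(p, n) for p, n in splits.values())   # about 0.694
gain = total_entropy - weighted                      # about 0.247 (0.940 - 0.694 = 0.246 with rounded values)
print(round(total_entropy, 3), round(weighted, 3), round(gain, 3))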
Advantages
• Easy to use and understand.
• Can handle both categorical and numerical data.
• Resistant to outliers, hence requires little data preprocessing.
Disadvantages
• The decision tree contains lots of layers, which makes it complex.
• It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
• For more class labels, the computational complexity of the decision tree may increase.
Application
Decision tree has been used to develop models for prediction and classification in different domains some of
which are
• Business management
• Customer relationship management
• Fraudulent statement detection
• Engineering, Energy consumption
• Fault diagnosis
• Healthcare Management
• Agriculture
NAIVE BAYES
✓ Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and used for
solving classification problems.
✓ It is mainly used in text classification that includes a high-dimensional training dataset.
✓ Naïve Bayes Classifier is one of the simple and most effective Classification algorithms which helps in
building the fast machine learning models that can make quick predictions.
✓ It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
✓ Some popular examples of the Naïve Bayes algorithm are spam filtration, sentiment analysis, and classifying articles.
Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
Where,
• P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
• P(B|A) is Likelihood probability: Probability of the evidence given that the probability of a hypothesis is
true.
• P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
• P(B) is Marginal Probability: Probability of Evidence.
Working of Naïve Bayes' Classifier
✓ Working of Naïve Bayes' Classifier can be understood with the help of the below example:
✓ Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we need to follow the below steps:
✓ Convert the given dataset into frequency tables.
✓ Generate Likelihood table by finding the probabilities of given features.
✓ Now, use Bayes theorem to calculate the posterior probability.
✓ Problem: If the weather is sunny, then the Player should play or not?
✓ Solution: To solve this, first consider the below dataset:
Frequency table for the Weather Conditions
Applying Bayes' theorem:
P(Yes|Sunny)= P(Sunny|Yes)*P(Yes) / P(Sunny)
P(Sunny|Yes)= 3/10= 0.3
P(Sunny)= 0.35
P(Yes)=0.71
So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60
P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)
P(Sunny|NO)= 2/4=0.5
P(No)= 0.29
P(Sunny)= 0.35
So P(No|Sunny)= 0.5*0.29/0.35 = 0.41
So, as we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny); hence, on a sunny day, the player can play the game.
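A short sketch reproducing this calculation with the rounded values used above:

p_yes, p_no = 0.71, 0.29                         # prior probabilities
p_sunny = 0.35                                   # marginal probability of Sunny
p_sunny_given_yes, p_sunny_given_no = 0.3, 0.5   # likelihoods from the frequency table

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny   # about 0.61 (rounded to 0.60 above)
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny      # about 0.41
print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))
# P(Yes|Sunny) > P(No|Sunny), so the player should play on a sunny day.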
Applications:
✓ It is used for Credit Scoring.
✓ It is used in medical data classification.
✓ It is used in Text classification such as Spam filtering and Sentiment analysis.
LINEAR MODELS
Linear Regression
• Linear regression is a simple and easy-to-use algorithm.
• Linear regression is a statistical approach used for predictive analysis.
• Linear regression is used to solve regression problems.
• Linear regression predicts a continuous variable.
• It models the relationship between a dependent variable and an independent variable.
• The relationship (regression) can be either positive or negative; the best-fit line is a straight line.
Y = b0 + b1*X
Where:
Y = dependent variable
X = independent variable
b0 = intercept
b1 = coefficient of the relationship between X and Y
Linear Regression Line
1. Positive Regression: if the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is termed a positive regression.
Y=b0+b1*X
2. Negative Regression: if the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is called a negative regression.
Y = b0 - b1*X (the slope is negative)
Example: Linear Regression Using Least square method
Independent variable X | Dependent variable Y | X - X̄ | Y - Ȳ | (X - X̄)² | (X - X̄)·(Y - Ȳ)
1 | 2 | 1-3 = -2 | 2-4 = -2 | 4 | 4
2 | 4 | 2-3 = -1 | 4-4 = 0 | 1 | 0
3 | 5 | 3-3 = 0 | 5-4 = 1 | 0 | 0
4 | 4 | 4-3 = 1 | 4-4 = 0 | 1 | 0
5 | 5 | 5-3 = 2 | 5-4 = 1 | 4 | 2
X̄ = 3 | Ȳ = 4 | | | Σ = 10 | Σ = 6
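Completing the example: b1 = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² = 6/10 = 0.6 and b0 = Ȳ - b1·X̄ = 4 - 0.6·3 = 2.2, so the best-fit line is Y = 2.2 + 0.6X. A short NumPy sketch of the same computation:

import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 5, 4, 5])

x_mean, y_mean = X.mean(), Y.mean()                                     # 3 and 4
b1 = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)    # 6 / 10 = 0.6
b0 = y_mean - b1 * x_mean                                               # 4 - 0.6*3 = 2.2
print("Best-fit line: Y =", round(b0, 2), "+", round(b1, 2), "* X")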
LOGISTIC REGRESSION: Logistic regression is a statistical method commonly employed in machine learning and statistics for binary classification tasks, where the categorical outcome has two possible classes (e.g., 0 or 1, yes or no, true or false).
Unlike linear regression, which predicts continuous outcomes, logistic regression models the
probability that a given input belongs to a particular class. It does this by applying a logistic
(or sigmoid) function to a linear combination of the input features. The logistic function
constrains the output of the regression model between 0 and 1, representing probabilities.
The logistic regression model can be mathematically represented as:
P(Y=1 | X) = 1 / (1 + e^-(β0 + β1X1 + β2X2 + ... + βnXn))
Where:
• P(Y=1∣X) is the probability of the target variable being 1 given the input features X.
• e is the base of the natural logarithm.
• β0, β1, β2, ..., βn are the model's coefficients learned during training.
• X1, X2, ..., Xn are the input features.
During training, the model learns the optimal values for the coefficients (weights) that
minimise a chosen loss function, typically the log loss or cross-entropy loss. These coefficients
determine the relationship between the input features and the log-odds of the target variable.
Various fields like healthcare, finance, marketing, and social sciences widely use logistic
regression for tasks such as predicting whether an email is spam or not, diagnosing diseases,
determining customer churn, etc. It's also a fundamental building block for more complex
machine learning algorithms and techniques.
Example (the original snippet did not define X, new_X, or the plotting calls, so illustrative values are assumed here):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Training data: one feature and a binary target (the X values are illustrative)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
model = LogisticRegression()
model.fit(X, y)
# Predicted probability of class 1 over a grid of new feature values
new_X = np.linspace(0, 9, 100).reshape(-1, 1)
predicted_probs = model.predict_proba(new_X)[:, 1]
plt.scatter(X.ravel(), y, label='Training data')
plt.plot(new_X.ravel(), predicted_probs, label='P(Y=1 | X)')
plt.xlabel('Feature')
plt.title('Logistic Regression')
plt.legend()
plt.grid(True)
plt.show()
Output: (a plot of the training points and the fitted logistic probability curve)
Example (Poisson regression with statsmodels):
import numpy as np
import statsmodels.api as sm
# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 3, 5]) # Number of events
# Fit Poisson regression model
poisson_model = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
# Print model summary
print(poisson_model.summary())
Output: (the fitted Poisson GLM model summary)
MNIST: The MNIST dataset is a widely used benchmark dataset in the fields of machine
learning and computer vision. It stands for "Modified National Institute of Standards and
Technology" and consists of a large collection of grayscale images of handwritten digits from
0 to 9. Each image is 28 pixels in height and 28 pixels in width, resulting in a total of 784 pixels
per image. Researchers commonly use the MNIST dataset to train and test algorithms in image
classification, especially for tasks involving handwritten digit recognition. It has become a
standard dataset for evaluating the performance of various machine learning algorithms,
including neural networks, support vector machines, decision trees, and more. The dataset is divided into two main parts: a training set and a test set. The training set contains 60,000 images,
while the test set contains 10,000 images. Each image includes a label indicating the digit it
represents (0 through 9). Due to its simplicity, standardisation, and accessibility, the MNIST
dataset has played a crucial role in advancing research and development in the field of machine
learning, particularly in the early stages of deep learning and convolutional neural networks
(CNNs). It has also served as a benchmark for comparing the performance of different
algorithms and techniques.
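A minimal sketch of one common way to load MNIST, assuming scikit-learn's OpenML loader and a simple logistic-regression baseline (the notes do not prescribe a particular loader or model):

from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

mnist = fetch_openml('mnist_784', version=1, as_frame=False)   # 70,000 images, 784 pixels each
X, y = mnist.data, mnist.target

X_train, X_test = X[:60000], X[60000:]     # the conventional 60,000 / 10,000 split
y_train, y_test = y[:60000], y[60000:]

clf = LogisticRegression(max_iter=100)     # a quick baseline; a convergence warning may appear
clf.fit(X_train[:10000] / 255.0, y_train[:10000])   # fit on a subset, pixels scaled to [0, 1]
print("Test accuracy:", clf.score(X_test / 255.0, y_test))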
RANKING: Ranking in machine learning involves the task of organising a set of items based
on their relevance or importance within a given context, which is crucial for applications like
information retrieval and recommendation systems. This task encompasses approaches such as
pointwise, pairwise, and listwise ranking, each addressing different aspects of the ranking
problem. Pointwise ranking treats each item independently, predicting its relevance score,
while pairwise ranking aims to learn the preference between pairs of items. Listwise ranking,
on the other hand, considers the entire list of items as a single instance and directly optimises
the ranking of the entire list. Learning to Rank (LTR) frameworks encompass these approaches,
training models on labelled data to rank items for new queries, thereby facilitating tasks like
search engine optimisation and personalised recommendations.
UNIT-III
Ensemble Learning:
Ensemble methods in machine learning combine the insights obtained from multiple
learning models to facilitate accurate and improved decisions. In learning models, noise,
variance, and bias are the major sources of error. The ensemble methods in machine learning
help minimise these error-causing factors, thereby ensuring the accuracy and stability of
machine learning (ML) algorithms.
Ensemble learning works by training multiple base learners on the same dataset, but using
different algorithms or subsets of the data. These base learners could be decision trees, neural
networks, support vector machines, or any other machine learning algorithm. After training the
base learners, they combine their predictions in some way to produce the final prediction.
In ensemble learning, there are several methods for combining the base learners' predictions, including:
1. Voting/averaging: each base learner makes a prediction, and the final prediction is determined by a majority vote (for classification tasks) or by averaging (for regression tasks) the individual predictions.
2. Weighted combination: each base learner's prediction is weighted based on its performance on a validation set or another criterion, and these weighted predictions are then combined to create the final prediction.
3. Stacking: a meta-learner is trained on the predictions of the base learners to make the final prediction. This allows the meta-learner to learn how best to combine the predictions of the base learners.
Ensemble methods can be divided into two groups:
• Sequential ensemble methods where the base learners are generated sequentially (e.g.
AdaBoost). The basic motivation of sequential methods is to exploit the dependence
between the base learners. The overall performance can be boosted by weighing
previously mislabeled examples with higher weight.
• Parallel ensemble methods where the base learners are generated in parallel (e.g.
Random Forest). The basic motivation of parallel methods is to exploit independence
between the base learners since the error can be reduced dramatically by averaging.
Figure: Bagging
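A minimal bagging sketch, assuming scikit-learn's BaggingClassifier (whose default base learner is a decision tree) and the built-in breast-cancer dataset; each tree is trained on a bootstrap sample and their votes are aggregated:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 decision trees, each fitted on a different bootstrap sample of the training set
bagging = BaggingClassifier(n_estimators=100, random_state=0)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))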
Advantages
• Efficient on large datasets
• More accurate than decision trees
• Averaging results of many trees reduces variance
Disadvantages
• More difficult to interpret than decision trees
• Less clear which variables are of greatest importance for predicting the response
• More computationally intensive than forming a single decision tree
Applications:
1. Classification and Regression Tasks:
o Example: predicting customer churn in a telecom company. Bagging with
decision trees enables training multiple models to predict customer churn based
on features like usage patterns, customer demographics, and service
subscription details.
2. Medical Diagnosis:
o Example: predicting the likelihood of a patient having a particular disease based
on symptoms, medical history, and test results. Bagging with decision trees can
help create an ensemble model that combines predictions from individual
decision trees trained on various patient data subsets, enhancing diagnostic
accuracy.
3. Finance and Risk Management:
o Example: credit risk assessment for loan approval. Bagging with decision trees
can help create a strong predictive model that assesses the creditworthiness of
loan applicants based on their financial history, employment status, and other
relevant factors. This can help financial institutions make informed decisions
about loan approvals while managing risk effectively.
4. Marketing and customer segmentation:
o Example: segmenting customers based on their purchasing behaviour and
preferences. Bagging with decision trees can help identify distinct customer
segments and tailor marketing strategies accordingly. For instance, an e-
commerce company can use this approach to personalise product
recommendations and promotional offers for different customer segments,
thereby improving customer engagement and sales.
5. Image and speech recognition:
o Example: Handwritten digit recognition in optical character recognition (OCR)
systems. Bagging with decision trees can be utilized to develop an ensemble
model that accurately classifies handwritten digits by analyzing pixel intensities
and spatial features. This can be beneficial for digitising documents and
automating data entry tasks.
6. Environmental Monitoring:
o Example: predicting air quality levels based on meteorological data, pollution
levels, and geographic factors. Bagging with decision trees can be employed to
build a predictive model that forecasts air quality indices, helping local
authorities and environmental agencies take proactive measures to mitigate
pollution and protect public health.
Random forests
Random forest is a supervised learning algorithm which is used for both classification and regression. However, it is mainly used for classification problems. As we know, a forest is made up of trees, and more trees mean a more robust forest. Similarly, the random forest
algorithm creates decision trees on data samples and then gets the prediction from each of
them and finally selects the best solution by means of voting. It is an ensemble method which
is better than a single decision tree because it reduces the over-fitting by averaging the result.
We can understand the working of Random Forest algorithm with the help of following steps
• Step 1 − First, start with the selection of random samples from a given dataset.
• Step 2 − Next, this algorithm will construct a decision tree for every sample. Then it
will get the prediction result from every decision tree.
• Step 3 − In this step, voting will be performed for every predicted result.
• Step 4 − At last, select the most voted prediction result as the final prediction result.
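A minimal random forest sketch, assuming scikit-learn and its built-in iris dataset (illustrative choices, not from the notes):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)   # 100 trees, majority vote
forest.fit(X_train, y_train)
print("Accuracy:", forest.score(X_test, y_test))
print("Feature importances:", forest.feature_importances_)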
Disadvantages
1. No interpretability
2. Overfitting can easily occur
3. Need to choose the number of trees
So basically Random forest is used when you are just looking for high performance with less
need for interpretation.
Application:
• Medical diagnosis.
• Customer churn prediction: predicting whether customers are likely to churn (i.e., stop using a service or cancel a subscription) based on their interactions with a product or service. Random forests can analyse customer data, such as usage patterns, feedback, and demographic information, to identify at-risk customers and develop targeted retention strategies.
• Image classification.
• Ecological modelling.
• Sales and demand forecasting: forecasting future sales and demand for retail products based on historical sales data, promotional activities, seasonal trends, and economic factors. Random forests can analyse large volumes of transactional data to identify patterns and relationships that influence sales performance, enabling retailers to optimise inventory management and pricing strategies.
Boosting: Boosting is a technique to combine weak learners and convert them into strong ones with the help of machine learning algorithms. It uses ensemble learning to boost the accuracy of a model. Ensemble learning is a technique to improve the accuracy of machine learning models; as noted earlier, it comes in two flavours, sequential and parallel.
Sequential boosting is a technique where the outputs from the individual weak learners are combined sequentially during the training phase. The performance of the model is boosted by assigning higher weights to the samples that are incorrectly classified. The AdaBoost algorithm, discussed below, is an example of sequential learning.
Boosting Algorithms
Boosting builds a strong, generic model by considering the predictions of a majority of weak learners. It helps in increasing the prediction power of the machine learning model. This is done by training a series of weak models.
Below are the steps that show the mechanism of the boosting algorithm:
1. Reading the data
2. Assigning initial (equal) weights to the samples and training a first weak learner
3. Identifying the samples that are predicted incorrectly
4. Assigning the false predictions, along with a higher weightage, to the next learner
5. Repeating the process until the desired accuracy is reached; the final prediction combines all the learners
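A minimal sequential-boosting sketch, assuming scikit-learn's AdaBoostClassifier (which by default boosts shallow decision stumps, re-weighting misclassified samples before each new learner) and the built-in breast-cancer dataset:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
boost.fit(X_train, y_train)
print("AdaBoost accuracy:", boost.score(X_test, y_test))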
Advantages of Boosting:
1. Improved Accuracy: Boosting algorithms typically produce highly accurate
predictions by combining multiple weak learners, each focusing on different aspects of
the data.
2. Reduced Bias and Variance: Boosting reduces both bias and variance, leading to
models that generalise well to unseen data and are less prone to overfitting.
3. Feature Importance: Boosting algorithms provide insights into feature importance,
allowing users to identify the most influential predictors in their models.
4. Boosting algorithms' versatility extends to various machine learning tasks, such as
classification, regression, and ranking, making them essential tools in the data scientist's
toolbox.
Boosting has its disadvantages:
1. Boosting algorithms are sensitive to noisy data and outliers, which can negatively affect
model performance if not properly handled.
2. Computationally Intensive: Training boosting models can be computationally
intensive, especially when dealing with large datasets or complex models with many
iterations.
3. Overfitting Risk: Boosting algorithms are less prone to overfitting than individual weak learners, but they can still overfit if the model complexity is not properly controlled.
4. Interpretability: Boosting models can be challenging to interpret due to their ensemble
nature and the complexity of the underlying algorithms.
Examples and applications:
1. Face Recognition:
o Boosting algorithms can help detect and recognize faces in images or videos by
combining multiple weak classifiers for face recognition tasks.
o AdaBoost can build a face recognition system that combines simple facial
features (e.g., eyes, nose, mouth) to identify individuals in photos or
surveillance footage.
2. Customer Churn Prediction:
o Boosting algorithms commonly identify customers likely to leave a service or
product subscription in customer churn prediction.
o GBM can analyze customer data (e.g., usage patterns, demographics,
satisfaction scores) and predict churn, enabling companies to proactively
address customer retention.
3. Credit Risk Assessment:
o Boosting algorithms in credit risk assessment evaluate the likelihood of default
or delinquency for loan applicants.
o XGBoost can analyze financial data (e.g., credit history, income, debt-to-
income ratio) and classify loan applicants into low, medium, or high-risk
categories, aiding lenders in making informed decisions about loan approvals
and interest rates.
4. Click-through rate (CTR) prediction:
o Boosting algorithms in online advertising predict click-through rates (CTR) and
optimize ad placement strategies.
o LightGBM enables advertisers to target their campaigns more effectively by
analyzing user behavior data (e.g., browsing history, search queries, device
type) and predicting the likelihood of users clicking on specific ads.
Difference Between Bagging, Boosting and Stacking
Partitioning of the data into subsets: Bagging uses random subsets; Boosting gives misclassified samples higher preference; Stacking uses various subsets.
Goal to achieve: Bagging aims to minimise variance; Boosting aims to increase predictive force; Stacking aims at both.
Methods where this is used: Random subspace methods (Bagging); gradient descent based methods (Boosting); blending (Stacking).
Function to combine single models: Weighted average (Bagging); weighted majority vote (Boosting); logistic regression (Stacking).
VOTING CLASSIFIERS
A voting classifier is a type of ensemble learning method in which multiple base classifiers are
trained on the same dataset, and their individual predictions are combined to make a final
prediction. The idea behind a voting classifier is to aggregate the predictions of multiple
classifiers and use a majority vote (for classification tasks) or averaging (for regression tasks)
to determine the final prediction.
There are two main types of voting classifiers:
1. Hard Voting: In hard voting, each base classifier predicts the class label for a given
input, and the class that receives the most votes is chosen as the final prediction. This
approach works well when the base classifiers are diverse and the class labels are well-
defined.
2. Soft Voting: In soft voting, instead of simply counting the votes for each class label,
the classifiers' predicted probabilities for each class are averaged, and the class with the
highest average probability is chosen as the final prediction. Soft voting tends to be
more effective when the base classifiers can output probability estimates, as it takes
into account the confidence of each classifier's predictions.
Voting classifiers can be constructed using different types of base classifiers, such as decision
trees, support vector machines, logistic regression, or any other classification algorithm. The
key idea is to leverage the diversity of individual classifiers to improve overall prediction
accuracy and generalisation performance.
Voting classifiers are commonly used in practice because they are simple to implement and
often yield robust performance, especially when the base classifiers are diverse and
complementary to each other. They are particularly useful in situations where no single
classifier performs consistently well across all parts of the input space.
Example: Suppose three base classifiers each predict a department label for the same input. With hard voting, the most common label wins:
from collections import Counter

# Predictions from the three base classifiers
classifier1_prediction = 'AIDS'
classifier2_prediction = 'CSE'
classifier3_prediction = 'AIDS'

# Collect the votes and take the most common label (hard voting)
votes = [classifier1_prediction, classifier2_prediction, classifier3_prediction]
majority_vote = Counter(votes).most_common(1)[0][0]

# Final prediction
print("Majority Vote Prediction:", majority_vote)
Output:
Majority Vote Prediction: AIDS
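The same idea scales to real models with scikit-learn's VotingClassifier. This is only a sketch under illustrative assumptions (the Iris dataset and these three base estimators are not from the notes); setting voting='soft' would average predicted probabilities instead:
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Hard voting: each base classifier casts one vote per instance
voting_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('dt', DecisionTreeClassifier()),
                ('nb', GaussianNB())],
    voting='hard')
voting_clf.fit(X, y)
print(voting_clf.predict(X[:5]))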
BAGGING AND PASTING
• Bagging (bootstrap aggregating): In bagging, several base learners are trained on random subsets of the training data sampled with replacement, and their predictions are combined by voting or averaging to produce the final prediction.
• Pasting: Pasting is similar to bagging, but instead of sampling with replacement, it samples without replacement. This means that each instance of the training data can only be sampled once for each subset. Pasting is useful when you have a large dataset and want to avoid repeatedly sampling the same instances. While bagging is more commonly used, pasting can sometimes lead to a slightly lower variance in the final model because each instance is only used once in each base learner's training process.
Both bagging and pasting are effective techniques for improving the generalisation
performance of machine learning models, especially when the base learners are unstable
(sensitive to small changes in the training data) or when the dataset is noisy. They are
commonly used in combination with decision trees or other simple classifiers to create more
robust and accurate ensemble models.
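As a rough sketch (the dataset, base estimator, and parameter values are illustrative assumptions), scikit-learn's BaggingClassifier covers both techniques; the bootstrap flag switches between sampling with and without replacement:
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# bootstrap=True  -> bagging (sampling with replacement)
# bootstrap=False -> pasting (sampling without replacement)
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            max_samples=0.8, bootstrap=True, random_state=0)
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            max_samples=0.8, bootstrap=False, random_state=0)
bagging.fit(X, y)
pasting.fit(X, y)
print("Bagging accuracy:", bagging.score(X, y))
print("Pasting accuracy:", pasting.score(X, y))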
STACKING
Stacking, also known as stacked generalisation, is an ensemble learning technique that
combines multiple base models with a meta-model to make predictions. Unlike traditional
ensemble methods like bagging and boosting, where models are combined through averaging
or voting, stacking involves training a meta-model to learn how to best combine the predictions
of the base models.
Base Models: Multiple diverse base models are trained on the same dataset using different algorithms or subsets of the data. Each base model makes predictions on the same set of instances.
Meta-Model: The meta-model or blender trains using the predictions generated by the base
models as features. The meta-model learns to combine the predictions of the base models in a
way that optimises predictive performance on a validation set. Typically, the meta-model is a
simple model like linear regression, logistic regression, or a shallow neural network.
Prediction: When making predictions on new data, the base models first make predictions on
the new instances. The meta-model uses these predictions as features to make the final
prediction.
Stacking enables the base models to interact more complexly and capture patterns in the data
that individual models may miss. By training a meta-model to learn how to best combine the
predictions of the base models, stacking can often achieve higher predictive accuracy compared
to any single model alone.
One important consideration in stacking is how to prevent overfitting. To address this, you can
use cross-validation to generate predictions for the meta-model or utilize hold-out sets for
training and validation.
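A minimal sketch of stacking with scikit-learn's StackingClassifier, assuming an illustrative synthetic dataset and two arbitrary base models, with logistic regression as the meta-model; the cv argument generates the out-of-fold predictions that help prevent overfitting:
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=1)

# Base models produce cross-validated predictions, and the meta-model
# (logistic regression here) learns how best to combine them.
stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=50, random_state=1)),
                ('svc', SVC(probability=True, random_state=1))],
    final_estimator=LogisticRegression(),
    cv=5)
stack.fit(X, y)
print("Training accuracy:", stack.score(X, y))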
RANDOM FORESTS
Random Forest is a versatile ensemble learning algorithm widely used for both classification
and regression tasks. It constructs multiple decision trees during training by randomly sampling
data with replacement and selecting a subset of features at each node split, introducing diversity
and reducing overfitting. Through a voting mechanism in classification or averaging in
regression, the final prediction is made, leveraging the collective wisdom of the individual
trees. Renowned for their robustness, Random Forests excel in various data scenarios,
providing estimates of feature importance and performing well in high-dimensional spaces.
While computationally intensive for large datasets, its simplicity, interpretability, and
effectiveness across diverse applications make it a preferred choice for building reliable
predictive models.
Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions with each tree created in the first phase.
The working process can be explained in the following steps:
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step 1 and Step 2.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data points to the category that wins the majority of votes.
The working of the algorithm can be better understood by the below example:
The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images. This dataset is given to the Random Forest classifier. The dataset is divided into subsets, and each subset is given to a decision tree. During the training phase, each decision tree produces a prediction result, and when a new data point occurs, the Random Forest classifier predicts the final decision based on the majority of those results.
There are mainly four sectors where Random Forest is mostly used:
1. Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified.
3. Land Use: We can identify areas of similar land use with this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
Note that although Random Forest can be used for both classification and regression tasks, it is less suitable for regression tasks.
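A short, hedged sketch of a random forest in scikit-learn (the Iris dataset and parameter values are illustrative assumptions): N trees are grown on bootstrap samples with random feature subsets, and the majority vote is returned.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 trees; the class with the majority of votes is the prediction
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
print("Feature importances:", forest.feature_importances_)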
How Linear SVM Classification Works: Linear Support Vector Machine (SVM)
classification is a supervised learning algorithm used for classifying data points into different
classes. Here's how it works:
• Separating Hyperplane: Linear SVM aims to find the hyperplane that best separates
the classes in the feature space. The hyperplane is a decision boundary that divides the
feature space into regions associated with different classes.
• Maximizing Margin: The objective is to determine the hyperplane that maximizes the
margin between the nearest data points (support vectors) from different classes. This
margin represents the distance between the hyperplane and the closest data points.
• Optimization Problem: Linear SVM solves an optimization problem to find the
weights and biases that define the hyperplane. This problem is typically solved using
optimization techniques such as gradient descent.
Features of Linear SVM Classification:
• Effective for Linearly Separable Data: Linear SVM works well when the classes are
linearly separable, meaning a straight line or hyperplane can be drawn to separate them.
• Margin Maximization: It maximizes the margin between classes, which often leads to
better generalization and improved performance on unseen data.
• Robustness to Overfitting: SVMs are less prone to overfitting, especially in high-
dimensional spaces, compared to other algorithms like decision trees.
• Global Solution: Linear SVM typically finds the global optimum solution, meaning it
converges to the best possible hyperplane.
Advantages:
• Works well in High-Dimensional Spaces: Linear SVM performs well even in cases
where the number of dimensions is greater than the number of samples.
• Effective with Limited Data: It can handle datasets with a small number of samples
effectively.
• Robust to Noise: SVMs are relatively robust to noise in the data, thanks to the margin
maximization objective.
Disadvantages:
• Computationally Intensive for Large Datasets: Training time can be significant,
especially for large datasets, due to the computational complexity of solving the
optimization problem.
• Less Effective with Non-Linear Data: Linear SVM may not perform well when the
data is not linearly separable. In such cases, nonlinear SVM variants or kernel tricks
may be more appropriate.
Applications:
• Text Classification: Linear SVM is commonly used in text classification tasks, such
as spam detection, sentiment analysis, and document categorization.
• Image Classification: It's also applied in image classification problems, like object
recognition and image segmentation.
• Bioinformatics: Linear SVM finds applications in bioinformatics for tasks such as
protein classification and gene expression analysis.
Example: Consider a binary classification problem where we have two classes, represented as
red and blue points in a two-dimensional feature space. Linear SVM aims to find the optimal
hyperplane (a line in this case) that separates these two classes with the maximum margin. This
hyperplane ensures that the distance between the closest points (support vectors) from each
class is maximized.
For instance, in a spam email classification scenario, linear SVM can be trained on a dataset
containing features extracted from emails (e.g., word frequencies) along with corresponding
labels (spam or not spam). It learns a hyperplane to distinguish between spam and legitimate
emails, enabling accurate classification of new, unseen emails.
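As an illustrative sketch of linear SVM classification (the synthetic, roughly linearly separable dataset and the parameter values are assumptions, not part of these notes):
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Two roughly linearly separable classes in a 2-D feature space
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_informative=2, n_clusters_per_class=1,
                           class_sep=2.0, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# A linear kernel searches for the maximum-margin separating hyperplane
clf = SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))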
How Nonlinear SVM Classification Works: Nonlinear Support Vector Machine (SVM)
classification addresses scenarios where classes are not linearly separable. Here's how it works:
• Kernel Functions: Nonlinear SVM utilizes kernel functions to map the input data into
a higher-dimensional feature space. These kernel functions allow SVM to implicitly
transform the data, potentially making the classes linearly separable in the transformed
space.
• Mapping to Higher Dimension: By applying kernel functions, SVM can effectively
map the input data into a higher-dimensional feature space where the classes might
become linearly separable.
• Classification in Higher Dimension: In this higher-dimensional space, SVM aims to
find the optimal hyperplane that separates the classes with the maximum margin,
similar to linear SVM in the original feature space.
Features of Nonlinear SVM Classification:
• Ability to Handle Nonlinear Data: Nonlinear SVM can handle data that is not linearly separable by transforming it into a higher-dimensional space where linear separation may be possible.
• Flexibility with Kernel Functions: Various kernel functions can be employed based
on the nature of the data and the problem, providing flexibility in capturing nonlinear
relationships.
• Efficient in High-Dimensional Spaces: Despite the transformation to a higher-
dimensional space, SVM remains efficient in terms of computation, especially
compared to explicit feature expansion methods.
Advantages:
• Versatility: Nonlinear SVM is versatile and can be applied to a wide range of problems
where linear separation is not feasible.
• Capturing Complex Relationships: By mapping data into a higher-dimensional space,
nonlinear SVM can capture complex relationships between features, leading to
improved classification accuracy.
• Kernel Flexibility: The choice of kernel functions allows customization according to
the specific characteristics of the data, potentially enhancing performance.
Disadvantages:
• Selection of Kernel Parameters: Choosing the appropriate kernel and its parameters
can be challenging and may require careful tuning, leading to potential overfitting if
not done properly.
• Computational Complexity: Nonlinear SVM can be computationally intensive,
especially with large datasets or complex kernel functions, which may increase training
time and resource requirements.
Applications:
• Image Recognition: Nonlinear SVM is widely used in image recognition tasks such as
object detection and facial recognition, where the relationships between image features
may be nonlinear.
• Bioinformatics: It finds applications in bioinformatics for tasks such as protein
structure prediction and gene expression analysis, where the underlying relationships
between biological features can be complex.
• Finance: In finance, nonlinear SVM can be used for tasks like stock market prediction
and credit risk assessment, where data often exhibit nonlinear patterns.
Example: Consider a dataset with two classes, represented as concentric circles in a two-
dimensional feature space. Linear SVM would struggle to separate these classes effectively.
However, by using a Gaussian radial basis function (RBF) kernel, SVM can map the data into
a higher-dimensional space where the classes become linearly separable. This allows SVM to
find a hyperplane that effectively separates the classes, enabling accurate classification even
for nonlinear data distributions.
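The concentric-circles example above can be sketched directly with scikit-learn; make_circles generates the data, and the RBF kernel and parameter values below are illustrative assumptions:
from sklearn.datasets import make_circles
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Two classes arranged as concentric circles: not linearly separable
X, y = make_circles(n_samples=300, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps the data to a higher-dimensional space
rbf_svm = SVC(kernel='rbf', gamma='scale', C=1.0)
rbf_svm.fit(X_train, y_train)
print("Test accuracy:", rbf_svm.score(X_test, y_test))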
DIFFERENCE BETWEEN LINEAR SVM AND NON-LINEAR SVM
Decision boundary: Linear SVM separates classes with a straight line or hyperplane in the original feature space; non-linear SVM uses kernel functions to separate classes that are not linearly separable.
Suitable data: Linear SVM suits linearly separable data; non-linear SVM suits data with complex, non-linear relationships (e.g., concentric circles).
Computation: Linear SVM is faster to train and easier to interpret; non-linear SVM is more computationally intensive and requires careful choice of the kernel and its parameters.
SVM REGRESSION:
• Support Vector Regression (SVR), a type of regression task, also uses SVM.
• SVR aims to find a function that predicts the continuous target variable while
maximising the margin of tolerance (ε) around the predicted value.
• Similar to classification, SVR also uses kernel functions to handle nonlinear
relationships between features.
• The objective of SVR is to find a function that stays within the margin of tolerance for
as many training instances as possible while maximising the margin.
How Support Vector Machine (SVM) Regression Works: Support Vector Machine (SVM)
Regression, also known as Support Vector Regression (SVR), is a type of regression task that
utilises the principles of SVM. Here's how it works:
• Objective: SVR aims to find a function that predicts the continuous target variable
while maximising the margin of tolerance (ε) around the predicted value. The margin
of tolerance allows some deviation from the actual target value.
• Kernel Functions: Similar to classification tasks, SVR also employs kernel functions
to handle nonlinear relationships between features. These kernel functions help map the
input data into a higher-dimensional space where a linear relationship may be
established.
• Margin Optimisation: The objective of SVR is to find a function that not only predicts
the target variable accurately but also stays within the margin of tolerance for as many
training instances as possible. This is achieved by maximizing the margin between the
predicted values and the margin boundaries.
Features of SVM Regression:
• Ability to Handle Outliers: SVR can effectively handle outliers by minimising their impact on the model through the use of a margin of tolerance.
Advantages:
• Effective in High-Dimensional Spaces: SVR performs well even in high-dimensional
feature spaces, making it suitable for datasets with many features.
• Handles Nonlinear Relationships: By utilising kernel functions, SVR can model
nonlinear relationships between features, providing greater flexibility in capturing
complex patterns in the data.
• Resistant to Overfitting: SVR is less susceptible to overfitting, thanks to the margin
of tolerance and regularisation techniques, such as parameter C in the optimisation
objective.
Disadvantages:
• Sensitivity to Kernel Parameters: The performance of SVR can be sensitive to the
choice of kernel function and its parameters. Careful tuning is often required to achieve
optimal results.
• Computational Complexity: SVR can be computationally intensive, especially when
dealing with large datasets or complex kernel functions.
Applications:
• Stock Price Prediction: SVR can be used to predict stock prices based on historical
data, considering various factors such as market trends, trading volume, and economic
indicators.
• Energy Load Forecasting: SVR can forecast energy consumption or load demand,
helping utility companies optimise resource allocation and manage energy production
efficiently.
• Medical Diagnosis: SVR can assist in medical diagnosis tasks by predicting clinical
outcomes or disease progression based on patient data, such as demographics,
symptoms, and medical history.
Example: Consider a dataset containing information about houses, including features like
square footage, number of bedrooms, and location. SVR can be employed to predict the selling
price of houses based on these features. By training an SVR model on historical data, it learns
to predict the selling price while considering a margin of tolerance around the actual selling
price. This allows the model to make accurate predictions while accounting for variations and
uncertainties in the data.
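A small, hedged sketch of SVR on made-up housing-style data (all the numbers, and the C and epsilon settings, are illustrative assumptions):
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Made-up data: [square footage, number of bedrooms] -> selling price
X = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [1100, 2], [2350, 4]])
y = np.array([245000, 312000, 279000, 308000, 199000, 405000])

# epsilon is the margin of tolerance around the predicted value;
# C trades off model flatness against deviations larger than epsilon.
svr = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=1e5, epsilon=5000))
svr.fit(X, y)
print(svr.predict([[1500, 3]]))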
NAÏVE BAYES CLASSIFIERS:
• Naïve Bayes classifiers are probabilistic models based on Bayes' theorem with a strong
assumption of feature independence.
• Despite the "naïve" assumption of feature independence, Naïve Bayes classifiers often
perform well in practice, especially for text classification tasks.
• Naïve Bayes classifiers are simple and computationally efficient, making them
particularly suitable for large datasets.
• Common variants of Naïve Bayes classifiers include Gaussian Naïve Bayes (for
continuous features), Multinomial Naïve Bayes (for discrete features with counts), and
Bernoulli Naïve Bayes (for binary features).
• Naïve Bayes classifiers calculate the posterior probability of each class given the input
features and then select the class with the highest probability as the predicted class.
This is expressed by Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
Where,
• P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
• P(B|A) is Likelihood probability: Probability of the evidence given that the
probability of a hypothesis is true.
• P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
• P(B) is Marginal Probability: Probability of Evidence.
Working of Naïve Bayes' Classifier
✓ Working of Naïve Bayes' Classifier can be understood with the help of the below
example:
✓ Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we need to follow the below steps:
✓ Convert the given dataset into frequency tables.
✓ Generate Likelihood table by finding the probabilities of given features.
✓ Now, use Bayes theorem to calculate the posterior probability.
✓ Problem: If the weather is sunny, should the player play or not?
✓ Solution: To solve this, use the frequency and likelihood tables built from the weather dataset:
Applying Bayes' theorem:
P(Yes|Sunny)= P(Sunny|Yes)*P(Yes) / P(Sunny)
P(Sunny|Yes)= 3/10= 0.3
P(Sunny)= 0.35
P(Yes)=0.71
So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60
P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No)= 0.29
P(Sunny)= 0.35
So P(No|Sunny)= 0.5*0.29/0.35 = 0.41
As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny); hence, on a sunny day, the player can play the game.
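The same calculation can be written out in a few lines of Python, using the numbers taken directly from the tables above:
# Probabilities read off the frequency/likelihood tables
p_sunny_given_yes = 3 / 10      # P(Sunny|Yes)
p_yes = 0.71                    # P(Yes)
p_sunny = 0.35                  # P(Sunny)
p_sunny_given_no = 2 / 4        # P(Sunny|No)
p_no = 0.29                     # P(No)

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny   # about 0.60
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny      # about 0.41
print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")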
Applications:
• It is used for Credit Scoring.
• It is used in medical data classification.
• It is used in Text classification such as Spam filtering and Sentiment analysis.
Image Segmentation with Clustering
Image segmentation is the process of dividing an image into distinct regions based on
the properties of the pixels. Clustering algorithms excel at grouping similar data points
together. In image segmentation, this translates to grouping pixels with similar
characteristics, like color intensity, texture, or spatial location.
There are two main categories of clustering used for image segmentation. Consider, for example, an image of a red flower against a green background:
• K-Means Clustering: Here, you might set K=2 (one cluster for the flower and one for the background). The algorithm would group pixels with similar red hues into one cluster (flower), and those with green tones into another (background).
• Hierarchical Clustering: In the bottom-up approach, individual pixels with similar red colours would first merge, followed by merging with other nearby red pixels, forming the flower segment. Similarly, green pixels would progressively merge into the background segment.
Limitations of Clustering for Segmentation
Plain clustering has some limitations for segmentation: the number of clusters usually has to be chosen in advance, pixels are grouped only by their feature values rather than by spatial context, and performance degrades on images with complex textures or uneven lighting. Even so, clustering provides a valuable method for image segmentation, especially for simpler images or as a pre-processing step for more advanced techniques. Its ease of implementation and efficiency for specific scenarios make it a practical tool in image analysis.
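A minimal sketch of K-Means colour-based segmentation with scikit-learn; the random array below stands in for a real image, and K=2 is an illustrative assumption:
import numpy as np
from sklearn.cluster import KMeans

# 'image' is an H x W x 3 RGB array (random data stands in for a real photo
# of a red flower on a green background).
image = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)
pixels = image.reshape(-1, 3).astype(float)

# K=2: one cluster for the flower, one for the background
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
segmented = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape).astype(np.uint8)
print(segmented.shape)  # same shape as the input image, with 2 distinct colours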
Clustering can be a valuable tool in data preprocessing for various machine learning
tasks.
What is Clustering?
Clustering groups data points so that points within the same cluster are more similar to each other than to points in other clusters. Suppose, for example, you have customer data with the following attributes:
• Customer ID
• Product purchased
• Amount spent
• Demographics (age, location)
You can use clustering to group customers based on their buying habits. For instance,
one cluster might represent customers who frequently buy electronics, another might
be for those who purchase groceries regularly. This information can be used for
targeted marketing campaigns or product recommendations.
By effectively using clustering for data preprocessing, you can prepare your data for
machine learning tasks, leading to better model performance and valuable insights.
Clustering algorithms group data points together based on their similarities. In semi-
supervised learning, clustering can be used in two main ways:
1. Cluster-then-Label Approach:
o Here, clustering is used to identify inherent structures within the
unlabelled data. The data is divided into clusters that are likely to
represent different classes.
o Example: Imagine classifying handwritten digits (0-9). We have a small
set of labelled digits and a large set of unlabelled ones. Clustering can
group the unlabelled digits based on their shape and features, creating
clusters that likely correspond to specific digits.
o Once the data is clustered, a supervised learning model can be used to
analyse the labelled data and assign class labels (0-9) to each cluster
based on the representative points within the cluster.
2. Self-Supervised Clustering for Representation Learning:
o This approach utilizes clustering techniques to learn meaningful
representations for the data, even with limited labelled data.
o Imagine training a model to classify different dog breeds. We can use a
self-supervised clustering method to group unlabelled dog images based
on visual similarities. This step helps the model learn features that
differentiate dog breeds without explicit labels for each breed.
o Then, with the learned representations, a supervised model can be
trained on the labelled data to classify specific dog breeds.
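A minimal sketch of the cluster-then-label approach from point 1 above, assuming the scikit-learn digits dataset and pretending that only the first 50 labels are known (both are illustrative assumptions):
import numpy as np
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans

X, y = load_digits(return_X_y=True)

# Pretend only the first 50 labels are known; the rest are treated as unlabelled
n_labeled = 50
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# Give each cluster the most common known label among its labelled members
cluster_labels = np.zeros(10, dtype=int)
for c in range(10):
    members = kmeans.labels_[:n_labeled] == c
    if members.any():
        cluster_labels[c] = np.bincount(y[:n_labeled][members]).argmax()

# Propagate the cluster labels to every point, labelled or not
y_pred = cluster_labels[kmeans.labels_]
print("Accuracy over all points:", (y_pred == y).mean())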
Benefits of using Clustering: it exploits the structure of the unlabelled data, reduces the amount of manual labelling required, and helps the model learn useful representations before supervised training.
Loss Function: A loss function measures how far the network's predictions are from the true target values; training aims to make this value as small as possible.
• Gradient Descent: Gradient descent is then used by the network to
reduce the loss. To lower the inaccuracy, weights are changed based on
the derivative of the loss with respect to each weight.
• Adjusting Weights: The weights at each connection are adjusted by applying this iterative process, known as backpropagation, backward across the network.
• Training: During training with different data samples, the entire
process of forward propagation, loss calculation, and backpropagation is
done iteratively, enabling the network to adapt and learn patterns from
the data.
• Activation Functions: Non-linearity is introduced into the model by activation functions such as the rectified linear unit (ReLU) or sigmoid. They decide whether a neuron "fires" based on its total weighted input.
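As a toy illustration of forward propagation, loss calculation, and one gradient-descent weight update for a single sigmoid neuron (the data, initial weights, and learning rate are made-up assumptions):
import numpy as np

# One training example with two input features and a target of 1.0
x = np.array([0.5, -1.2])
target = 1.0
w = np.array([0.1, 0.4])        # initial weights
b = 0.0                         # initial bias
lr = 0.1                        # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward propagation
y_hat = sigmoid(np.dot(w, x) + b)
loss = 0.5 * (y_hat - target) ** 2           # squared-error loss

# Backpropagation: the chain rule gives the gradient of the loss
grad_out = (y_hat - target) * y_hat * (1 - y_hat)
w -= lr * grad_out * x                       # adjust weights
b -= lr * grad_out                           # adjust bias
print("loss:", loss, "updated weights:", w)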
Applications:
Neural networks find extensive applications across various domains.
1. Image and Speech Recognition: They power state-of-the-art systems for image
classification, object detection, speech recognition, and natural language
understanding.
2. Natural Language Processing: Machine translation, sentiment analysis, text
generation, and chatbots all use neural networks.
3. Predictive Analytics: Financial forecasting, risk assessment, recommendation systems, and predictive maintenance all rely on neural networks.
4. Healthcare: Neural networks contribute to disease diagnosis, medical image analysis,
drug discovery, and personalized medicine.
Example:
An illustrative example of neural network application is autonomous driving. Here, we use
neural networks for real-time object detection from camera feeds, lane detection, decision-
making based on sensor inputs, and predictive modeling for trajectory planning. By training on
large datasets of driving scenarios, neural networks can learn to navigate complex
environments, making them pivotal in the development of self-driving technology.
DEEP LEARNING: Deep learning is a subfield of machine learning that focuses on the
development and application of neural networks with multiple layers (hence "deep"), enabling
the learning of intricate patterns and representations from large amounts of data. Unlike
traditional machine learning algorithms, deep learning architectures automatically learn
hierarchical representations of data through successive layers of abstraction. This approach
allows deep learning models to effectively handle complex tasks such as image recognition,
speech recognition, natural language processing, and more, often achieving state-of-the-art
performance. Deep learning has revolutionized various industries by driving advancements in
artificial intelligence, enabling systems to autonomously learn and adapt from vast and diverse
datasets without explicitly programming task-specific rules.
Working process:
Deep learning's working process involves several key steps that enable neural networks with
multiple layers to learn complex patterns and representations from data:
• Data Collection and Preparation: To train deep learning models, large amounts of
labeled or unlabeled data are required. Data collection involves gathering relevant
datasets and preprocessing them to ensure they are suitable for training.
• Model Architecture Design: The next step involves designing the deep learning model's
architecture. This includes determining the number of layers, the type of layers (e.g.,
convolutional, or recurrent), the activation functions, and other hyperparameters.
• Forward Propagation: The neural network passes input data through its layers during
the training process, a process known as forward propagation. Each layer performs
computations (e.g., matrix multiplications, applying activation functions) to transform
the input data into meaningful representations.
• Loss Calculation: After propagating the data through the network and making a
prediction, a loss function (e.g., mean squared error for regression, cross-entropy for
classification) compares the output to the ground truth (actual) values. The loss function
quantifies how far off the predictions are from the true values.
• Backpropagation: Deep learning models use backpropagation as the key training algorithm. It entails calculating the gradient of the loss function with respect to each parameter (weight and bias) in the network. This gradient guides the adjustment of parameters to minimize the loss.
• Gradient Descent Optimization: An optimization algorithm, such as stochastic gradient
descent or Adam, uses the gradients computed during backpropagation to update the
neural network's parameters in a direction that minimizes the loss function. This
iterative process helps the model learn the optimal parameters for making accurate
predictions.
• Iterative Training: The training process is typically repeated over multiple epochs (passes through the entire dataset). Each epoch consists of forward propagation, loss calculation, backpropagation, and parameter updates. The goal is to minimize the loss on the training data while avoiding overfitting on the validation data.
• Model Evaluation and Testing: After training, we assess the model's performance on
unseen data using a separate test dataset. We use performance metrics like accuracy,
precision, recall, or F1-score to assess the model's ability to generalize to new data.
• Deployment and Inference: Following successful training and evaluation, we can
deploy the trained deep learning model for inference on fresh data. In deployment, the
model takes input data, performs forward propagation, and generates predictions or
classifications based on the learned patterns and representations.
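The forward propagation, loss calculation, backpropagation, and parameter-update loop described above can be sketched with TensorFlow's GradientTape; the tiny model and random data here are illustrative assumptions:
import tensorflow as tf

# Random stand-in data: 32 samples, 10 features, 3 classes
X = tf.random.normal((32, 10))
y = tf.random.uniform((32,), maxval=3, dtype=tf.int32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

for epoch in range(5):                       # iterative training over epochs
    with tf.GradientTape() as tape:
        y_pred = model(X, training=True)     # forward propagation
        loss = loss_fn(y, y_pred)            # loss calculation
    grads = tape.gradient(loss, model.trainable_variables)           # backpropagation
    optimizer.apply_gradients(zip(grads, model.trainable_variables)) # parameter update
    print("epoch", epoch, "loss", float(loss))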
Applications:
• Image Classification and Object Detection: Deep learning, especially convolutional
neural networks (CNNs), is used to identify and classify objects in images, vital for
tasks like autonomous driving and medical imaging.
• Natural Language Processing (NLP): Deep learning powers language translation,
sentiment analysis, and virtual assistants, enhancing communication and understanding
across languages.
• Speech Recognition and Synthesis: Deep learning enables accurate speech
recognition (e.g., in virtual assistants) and natural-sounding text-to-speech synthesis for
human-like voices.
• Healthcare Imaging and Diagnosis: Deep learning automates medical image analysis,
aiding in diagnosing diseases from X-rays, MRIs, and other scans with high accuracy.
• Autonomous Driving and Robotics: Deep learning algorithms process sensor data to
enable autonomous vehicles to perceive and navigate environments and empower
robots with object recognition and manipulation capabilities.
Difference between Machine Learning, Deep Learning and Neural networks
Definition: Machine learning is a subset of AI in which algorithms learn patterns from data without explicit programming. Deep learning is a subset of machine learning that uses deep neural networks with multiple layers. Neural networks are the building blocks of deep learning, comprised of interconnected nodes (neurons).
Complexity of Models: Machine learning can include various algorithms like decision trees, SVM, k-NN, etc. Deep learning focuses on deep neural networks with multiple hidden layers. Neural networks are the building blocks used in deep learning architectures.
Representation Learning: Machine learning learns features and patterns from data through statistical methods. Deep learning automatically learns hierarchical representations of data. Neural networks process information through weighted connections and activation functions.
Training Process: Machine learning uses supervised, unsupervised, or semi-supervised learning with labeled or unlabeled data. Deep learning typically uses large labeled datasets for supervised learning and relies on backpropagation for training. Neural networks are trained via backpropagation, adjusting weights to minimize prediction errors.
Feature Engineering: Machine learning requires manual feature extraction and selection in some cases. Deep learning automatically learns features from raw data, reducing the need for manual feature engineering. Neural networks process raw input data through layers to extract features.
Application Examples: Machine learning is applied to image classification, regression, clustering, reinforcement learning, etc. Deep learning is applied to image recognition, speech recognition, natural language processing, autonomous driving, etc. Neural networks are used in deep learning architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS WITH KERAS
Artificial neural networks (ANNs) are computational models inspired by the structure and
functioning of the human brain. They consist of interconnected nodes (neurons) organized into
layers, with each connection between neurons having an associated weight. ANNs are powerful
tools for learning complex patterns and making predictions from data.
Keras is a high-level neural network API written in Python that allows for easy and fast
experimentation with deep learning models. It provides a user-friendly interface to build, train,
and deploy neural networks, making it popular among both beginners and experienced
researchers.
A Multi-Layer Perceptron (MLP) is a feed-forward artificial neural network with one or more hidden layers. MLP networks are usually used in a supervised learning setting, and a typical learning algorithm for MLP networks is the backpropagation algorithm.
Implementation:
To implement Multi-Layer Perceptrons (MLPs) using Keras, you can start by importing the
necessary modules: TensorFlow and Keras. Begin by defining your model using Sequential()
and adding layers with Dense(). Specify the input shape for the first layer and the number of
units for subsequent layers, along with activation functions like 'relu' or 'sigmoid'. After defining the layers, compile the model with model.compile(), passing an appropriate loss function ('binary_crossentropy' for binary classification or 'categorical_crossentropy' for multi-class classification), an optimizer ('adam', 'sgd', etc.), and metrics (['accuracy']). Then fit the model to the training data with model.fit(), specifying the number of epochs and the batch size. Evaluate the model with model.evaluate() on the test data to gauge its performance. Finally, use the trained model to make predictions with model.predict(). This structured approach leverages Keras's simplicity and flexibility for building and training MLPs efficiently.
Implementation Steps with Keras:
1. Importing Libraries: Begin by importing the required libraries—tensorflow and
keras—to utilize Keras's high-level neural network API built on top of TensorFlow.
import tensorflow as tf
from tensorflow import keras
2. Defining the Model Architecture: Use the Sequential model class to create a linear
stack of layers. Define the model architecture by adding layers sequentially using Dense
layers.
model = keras.Sequential([
    keras.layers.Dense(units=64, activation='relu', input_shape=(input_size,)),
    keras.layers.Dense(units=32, activation='relu'),
    keras.layers.Dense(units=num_classes, activation='softmax')
])
o Input Layer: The first Dense layer specifies the input shape (input_size) and
applies the ReLU activation function.
o Hidden Layers: Additional Dense layers define the hidden layers with
specified numbers of units (neurons) and activation functions.
o Output Layer: The last Dense layer specifies the number of output classes
(num_classes) and uses the softmax activation function for multi-class
classification.
3. Compiling the Model: Configure the model for training using compile(), where you
specify the loss function, optimizer, and metrics to be used during training.
model.compile(loss='categorical_crossentropy', optimizer='adam',
metrics=['accuracy'])
o Loss Function: Use 'categorical_crossentropy' for multi-class classification
tasks or 'binary_crossentropy' for binary classification.
o Optimizer: Select an optimizer like 'adam', 'sgd', or others to optimize the
model's weights during training.
o Metrics: Specify evaluation metrics such as 'accuracy' to monitor the model's
performance during training and validation.
4. Training the Model: Train the compiled model on training data using fit(), where you
specify the training data, number of epochs, batch size, and validation data.
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
o Epochs: Number of times the model will iterate over the entire training dataset
during training.
o Batch Size: Number of samples used per gradient update.
5. Evaluating the Model: Evaluate the trained model's performance on the test dataset
using evaluate() to compute the loss and any specified metrics.
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f'Test Loss: {test_loss}')
print(f'Test Accuracy: {test_accuracy}')
6. Making Predictions: Use the trained model to make predictions on new data using
predict(), which returns the predicted output for the input data.
predictions = model.predict(X_new)
Application:
• Image Classification: Recognizing objects or patterns in images.
• Text Classification: categorizing text into different classes.
• Regression Tasks: Predicting continuous values based on input features.
• Anomaly Detection: Identifying unusual patterns or outliers in data.
• Pattern Recognition: Recognizing complex patterns in data.
INSTALLING TENSOR FLOW 2
To install TensorFlow 2, you can use pip, the Python package installer, which is the
recommended method for most users. Here's how you can install TensorFlow 2, depending on
your Python environment:
1. Install TensorFlow 2 using pip (for Python environments):
If you have a Python environment set up, follow these steps:
For the CPU-only version:
pip install tensorflow
For the GPU version (which requires CUDA and cuDNN):
pip install tensorflow-gpu
2. Verify TensorFlow Installation:
After installation, you can verify that TensorFlow 2 is installed correctly by importing it in a
Python environment and checking the version:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
Additional Notes:
• Virtual Environments (recommended): It's a good practice to use virtual
environments (e.g., venv, conda) to manage your Python projects. Activate your virtual
environment before installing TensorFlow.
• GPU Installation (optional): If you have an NVIDIA GPU and want to utilize GPU
acceleration, make sure you have installed compatible versions of CUDA and cuDNN
before installing tensorflow-gpu.
• Compatibility: Check TensorFlow's official installation guide for detailed instructions
and compatibility information based on your operating system and Python version.
Example Installation:
Here is an example of how you can install TensorFlow 2 using pip in a terminal (assuming you
have Python and pip installed):
# Install the TensorFlow CPU version
pip install tensorflow
# Install the TensorFlow GPU version (assuming compatible CUDA and cuDNN are installed)
pip install tensorflow-gpu
Make sure to replace pip with pip3 if you are using Python 3 and have multiple Python versions
installed.
LOADING AND PREPROCESSING DATA WITH TENSOR FLOW.
Loading Data
Loading data involves reading and importing datasets into your machine learning application.
In TensorFlow, data can be loaded from various sources:
• NumPy Arrays or Tensors: TensorFlow can directly work with NumPy arrays or
TensorFlow tensors. You can create datasets using tf.data.Dataset.from_tensor_slices().
• Files (e.g., CSV, TFRecord): TensorFlow provides utilities to load data from files like
CSV (using tf.data.experimental.make_csv_dataset) or TFRecord (using
tf.data.TFRecordDataset). This is useful for handling large datasets that don't fit into
memory.
Preprocessing Data
Data preprocessing is essential to prepare the data for training and ensure that it's in a suitable
format for machine learning algorithms:
• Normalization: Scaling features to a similar range (e.g., mean normalization or min-
max scaling) helps in improving convergence and performance of machine learning
models.
• Data Augmentation: Commonly used for image data, data augmentation involves
creating new training examples by applying random transformations like rotations,
flips, and shifts. This helps in increasing the diversity of the training data and improving
model generalization.
• Feature Engineering: Transforming raw data into a format that is more suitable for
the model. This may involve encoding categorical variables, handling missing values,
or extracting relevant features.
• Batching and Shuffling: Data is often processed in batches during training to improve
efficiency. Shuffling the data ensures that the model sees different samples in each
epoch and prevents it from memorizing the order of the data.
Building Data Pipelines
Data pipelines in TensorFlow are used to efficiently process and feed data into machine
learning models:
• Iterating over Datasets: TensorFlow datasets (tf.data.Dataset) provide an abstraction
for handling large amounts of data. You can iterate over datasets using methods like for
batch in dataset: to extract batches of data during training.
• Prefetching: Prefetching data allows the model to fetch batches of data in parallel with
model training, improving overall training performance by reducing idle time.
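A minimal sketch of such a pipeline with the tf.data API, using randomly generated in-memory data as an illustrative assumption:
import numpy as np
import tensorflow as tf

# Illustrative in-memory data: 1000 samples with 20 features
X = np.random.rand(1000, 20).astype('float32')
y = np.random.randint(0, 2, size=(1000,))

dataset = (tf.data.Dataset.from_tensor_slices((X, y))
           .shuffle(buffer_size=1000)            # shuffle the samples
           .batch(32)                            # process data in batches
           .prefetch(tf.data.AUTOTUNE))          # overlap loading and training

for batch_x, batch_y in dataset.take(1):
    print(batch_x.shape, batch_y.shape)          # (32, 20) (32,)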
Importance of Data Handling
Effective data loading and preprocessing are crucial for successful machine learning model
training:
• Data Quality: Proper preprocessing ensures that the data is clean, standardized, and
suitable for the chosen machine learning algorithm.
• Model Performance: Well-preprocessed data can significantly impact model
performance, leading to faster convergence and better generalization.
• Scalability: Efficient data pipelines are essential for handling large datasets that cannot
fit into memory, enabling scalable and distributed training.