AIDS Module 1 notes draft
To tackle such challenges, we now rely on data science and artificial intelligence (AI). Data science involves manipulating and analyzing data to gain better insights and extract meaningful information. Artificial intelligence enables machines to perform these operations using computational power, drawing inspiration from how the human brain processes information.

Within AI, a subset called machine learning (ML) focuses on teaching computers how to learn from data. In ML, systems are trained to recognize patterns, make decisions, and even predict future outcomes, all based on historical or input data. Together, these tools and techniques help us make informed decisions and automate complex processes across industries.

Module 1
[Introduction to AI and Machine Learning: Basics of Machine Learning - types of Machine Learning systems - challenges in ML - Supervised learning model example - regression models - Classification model example - Logistic regression - unsupervised model example - K-means clustering. Artificial Neural Network - Perceptron - Universal Approximation Theorem (statement only) - Multi-Layer Perceptron - Deep Neural Network - demonstration of regression and classification problems using MLP. (Text-2)]
Artificial Intelligence (AI) is the field of study that focuses on enabling machines to perform tasks that typically require human intelligence. These tasks include learning from experience, understanding language, recognizing patterns, making decisions, and even solving problems. A vital part of AI is Machine Learning (ML), which refers to the ability of machines to learn from data without being explicitly programmed for each task. ML systems analyze past data, identify patterns, and use these insights to make predictions or decisions about new data.
There are different types of Machine Learning systems based on the nature of the learning task
and the kind of supervision involved. The three broad categories are supervised learning,
unsupervised learning, and reinforcement learning. In supervised learning, the model is
trained on a labeled dataset—that is, input data is paired with the correct output. In
unsupervised learning, the model is given input data without any labeled responses and must
discover underlying patterns on its own. Reinforcement learning involves training a model to
make a sequence of decisions by rewarding it for good actions and penalizing it for poor ones.
Despite its potential, Machine Learning faces several challenges. One key issue is the quality
and quantity of data—insufficient, biased, or noisy data can lead to poor model performance.
Another challenge is overfitting, where a model learns the training data too well and fails to
generalize to new data. Interpretability is also a major concern, especially with complex models
like deep neural networks, as it's often difficult to understand how decisions are being made.
A basic example of supervised learning is a regression model, where the goal is to predict a
continuous output variable. For instance, predicting house prices based on square footage,
location, and number of bedrooms can be achieved using linear regression. In contrast, a
classification model is used when the output variable is categorical. For example, identifying
whether an email is spam or not is a classification task, and one common algorithm used here
is logistic regression, which models the probability that a given input belongs to a certain
class.
An example of unsupervised learning is K-means clustering, which groups data into clusters based on similarity. In K-means, the algorithm tries to divide the data into K distinct clusters, where each data point belongs to the cluster with the nearest mean. This is useful in market segmentation, image compression, and social network analysis, where patterns or groupings in data are not predefined.

One of the most important advances in Machine Learning is the development of Artificial Neural Networks (ANNs), which are computational models inspired by the structure and function of the human brain. The simplest form of an ANN is the Perceptron, a model consisting of a single layer of nodes used for binary classification. While the perceptron can solve simple linear problems, it is limited when it comes to complex or non-linear tasks.
The Universal Approximation Theorem states that a neural network with at least one hidden layer and sufficient neurons can approximate any continuous function on a closed interval, given appropriate weights and activation functions. This theorem provides the theoretical foundation for using neural networks to solve a wide range of problems.

To overcome the limitations of the simple perceptron, more complex architectures like the Multi-Layer Perceptron (MLP) have been developed. An MLP consists of an input layer, one or more hidden layers, and an output layer. Each layer is made up of multiple neurons, and non-linear activation functions are used to enable the network to model complex relationships in data.
When MLPs are expanded with more layers and larger datasets, they form Deep Neural
Networks (DNNs), which are capable of handling tasks such as image recognition, speech
processing, and natural language understanding. These deep architectures have led to major
breakthroughs in AI over the past decade.
The practical application of an MLP can be demonstrated through both regression and
classification problems. For regression, an MLP can learn to predict values such as stock prices
or temperatures by analyzing numerical input features. For classification, it can be trained to
recognize handwritten digits or detect spam emails, learning to associate patterns in the input
data with specific output classes.
Table of Contents
Basics of Machine Learning
Types of Machine Learning Systems
  1. Supervised Learning
  2. Unsupervised Learning
  3. Semi-Supervised Learning
  4. Reinforcement Learning
  Summary Table of All Learning Types
Challenges in Machine Learning
  1. Data-Related Challenges
  2. Model-Related Challenges
  3. Computational Challenges
  4. Ethical and Interpretability Issues
  5. Deployment and Maintenance
Supervised learning model examples
  1. Regression models
     1. Linear Regression
     2. Multiple Linear Regression
     3. Polynomial Regression
     4. Decision Tree Regression
     Summary Table
  2. Classification model example
     1. Logistic Regression
     2. Decision Trees
     3. K-Nearest Neighbors (KNN)
     4. Random Forest
     5. Support Vector Machine (SVM)
     6. Naive Bayes
Unsupervised model example - K-means clustering
Introduction to Artificial Neural Networks (ANNs)
  Summary of Core Concepts
Perceptron
Working of Neural Networks
Universal Approximation Theorem (UAT) – Statement only
  Summary
Multi-Layer Perceptron (MLP)
Deep Neural Network (DNN)
Please ensure that Jupyter Notebook is installed and ready to use through your Anaconda
account. If you haven't done so already:
1. Supervised Learning

Here, the algorithm learns from labeled data: each input is paired with the correct output.

Example 1: Fruit Recognition
The computer is trained on labeled images:

Image      Label
(apple)    Apple
(banana)   Banana
(grapes)   Grapes

Later, when you show a new picture of an apple, it says “Apple” because it has learned from examples.

Example 2: House Price Prediction
The computer is trained on this data:

Size of House (sqft)   Price (₹)
1000                   ₹50 lakhs
1500                   ₹75 lakhs
2000                   ₹1 crore

The computer learns the relationship between house size and price. Later, if you give it a house of 1800 sqft, it can predict the price.

Example 3: Email Spam Detection
Each email is labeled “This is spam” or “This is not spam”, and the model learns to classify new emails the same way.

2. Unsupervised Learning
Here, the algorithm works with unlabeled data and tries to find hidden patterns or structures.
Goal: Discover the underlying structure in data.
Applications: Customer segmentation, anomaly detection, topic modeling.
Examples:
• Clustering: Grouping similar data points (e.g., K-Means).
• Dimensionality Reduction: Reducing the number of variables (e.g., PCA).
Example 1: Customer Segmentation
Imagine a company has data about customers:

Age   Income   Shopping Amount
22    ₹30K     ₹5,000
45    ₹90K     ₹25,000
30    ₹50K     ₹7,000
50    ₹1L      ₹30,000

The company doesn't know anything else; there are no labels.
They use unsupervised learning to group customers into categories like:
Low-income low-spending
Middle-income moderate-spending
High-income high-spending
This helps them target marketing better.
Common methods used in Unsupervised Learning:
1. Clustering:
Two Common Clustering Methods:
K-Means Clustering:
You tell the computer how many clusters (groups) you want — say 3.
It finds 3 center points, then assigns every data point to the nearest center.
Hierarchical Clustering:
It doesn’t need you to specify the number of groups.
It builds a tree of data points — grouping similar ones step by step.
You can cut the tree at any level to form clusters.
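The K-Means step just described can be sketched in Python with scikit-learn (a minimal sketch; the rows follow the customer table in Example 1, and the cluster count is illustrative):

import numpy as np
from sklearn.cluster import KMeans

# Columns: age, income (in thousands of ₹), shopping amount (₹)
X = np.array([
    [22, 30, 5000],
    [45, 90, 25000],
    [30, 50, 7000],
    [50, 100, 30000],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)      # cluster index assigned to each customer
print(labels)                       # e.g., [0 1 0 1]: low spenders vs. high spenders
print(kmeans.cluster_centers_)      # the mean position (center) of each cluster

In practice the features should be scaled first (e.g., with StandardScaler), since the shopping amount is numerically much larger than age and would otherwise dominate the distance calculation.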
2. Dimensionality reduction:
Imagine you're filling out a student form:
Name
Father's Name
Mother's Name
School Name
Class
Age
Date of Birth

Now you realize:
Age and Date of Birth give almost the same info.
Father's and Mother's Name may not help in the analysis.

So you reduce the number of fields, keeping only the useful ones, like:
Name
Class
Age

That's dimensionality reduction: removing unnecessary or repeated info to keep the important stuff.

Real-world data often has hundreds or thousands of features (columns).
Dimensionality reduction helps to:
1. Remove noise
2. Make visualization easier
3. Speed up learning
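A minimal sketch of dimensionality reduction with scikit-learn's PCA; the data here is random and purely illustrative:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 samples with 10 features each

pca = PCA(n_components=2)               # keep only the 2 most informative directions
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                  # (100, 2): same rows, far fewer columns
print(pca.explained_variance_ratio_)    # how much variance each component keeps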
3. Semi-Supervised Learning
A hybrid approach where the model is trained on a small amount of labeled data along with a large
amount of unlabeled data.
Goal: Improve learning accuracy when labeling data is expensive.
Applications: Medical imaging, speech recognition, web content classification.
Applications of Semi-Supervised Learning
Application How it's used
Medical Imaging Only a few X-rays are labeled by doctors, rest are unlabeled
Speech Recognition Only some audio clips have transcripts
Web Content Classification A few web pages are manually labeled; model guesses the rest
Language Translation Limited labeled sentence pairs, huge unlabeled corpus
Imagine:
You’re at a fruit market and you label only 3 fruits:
Apple
Banana
Orange
But there are hundreds of other fruits with no labels.
You then let a child observe the shapes and colors of the other fruits.
Over time, the child starts recognizing and labeling the rest, based on the few examples you gave.
🧾 In Simple Machine Learning Terms:

Term                       Meaning
Labeled Data               Data with correct answers (e.g., email = spam or not)
Unlabeled Data             Data without any labels (e.g., just the email text)
Semi-Supervised Learning   Model trained on a small amount of labeled data + a large amount of unlabeled data
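A hedged sketch of this idea using scikit-learn's LabelPropagation, one of several semi-supervised methods (the tiny dataset is made up; by the library's convention, unlabeled points carry the label -1):

import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[1.0], [1.1], [0.9], [5.0], [5.2], [4.8]])
y = np.array([0, -1, -1, 1, -1, -1])   # only two points are labeled

model = LabelPropagation()
model.fit(X, y)
print(model.transduction_)             # inferred labels for all points: [0 0 0 1 1 1]

The two labels spread to the unlabeled neighbours, just as the child's few fruit labels spread in the analogy above.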
4. Reinforcement Learning
In this paradigm, an agent learns to make decisions by interacting with an environment, receiving rewards or penalties for actions.
Goal: Learn a sequence of actions that maximize cumulative reward.
Applications: Robotics, game AI (e.g., AlphaGo), autonomous vehicles.
Key Components:
1. Agent: Learner or decision maker.
2. Environment: Where the agent interacts.
3. Reward Signal: Feedback to guide learning.
🎮 What is Reinforcement Learning?
Simple Explanation:
Imagine you're teaching a dog a trick — say, to sit.
At first, the dog doesn’t know what to do.
When it accidentally sits, you give it a treat (reward).
When it jumps instead, you give no treat.
Over time, the dog learns that sitting = reward — and does it more often.
This is Reinforcement Learning:
Learning by trial and error with rewards and penalties.
Video Games
You’re playing a game:
You move forward → gain coins
You fall into a pit → lose a life
You finish a level → bonus points
In Machine Learning Terms:

Term          Meaning
Agent         The learner (e.g., dog, robot, AI player)
Environment   The world it interacts with (e.g., house, game, road)
Action        What the agent chooses to do (e.g., sit, jump, move left)
Reward        Positive or negative feedback (e.g., treat, penalty, score)
Goal          Learn which actions give the maximum long-term reward
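One classic way to implement this trial-and-error loop is tabular Q-learning; the 5-cell "corridor" environment below is entirely made up for illustration (start at cell 0, reach cell 4 for a reward of +1):

import numpy as np

n_states, n_actions = 5, 2             # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))    # one value per (state, action) pair
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != 4:                          # play until the goal is reached
        if rng.random() < epsilon:             # sometimes explore a random action
            action = int(rng.integers(n_actions))
        else:                                  # otherwise take the best-known action
            action = int(np.argmax(Q[state]))
        next_state = max(0, state - 1) if action == 0 else min(4, state + 1)
        reward = 1.0 if next_state == 4 else 0.0
        # Update Q toward: immediate reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(Q)   # the "move right" column dominates: the agent learned the rewarded behaviour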
📊 Summary Table of All Learning Types

Learning Type              What it Learns From                    Data Requirement                        Example
Supervised Learning        Correct answers (labels)               Lots of labeled data                    Spam detection
Unsupervised Learning      Hidden patterns                        Only input data, no labels              Customer grouping
Semi-Supervised Learning   Few labels + many unlabeled examples   Small labeled + large unlabeled data    Language model training
Reinforcement Learning     Rewards and punishments                No labels, only feedback after action   Game AI, robot walking

Challenges in Machine Learning
1. Data-Related Challenges
• Insufficient Data: Not enough examples to train the model effectively.
• Poor Quality Data: Noisy, missing, or incorrect data can mislead the model.
• Imbalanced Data: One class dominates others, leading to biased models.
• High Dimensionality: Too many features can lead to overfitting and complexity.
2. Model-Related Challenges
• Overfitting: Model learns noise instead of pattern (high accuracy on training data but
poor generalization).
• Underfitting: Model is too simple to capture underlying trends (poor performance).
• Model Selection: Choosing the right algorithm for a task can be tricky.
• Hyperparameter Tuning: Requires trial and error to find optimal settings.
3. Computational Challenges
• Scalability: Training large models on big datasets requires high computation power.
• Latency: In real-time applications, predictions need to be fast.
• Resource Limitations: Limited access to GPUs, memory, or storage.
4. Ethical and Interpretability Issues
• Bias and Fairness: ML models can perpetuate or amplify biases in training data.
• Interpretability: Black-box models like deep neural networks are hard to explain.
• Privacy Concerns: Use of sensitive data (e.g., health records) must ensure data protection.
5. Deployment and Maintenance
• Model Drift: Performance may degrade over time as data patterns change.
• Version Control: Managing updates to data, models, and code.
• Integration: Incorporating ML models into existing software systems.
Supervised learning model examples

1. Regression models
Supervised Learning:
You give the computer:
Input data
Correct output (label)
The computer learns the relationship and uses it to predict the output for new data.
Regression:
A regression model is used when you want to predict a number (a continuous value).
Suppose you have a table of past house sales (size, bedrooms, price). This is your training data.
Now a customer comes and asks:
“What would be the price of an 1800 sqft, 3-bedroom house?”
You give this to your trained regression model, and it predicts:
₹90 lakhs
How does the Regression Model work?
Step 1: Understand the pattern
It learns from the data:
As size increases, price increases.
More bedrooms usually mean higher price.
Step 2: Fit a line or model
It tries to fit a mathematical formula that best matches the data:
Price = a × Size + b × Bedrooms + c
Where:
a, b, and c are values the model learns from the data.
Step 3: Make predictions
When new data comes in (like 1800 sqft, 3 bedrooms), it plugs into the formula and
gives a predicted price.
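A minimal sketch of these three steps with scikit-learn's LinearRegression (the training rows are illustrative numbers consistent with the example above):

from sklearn.linear_model import LinearRegression

X = [[1000, 2], [1500, 3], [2000, 3], [2500, 4]]   # size (sqft), bedrooms
y = [50, 75, 100, 125]                             # price in lakhs of ₹

model = LinearRegression()
model.fit(X, y)                       # learns a, b, c in: Price = a*Size + b*Bedrooms + c
print(model.coef_, model.intercept_)  # the learned a, b and the intercept c
print(model.predict([[1800, 3]]))     # about 90: ₹90 lakhs for 1800 sqft, 3 bedrooms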
Type of Regression           What it does                         Best for
Linear Regression            Fits a straight line                 When data has a simple trend
Multiple Linear Regression   Uses more than one feature           When more than one input affects output
Random Forest Regression     Uses multiple trees, more accurate   For complex and noisy data

Types of Regression Models in Supervised Learning
📘 1. Linear Regression
Description:
Models the relationship between a single independent variable (X) and a dependent variable (Y) using a straight line.
The equation is:
𝑌 = 𝑎𝑋 + 𝑏
where a is the slope and b is the intercept.
When to Use:
When the relationship between variables is linear (i.e., increasing or decreasing in a straight line).
Example: Predicting house price based on its size (sqft).
Diagram Insight:
The red line is the "best-fit" line.
The model minimizes the distance between data points and this line (least squares method).
📘 2. Multiple Linear Regression
Description:
Models the relationship between two or more independent variables and a dependent variable.
The equation is:
𝑌 = 𝑎₁𝑋₁ + 𝑎₂𝑋₂ + ⋯ + 𝑏
When to Use:
When you want to include multiple factors in prediction.
Example: Predicting house price based on size, number of bedrooms, and location.
Diagram Insight:
Each dot represents a data point with more than one input variable (X₁, X₂...).
Harder to visualize in 2D, but the model combines all inputs to estimate Y.
📘 3. Polynomial Regression
Description:
• Extends linear regression by adding powers of the input variable (X², X³, etc.).
• Captures curved patterns in data.
Equation Example:
𝑌 = 𝑎𝑋² + 𝑏𝑋 + 𝑐
When to Use:
When data shows non-linear trends, like growth curves, speed vs. time, etc.
Example: Predicting population growth.
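A minimal sketch with scikit-learn: PolynomialFeatures builds the X² column, and an ordinary linear model is then fitted on it (the data is a made-up quadratic trend):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 4, 9, 16, 25])                # clearly non-linear (y = x²)

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)                 # columns: [x, x²]
model = LinearRegression().fit(X_poly, y)
print(model.predict(poly.transform([[6]])))    # ≈ 36, following the curve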
Diagram Insight:
The orange curve shows a quadratic fit (degree 2).
Better fits curving patterns that a straight line would miss.
📘 4. Decision Tree Regression
Description:
Breaks data into branches based on feature conditions (like a flowchart).
Each branch leads to a predicted output value.
When to Use:
When the data has sudden jumps or clear thresholds.
Handles both linear and non-linear data.
Example:
Predicting electricity usage:
If temperature < 20°C → Low usage
If 20–30°C → Medium usage
If >30°C → High usage
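A minimal sketch of this example with scikit-learn's DecisionTreeRegressor (the temperature and usage numbers are illustrative):

from sklearn.tree import DecisionTreeRegressor

X = [[10], [15], [22], [28], [33], [38]]   # temperature in °C
y = [5, 5, 10, 10, 15, 15]                 # electricity usage (arbitrary units)

tree = DecisionTreeRegressor(max_depth=2)
tree.fit(X, y)
print(tree.predict([[18], [25], [35]]))    # step-wise outputs: [ 5. 10. 15.]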
Diagram Insight:
You can see levels of outputs (5, 10, 15) based on value ranges.
Useful for interpretable, rule-based decisions.
🧠 Summary Table:

Regression Type              Shape of Model   Handles Multiple Inputs?   Suitable For
Linear Regression            Straight Line    No                         Simple linear trends
Multiple Linear Regression   Straight Plane   Yes                        Multi-factor predictions
Polynomial Regression        Curved Line      Yes                        Non-linear trends
Decision Tree Regression     Step-wise        Yes                        Rule-based or piecewise data patterns
2. Classification model example
A classification model is used in supervised learning when the output you are predicting is
a category or class, not a number.
Real-World Example: Email Spam Detection
Let’s say you want to build a system that decides whether an incoming email is
“Spam” or “Not Spam”.
What the model sees (input features):
Does the email contain the word "free"?
Is there a suspicious link?
How many recipients are there?
Who is the sender?
These are called features (inputs).
What the model predicts (output label):
Spam
Not Spam
This is the class the model has to predict.
How Does the Classification Model Work?
Step-by-step:
1. Training:
You show the model many examples:
Email A → Spam
Email B → Not Spam
Email C → Spam
... (with inputs like keywords, links, etc.)
2. Learning:
The model finds patterns, for example:
Emails with “win money” are often spam.
Emails from known contacts are not.
3. Prediction:
When a new email arrives, the model uses what it learned and says:
“This email is likely spam.”
Most Common Classification Algorithms
📌 1. Logistic Regression
What is it?
Logistic Regression is a classification algorithm, not a regression algorithm (despite its name). It's used to predict categorical outcomes, mostly binary (like Yes/No, 0/1, True/False).
How it Works
1. Linear Combination
First, it computes a linear combination of input features:
z = b0 + b1·x1 + b2·x2 + ... + bn·xn
2. Apply Sigmoid Function
Then, it passes z through the sigmoid function:
P(y = 1 | x) = 1 / (1 + e^(−z))
This gives the probability that the input belongs to class 1.
3. Classification Rule
If the probability > 0.5 → predict class 1
If the probability ≤ 0.5 → predict class 0
(This threshold can be adjusted)
Training the Model: Log Loss
To train Logistic Regression, we use a loss function called Log Loss (or Cross-Entropy):
Loss = −[y·log(p) + (1−y)·log(1−p)]
y = actual answer (0 or 1)
p = predicted probability
The algorithm uses gradient descent to adjust the weights (b0, b1, etc.) to reduce this loss, like a student adjusting study habits to get better marks!
Gradient Descent?
Imagine you're blindfolded and trying to walk down a hill to reach the lowest point (the bottom of
the hill = best model).
At first, you guess some weights (b0, b1, etc.).
You calculate the error (how wrong your model is).
Then, you use gradient descent to take small steps downhill, adjusting the weights to reduce the
error.
Gradient Descent = "Keep taking steps in the direction that reduces the error."
Each step makes the model slightly better, until you reach a point where it can’t get any better —
that’s when the model is "trained."
How Gradient Descent Works in Logistic Regression
Start with random weights
Use the weights to make predictions (using sigmoid function)
Calculate how wrong the predictions are (using a loss function)
Adjust the weights to make the predictions better
Repeat until the model becomes accurate
This loop happens many times — like practicing over and over until you get the answer right.
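This loop can be written from scratch in a few lines of Python; the one-feature dataset below is made up for illustration:

import numpy as np

X = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])   # one input feature
y = np.array([0, 0, 0, 1, 1, 1])               # actual answers

b0, b1 = 0.0, 0.0                              # start with initial weights
alpha = 0.1                                    # learning rate

for step in range(1000):
    z = b0 + b1 * X
    p = 1 / (1 + np.exp(-z))                   # sigmoid: probability of class 1
    # Gradients of the log loss with respect to b0 and b1 (via the chain rule)
    b0 -= alpha * np.mean(p - y)
    b1 -= alpha * np.mean((p - y) * X)

print(b0, b1)                                  # the learned weights
print(1 / (1 + np.exp(-(b0 + b1 * 2.25))))     # ≈ 0.5 near the class boundary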
🌳 2. Decision Trees
When to Use:
When you want rule-based reasoning
When interpretability is important
Example:
Loan Approval – A bank checks income, age, credit score.

        [Credit Score > 700?]
         /              \
       Yes               No
 [Income > 50k?]       Reject
   /        \
 Yes         No
Approve    Reject
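A minimal sketch of such a tree with scikit-learn (the four training rows are made up); export_text prints the learned rules:

from sklearn.tree import DecisionTreeClassifier, export_text

X = [[720, 60], [710, 40], [650, 80], [690, 30]]   # credit score, income (in thousands)
y = [1, 0, 0, 0]                                   # 1 = approve, 0 = reject

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["credit_score", "income_k"]))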
👨‍👩‍👧 3. K-Nearest Neighbors (KNN)
When to Use:
When "birds of a feather" logic applies (similar things group together)
When data has natural clusters
Example:
Classifying handwritten digits (0-9) using pixel similarity.
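A minimal sketch using scikit-learn's built-in 8×8 handwritten digits dataset:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)        # each image flattened into 64 pixel features
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)  # classify by the 3 most similar images
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))           # accuracy, typically around 0.98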
🌲 4. Random Forest
When to Use:
When you need high accuracy
When you have lots of features and data
Example:
Predicting loan defaults using many customer details
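A minimal sketch with scikit-learn's RandomForestClassifier on made-up loan data:

from sklearn.ensemble import RandomForestClassifier

X = [[30, 40, 650], [45, 90, 780], [25, 20, 500],
     [50, 120, 800], [35, 60, 700], [28, 25, 550]]   # age, income (k), credit score
y = [1, 0, 1, 0, 0, 1]                               # 1 = default, 0 = repaid

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)                                     # an ensemble of 100 decision trees
print(forest.predict([[40, 70, 720]]))               # majority vote of all trees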
📧 6. Naive Bayes
When to Use:
For text classification (e.g., emails, reviews)
When independent features are assumed
Example:
Email spam detection, sentiment analysis
Based on Bayes’ Theorem and assumes each feature contributes independently to the
outcome.
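A minimal sketch with scikit-learn: CountVectorizer turns each message into word counts, and MultinomialNB applies Bayes' theorem to them (all messages are made-up examples):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win money now", "free prize click here",
         "meeting at noon", "see you at lunch"]
labels = ["spam", "spam", "not spam", "not spam"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)        # word-count features, one column per word
model = MultinomialNB().fit(X, labels)

print(model.predict(vectorizer.transform(["free money prize"])))   # ['spam']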
Unsupervised model example - K-means clustering

Suppose you plot customer data (for example, age against spending) and natural groups appear:
Group 1: Young & High Spenders
Group 2: Middle-aged & Budget Conscious
Group 3: Senior Citizens

This is clustering. K-Means helps you find such groups.
Step 1: Choose the number of clusters, K (say, 3).
Step 2: Randomly place 3 points: these are your initial centroids (cluster centers).
Step 3: Assign each data point to the nearest centroid.
Each point now belongs to one of the 3 clusters.
Step 4: Recalculate the centroids.
For each cluster, calculate the mean position of all points in it: this becomes the new center.
Step 5: Repeat steps 3 and 4 until the centroids stop moving.
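Steps 3 and 4 can be written directly in NumPy; the points and the initial centroids below are made up for illustration (here K = 2 to keep the output short):

import numpy as np

points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])    # Step 2: initial centers

for _ in range(10):                                 # Step 5: repeat until stable
    # Step 3: assign each point to its nearest centroid
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assignment = np.argmin(distances, axis=1)
    # Step 4: recompute each centroid as the mean of its assigned points
    centroids = np.array([points[assignment == k].mean(axis=0) for k in range(2)])

print(assignment)   # [0 0 1 1]: two clear groups
print(centroids)    # each centroid sits at the mean of its group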
Introduction to Artificial Neural Networks (ANNs)
An Artificial Neural Network (ANN) is a computational model inspired by the structure and functioning of the biological nervous system, particularly the human brain. Just as the human brain is composed of billions of neurons that communicate with each other to process information, ANNs consist of artificial neurons (also known as nodes or units) that are interconnected in a network and work collectively to solve tasks.
Neural networks are capable of learning and identifying patterns directly from data without pre-defined
rules. These networks are built from several key components:
Neurons: The basic units that receive inputs, each neuron is governed by a threshold and an activation
function.
Connections: Links between neurons that carry information, regulated by weights and biases.
Weights and Biases: These parameters determine the strength and influence of connections. During
training, the network adjusts these weights to minimize the prediction error. Biases are additional
parameters added to the neuron to shift the activation function, helping the model to better fit the data.
Propagation Functions: Mechanisms that help process and transfer data across layers of neurons.
Learning Rule: The method that adjusts weights and biases over time to improve accuracy.
ANNs are capable of learning from data. They are particularly useful for modeling complex
relationships between inputs and outputs and for discovering hidden patterns in data. Applications of
ANNs include image recognition, speech processing, medical diagnosis, and financial forecasting,
among others.
Basic Structure of an ANN
A typical artificial neural network consists of three main types of layers:

a) Input Layer
This is the first layer of the network.
It receives the raw input data and passes it to the subsequent layers for processing.
Each neuron in this layer corresponds to one feature of the input dataset (e.g., in a student performance prediction model: attendance rate, hours of study, and internal assessment marks).

b) Hidden Layer(s)
These are the intermediate layers between the input and output layers.
They perform the core computations by processing the inputs through weighted connections and activation functions.
The number of hidden layers and the number of neurons within each layer determine the complexity and learning capacity of the network.

c) Output Layer
This layer provides the final output or prediction of the network.
The number of neurons in this layer corresponds to the nature of the task:
For regression or binary classification, a single output neuron is typically used.
For multi-class classification, there are multiple output neurons.
Illustrative Example: Predicting Student Performance
Let us consider a basic example where the objective is to predict whether a student will pass or fail based on certain academic inputs.
➤ Inputs:
Attendance Percentage
Hours of Study per Week
Internal Assessment Marks
These three values are fed into the input layer of the neural network.
➤ Processing:
Each input is multiplied by a weight, and the weighted inputs are summed together with a bias.
The result is passed through an activation function (e.g., sigmoid or ReLU), which introduces non-linearity and helps the network learn complex patterns.
➤ Output:
If the output value is close to 1, the student is predicted to pass; if it is close to 0, the student is predicted to fail.
Learning Patterns:
Over time, as the ANN is trained with more examples (student data with known pass/fail
outcomes), it learns patterns such as:
Higher hours of study and better internal marks increase the probability of passing.
Summary of Core Concepts:

Concept             Description
Artificial Neuron   Basic computational unit that mimics a biological neuron
Layers              Organized structure: Input, Hidden, and Output
Weights & Biases    Parameters that guide the learning process
Activation Func.    Function that adds non-linearity and helps the network learn complex data
Learning            Adjusting weights based on data using optimization algorithms like gradient descent
Perceptron
A Perceptron is a type of artificial neuron, introduced by Frank Rosenblatt in 1958. A perceptron takes several inputs, applies weights to them, adds a bias, and then passes the result through an activation function to produce an output. A Perceptron is the most fundamental unit of an artificial neural network, widely used in machine learning and artificial intelligence. It is inspired by the functioning of a biological neuron and is used to perform binary classification, that is, to decide whether something belongs to one class or another (e.g., yes/no, true/false, safe/unsafe).
Structure of a Perceptron
A perceptron includes:
Inputs: x₁, x₂, ..., xₙ
Weights: w₁, w₂, ..., wₙ
Bias: b
Weighted Sum: z = w₁·x₁ + w₂·x₂ + ... + wₙ·xₙ + b
Activation Function: Usually a step function
Perceptron Equation
Weighted Sum:
𝑧 = 𝑤₁ · 𝑥₁ + 𝑤₂ · 𝑥₂ + . . . + 𝑤ₙ · 𝑥ₙ + 𝑏
Output (Activation Function):
If z > 0, then Output = 1
Else, Output = 0
Real-Life Examples
1. University Admission
Inputs:
x₁ = Math marks
x₂ = English marks
If:
w₁·x₁ + w₂·x₂ + b > 0 → Admit (1)
Otherwise → Reject (0)
2. Fire Safety Alert
Inputs from sensors:
x₁ = Temperature
x₂ = Smoke level
x₃ = Gas detection
Decision:
If w₁·x₁ + w₂·x₂ + w₃·x₃ + b > 0 → Unsafe (1)
Else → Safe (0)
Perceptron Learning Rule
When a prediction is wrong, we update the weights and bias:
Weight Update Rule:
wᵢ = wᵢ + α × (y_true − y_pred) × xᵢ
Bias Update:
b = b + α × (y_true − y_pred)
Where:
α = learning rate
y_true = actual label
y_pred = predicted label
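The learning rule can be coded from scratch; training on the AND function (which is linearly separable) shows the perceptron converging:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                     # the AND function

w = np.zeros(2)                                # weights w₁, w₂
b = 0.0                                        # bias
alpha = 0.1                                    # learning rate

for epoch in range(20):
    for xi, yi in zip(X, y):
        z = np.dot(w, xi) + b                  # weighted sum
        y_pred = 1 if z > 0 else 0             # step activation
        w += alpha * (yi - y_pred) * xi        # wᵢ = wᵢ + α(y_true − y_pred)xᵢ
        b += alpha * (yi - y_pred)             # b  = b  + α(y_true − y_pred)

print(w, b)
print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])   # [0, 0, 0, 1]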
❗ Limitation
A single-layer perceptron can only classify linearly separable data (like AND, OR), but not
problems like XOR, which are non-linearly separable. This was addressed later using multi-
layer perceptrons (MLPs).
Summary Table
Feature Description
Purpose Binary classification
Learning Weight adjustment based on error
Inspired by Biological neurons
Applications Safety systems, decision support, etc.
Limitation Cannot solve non-linear problems like XOR
Working of Neural Networks

1. Forward Propagation
(i) Linear Transformation
Each neuron receives inputs, multiplies them with weights, adds a bias, and computes a value denoted as z:
Equation:
𝑧 = 𝑤₁ · 𝑥₁ + 𝑤₂ · 𝑥₂ + … + 𝑤ₙ · 𝑥ₙ + 𝑏
Where:
x₁, x₂, ..., xₙ are input features
w₁, w₂, ..., wₙ are weights
b is the bias
z is the result passed to the activation function
(ii) Activation Function
To introduce non-linearity, z is passed through an activation function.
Common activation functions:
Sigmoid: 𝜎(𝑧) = 1 / (1 + 𝑒⁻ᶻ)
Tanh: 𝑡𝑎𝑛ℎ(𝑧) = (𝑒ᶻ − 𝑒⁻ᶻ) / (𝑒ᶻ + 𝑒⁻ᶻ)
ReLU: 𝑓(𝑧) = 𝑚𝑎𝑥(0, 𝑧)
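A minimal NumPy sketch of one layer's forward pass, with illustrative weights:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 0.8, 0.2])        # three input features
W = np.array([[0.1, 0.4, -0.2],      # one row of weights per neuron (2 neurons here)
              [-0.3, 0.2, 0.5]])
b = np.array([0.1, -0.1])            # one bias per neuron

z = W @ x + b                        # (i) linear transformation
a = sigmoid(z)                       # (ii) non-linear activation
print(z, a)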
2. Backpropagation
(i) Loss Calculation
The network calculates a loss to measure prediction error.
Examples:
Mean Squared Error (MSE) (for regression):
𝑀𝑆𝐸 = (1/𝑛) · 𝛴(𝑦ᵢ − ŷᵢ)²
Cross-Entropy Loss (for classification):
Loss = −Σ[y · log(ŷ)]
Where:
yᵢ = actual output
ŷᵢ = predicted output
(ii) Gradient Calculation
The gradients of the loss with respect to weights and biases are computed using the chain rule
of calculus.
(iii) Parameter Update
Weights and biases are updated using an optimization algorithm such as Stochastic Gradient
Descent (SGD):
Update rules:
𝑤 = 𝑤 − 𝛼 · 𝜕𝐿𝑜𝑠𝑠/𝜕𝑤
𝑏 = 𝑏 − 𝛼 · 𝜕𝐿𝑜𝑠𝑠/𝜕𝑏
Where:
α = learning rate
∂Loss/∂w, ∂Loss/∂b = partial derivatives of the loss
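These update rules can be demonstrated on a single linear neuron; the data follows the made-up rule y = 2x + 1:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])     # true rule: y = 2x + 1

w, b = 0.0, 0.0
alpha = 0.05                           # learning rate

for step in range(2000):
    y_hat = w * x + b                  # forward pass
    dw = np.mean(-2 * (y - y_hat) * x) # ∂MSE/∂w via the chain rule
    db = np.mean(-2 * (y - y_hat))     # ∂MSE/∂b
    w -= alpha * dw                    # w = w − α · ∂Loss/∂w
    b -= alpha * db                    # b = b − α · ∂Loss/∂b

print(w, b)                            # close to 2.0 and 1.0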
Universal Approximation Theorem (UAT) – Statement only
Statement: “A feedforward neural network with a single hidden layer containing a finite number of neurons and a non-linear activation function (such as sigmoid or tanh) can approximate any continuous function defined on a bounded input domain (technically, a compact subset of ℝⁿ), to any desired degree of accuracy, provided it has enough neurons.”
What It Means – Intuition
Let's interpret the components of the theorem in simpler terms:
Term Meaning
Feedforward network A neural network where information moves only forward — from
input to output (no loops).
Single hidden layer Just one intermediate layer between input and output layers.
Continuous function A function with no jumps or breaks (like a smooth curve).
Approximate The network output can get very close (within any tiny error) to
the actual function value.
Compact subset of ℝⁿ A finite region in n-dimensional space, like a cube or a closed
interval.
Non-linear activation A function that adds complexity — without it, the network would
function behave like a straight line.
🧠 Why It’s Important
It justifies the power of neural networks: Even with only one hidden layer, a network is theoretically
capable of learning any pattern or relationship in data — like predicting disease risk, modeling chemical
reactions, or identifying fire hazards.
This means neural networks are universal approximators — they can model anything (as long as it's
continuous and bounded).
🔄 Example Analogy
Think of a neural network as a toolbox of small building blocks (neurons).
Imagine you're trying to trace a curve (a function) using Lego blocks.
With enough small blocks (neurons) and the right arrangement (weights and biases), you can build a very close copy of the curve. The more complex the curve, the more blocks you need, but you don't need a second layer of blocks to do it.
Important Notes
The theorem doesn't say the network will learn the function easily, just that it can, if given the right weights.
It doesn't guarantee training success: learning the weights may be hard in practice.
More layers (deep learning) are often used in practice because they learn complex functions more efficiently and with fewer neurons per layer.
Summary

Concept                 What It Tells Us
UAT                     Neural networks with one hidden layer can approximate any continuous function
Practical implication   Neural networks are flexible and powerful
Limitation              The theorem is about potential, not training efficiency
Multi-Layer Perceptron (MLP)

Architecture of an MLP
1. Input Layer
Receives raw data features (x₁, x₂, ..., xₙ).
Does not perform any computation.
2. Hidden Layers
One or more intermediate layers where computation occurs.
Each neuron performs a weighted sum of inputs and applies a non-linear activation function.
Computation in each neuron:
𝑧 = 𝑤₁ · 𝑥₁ + 𝑤₂ · 𝑥₂ + . . . + 𝑤ₙ · 𝑥ₙ + 𝑏
𝑎 = 𝑓(𝑧)
Where:
wᵢ = weights
xᵢ = input values e
b = bias e g
f(z) = activation function (e.g., ReLU, sigmoid, tanh) ll
a = activation/output of the neuron C o
3. Output Layer
Produces the final prediction.
For classification tasks, it often uses softmax or sigmoid activation.
Working Mechanism
Forward Propagation
Input data flows from the input layer to the output layer through hidden layers.
Each layer applies:
Linear transformation: 𝑧 = 𝑤 · 𝑥 + 𝑏
Activation function: 𝑎 = 𝑓(𝑧)
Backpropagation
The error between predicted and actual output is calculated using a loss function.
Gradients of the loss with respect to weights and biases are computed.
Parameters are updated using optimization algorithms (e.g., SGD, Adam).
Activation Functions
Function   Formula                             Purpose
ReLU       f(z) = max(0, z)                    Fast and widely used
Sigmoid    σ(z) = 1 / (1 + e⁻ᶻ)                Squashes values between 0 and 1
Tanh       tanh(z) = (eᶻ − e⁻ᶻ) / (eᶻ + e⁻ᶻ)   Output between −1 and 1
Feature                   Description
Deep architecture         One or more hidden layers
Non-linearity             Enabled by activation functions
Universal Approximation   Can model any continuous function
Trainable parameters      Weights and biases learned from data
Supervised learning       Requires labeled data for training
❗ Limitations
Requires large datasets and computational power
Prone to overfitting without regularization
May suffer from vanishing gradient problem in deep architectures
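The syllabus item "demonstration of regression and classification problems using MLP" can be sketched with scikit-learn (a minimal sketch; the datasets are a built-in digits set and a synthetic regression set):

from sklearn.datasets import load_digits, make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Classification: recognizing handwritten digits
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)              # scaling the inputs first would help convergence
print("classification accuracy:", clf.score(X_test, y_test))

# Regression: predicting a continuous value from numerical features
Xr, yr = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=0)
reg = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
reg.fit(Xr_train, yr_train)
print("regression R^2:", reg.score(Xr_test, yr_test))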
Deep Neural Network (DNN)

Deep networks can:
Learn low-level features in early layers (e.g., edges in images)
Learn high-level abstractions in deeper layers (e.g., shapes, objects)
This makes them extremely powerful for:
• Image classification
• Speech recognition
• Language translation
• Fire hazard prediction
• Industrial process control
Summary Table

Feature      Deep Neural Network
Depth        Multiple hidden layers
Learning Via forward and backward propagation
Advantage Learns complex hierarchical patterns
Limitation Computationally intensive, harder to interpret